Readit News
bluetidepro · 3 years ago
Shazam is one of the very few apps in the past 20 years that STILL "wows" me. I have no idea how the tech works, and I even sort of like not knowing, to be honest. It's one of the very few apps out there that still exists in a "magical" way to me. I am constantly impressed with how fast and easily it works, even with very obscure music. What an amazing app.

Fun quick related story: about 10 or more years ago there was a background song on a TV show (Scrubs) that I really liked, and it was only in the Netflix version. It was just an instrumental song with some French-sounding words spoken over it, so there was no easy way to search for it. However, it was distinct enough that it didn't seem like something made just for the show. It was also pretty quiet and under some talking in the scene. I had posted on reddit asking if anyone knew it, and never got any responses. I searched all over the web, but no source had the track details. It drove me crazy every time I heard the song while re-watching the show, and every few years when I tried again I still could not track it down. Back then, Shazam hadn't cataloged it yet, so it wasn't in there either. However, when re-watching it again a few years back, I tried Shazam once more and to my surprise it finally worked. I was blown away that Shazam was finally able to solve this 10+ year mystery. It was one of the coolest feelings ever to scratch that itch, finding this rare French song and hearing it in full. It was truly magical.

EDIT: Oh sorry, I didn't think anyone would actually care about the song itself lol. It was called "Sans Hésitation" by the French-Canadian band "Chapeaumelon". https://www.youtube.com/watch?v=Ju4d3YQhByU - It's also interesting because the song now does show up for the episode on TV music database sites. Very cool.

quantumduck · 3 years ago
Shazam used to wow me, but then, as others mentioned in the replies, it's essentially matching the signature of the sound to the sounds in the database. If it's one of the songs in there, it gets matched fairly quickly.

What blew my mind was when Google introduced 'hum and we'll recognize the song for you' in Google Assistant: https://www.google.com/amp/s/blog.google/products/search/hum...

It works so well even with my shitty humming - even my girlfriend can't recognize what the song is but Google can. It doesn't even have the same signature as the original audio file, just similar hums in a noisy environment and it still works. Black magic fuckery.

tasty_freeze · 3 years ago
> it's essentially matching the signature of the sound to the sounds in the database.

You aren't giving it enough credit. The algorithm uses just a few seconds from any part of the song, and has to deal with phone audio quality and often background noise. I mean, you can be in a bar with all that jabber and hold up the phone and it could pick out the song. The app on the phone does the preprocessing to the audio before it is sent to the server that does the matching ... using the comparatively miserable power of a 2001 era cell phone.

thehappypm · 3 years ago
What is a signature? How is a signature computed from a noisy audio stream, over a mall speaker? How is a signature computed from an arbitrary starting point?
jakereps · 3 years ago
> Wow blew my mind was when Google introduced 'hum and we'll recognize the song for you' in Google assistant

Their announcement actually made me roll my eyes a bit, as SoundHound had that functionality nearly a decade before. I had both SH and Shazam installed on my old phone for these use cases - now Shazam is baked into Siri so I don’t even have the app itself installed.

PheonixPharts · 3 years ago
> Essentially matching the signature of the sound to the sounds in the database.

And Dall-E 2 is just doing fuzzy hashing of images with text keys.

Shazam continues to amaze me because it "just works", and still feels more magical to me than most of the AI out there, since it directly solves a major problem I didn't even think was solvable: "what is this song!!?"

MiddleEndian · 3 years ago
I enjoy salsa dancing, but I don't know any Spanish, so I use that built-in Google functionality to hum various songs all the time to figure out what they're called.
kQq9oHeAz6wLLS · 3 years ago
Dude, spoiler alert. Did you miss the part where OP said they liked not knowing how it works??
prawn · 3 years ago
There's another where you tap a beat with your space bar, and a website tries to guess the song.
baxtr · 3 years ago
I need to download the Google app (and I presume sign in) to use that feature? Count me out
babypuncher · 3 years ago
What really wows me is that Shazam started in 2002. It was a phone number you would call on your cell phone and let it listen to your environment.

Way back then, it was doing everything you describe, but over low-quality, band-limited telephone lines.

swores · 3 years ago
As an almost teenager at the time, that (Shazam over the phone with an answer texted back - which I used on a Nokia 3310) was the one thing that convinced me we would soon have pocket devices that really could do anything.

And while it took a few iterations (for me, from palm pilot to blackberry as a teenager, then eventually moving to iPhone after a few too many painful Blackberry upgrades - still missing that unified inbox though, as is everyone else I know who had a BB of that era... and frankly missing a great physical keyboard on a phone, too) I still am impressed on a daily basis that I do indeed have the device in my pocket that 12 year old me dreamed of.

HeckFeck · 3 years ago
I remember Sony Ericsson handhelds all came with TrackID back in the day (2007/2008) and I used it to name music I heard in public. It was the same idea. I think it charged £1-2 per track!
ipaddr · 3 years ago
Back then phone quality was much better. Cellphones killed that.
nemo1618 · 3 years ago
You can't just post a story like that and not link the song!

Personally, my main usage of Shazam is for identifying vaporwave samples. Often all you have to do is throw the song in Audacity, tweak the speed a bit, and Shazam it.

actionfromafar · 3 years ago
There are entire albums on Spotify which are full songs of 80s pop classics, played at a slower speed, then uploaded as a new album from another artist.
bluetidepro · 3 years ago
haha Sorry, I updated the post. Didn't think anyone would care about that part lol
robbyking · 3 years ago
The first time I heard of Shazam was on a road trip with a friend of mine who had minimal tech skills at best. I was already 10 years into my career as an engineer, and when he told me about it, I honestly didn't believe him; I was positive he was mistaken, and speculated it was a service similar to Aardvark[1], which was a peer-to-peer information engine.

I was wrong, of course; Shazam really did live up to its hype. I think it's interesting that the more someone knows about how a technology works, the more sceptical they are of what it is capable of.

[1] https://en.wikipedia.org/wiki/Aardvark_(search_engine)

jasonwatkinspdx · 3 years ago
I don't know about Shazam's current algorithm specifically, but years ago I worked at a place with a mathematician who had worked on Gracenote's algorithms, and I asked him for the basics on how it works.

Basically, it records audio, chops it up into small segments, and runs each one through an FFT. Then, treating the result like a greyscale spectrogram image, it runs that through a quantization filter that helps reject some noise, and converts the output to locality-sensitive hashes that are sent to the server. So basically: FFT, filter, hash, lookup.
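
A minimal sketch of that "FFT, filter, hash, lookup" pipeline, using only the Python standard library. This is not Gracenote's or Shazam's actual code: the naive DFT stands in for an FFT, "keep the loudest bin in each band" is an assumed stand-in for the quantization filter, a plain tuple stands in for a locality-sensitive hash, and the synthetic tracks and all function names are hypothetical.

```python
import cmath
import math
import random
from collections import Counter


def dft_magnitudes(frame):
    """Magnitudes of the lower half of a naive O(n^2) DFT (stand-in for an FFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def frame_signature(frame, bands=4):
    """Quantize a frame: keep only the index of the loudest bin in each band."""
    mags = dft_magnitudes(frame)
    width = len(mags) // bands
    return tuple(
        max(range(b * width, (b + 1) * width), key=mags.__getitem__)
        for b in range(bands)
    )


def signatures(samples, frame_len=64):
    """Chop audio into frames and compute one hashable signature per frame."""
    return [
        frame_signature(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def tones(bins, n=256, frame_len=64):
    """Synthetic 'track': a sum of sine waves centred on the given DFT bins."""
    return [sum(math.sin(2 * math.pi * b * t / frame_len) for b in bins) for t in range(n)]


# Hypothetical two-track database: signature -> track name.
tracks = {"track A": tones([3, 10, 20, 27]), "track B": tones([5, 12, 18, 30])}
database = {sig: name for name, audio in tracks.items() for sig in signatures(audio)}

# Query: track A with some added noise; each matching frame votes for a track.
rng = random.Random(0)
noisy = [x + rng.uniform(-0.1, 0.1) for x in tracks["track A"]]
votes = Counter(database[s] for s in signatures(noisy) if s in database)
best_guess = votes.most_common(1)[0][0]
print(best_guess)  # track A
```

The noise changes every sample, but the quantized signature only records which bins are loudest, so it survives unchanged. That is the noise-rejection idea in miniature.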

muizelaar · 3 years ago
This paper from the Shazam founder describes an approach for doing it: https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
kleiba · 3 years ago
> from the Shazam founder

...who by the way holds a PhD from Stanford...

turkeygizzard · 3 years ago
Don't want to spoil it for you if you really don't want to know but I want to share to others in case they do because I found it so interesting when I first learned!

It looks like others shared the paper: https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf

It's short but very cool. I read it a while ago and honestly can't pretend I fully grokked everything, but my understanding was that you can't just use a Fourier transform alone. Noise would basically make this impossible.

So what I'd consider the key insight is that they compressed songs down to "fingerprints". IIRC they noticed that songs, even in noisy environments, preserved certain bits of information. In particular, they could look at the spectrogram and see peaks of amplitude in the tapestry. They essentially set some radius and scanned the spectrogram. Within a given radius, only the largest amplitude value in time and frequency would be preserved. So you've reduced a 3MB song to a sparse scattering of peaks.

This would be good enough for small databases (I think). But it's intractable for anything practical. So they built hashes out of these fingerprints using pairs of the preserved peaks. They would choose a certain peak (called the anchor point), record its time offset from the start of the song, and then form pairs with other nearby peaks, saving each pair's two frequencies and the time difference between them (but discarding e.g. their amplitudes). So for each of these pairs, you would get a 64-bit value: 32 bits of hash (the frequency pair plus time difference) and 32 bits for the time offset and track ID.
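
A rough sketch of the pairing step, assuming peaks have already been picked out as (time, frequency bin) tuples. Following the paper, the hash here packs both frequencies and the time difference between the two peaks; the 10-bit field widths, the `fan_out` of 3, and the function names are illustrative assumptions rather than Shazam's production values.

```python
def pack_hash(f1, f2, dt):
    """Pack two frequency bins and a time difference into one integer (10 bits each)."""
    if not all(0 <= v < 1024 for v in (f1, f2, dt)):
        raise ValueError("each field must fit in 10 bits")
    return (f1 << 20) | (f2 << 10) | dt


def unpack_hash(h):
    """Inverse of pack_hash: recover (f1, f2, dt)."""
    return (h >> 20) & 1023, (h >> 10) & 1023, h & 1023


def fingerprint(peaks, fan_out=3, max_dt=1023):
    """Pair each anchor peak with up to `fan_out` later peaks.

    `peaks` is a list of (time, frequency_bin) tuples.
    Returns a list of (hash, anchor_time) pairs.
    """
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            if t2 - t1 <= max_dt:
                out.append((pack_hash(f1, f2, t2 - t1), t1))
    return out


peaks = [(0, 10), (2, 40), (5, 22), (9, 300)]
shifted = [(t + 100, f) for t, f in peaks]  # same peaks, 100 time steps later

# The hashes only depend on time *differences*, so shifting changes nothing.
same = [h for h, _ in fingerprint(peaks)] == [h for h, _ in fingerprint(shifted)]
print(same)  # True
```

Because the hash ignores absolute time, a snippet recorded from the middle of a song produces the same hashes as the matching stretch of the original. Only the stored anchor times differ.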

When you wanted to look up a song, they would fingerprint your snippet into multiple 32-bit hashes and compare them against the hashes in the database. If a song was a good match, then you would see that your snippet matched against multiple hashes from that song, and specifically that the matches lined up along a straight line over time, i.e. they all shared the same time offset between snippet and song (I'm struggling to explain this bit but it's visually obvious if you look at Figure 3 in the paper).
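
The "matched linearly over time" part can be sketched as a histogram of time offsets: every matching hash votes for (time in song minus time in snippet), and a true match piles its votes onto a single offset. This is a toy version with made-up peak data and hypothetical function names; the hash is a plain `(f1, f2, dt)` tuple rather than a packed integer.

```python
from collections import Counter, defaultdict


def hashes(peaks):
    """Toy fingerprint: pair each peak with the next two; hash = (f1, f2, dt)."""
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 3]:
            out.append(((f1, f2, t2 - t1), t1))
    return out


def build_index(tracks):
    """Map each hash to every (track_id, time) where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in hashes(peaks):
            index[h].append((track_id, t))
    return index


def best_match(index, query_peaks):
    """Return (track_id, offset, votes) for the most consistent time offset."""
    offsets = defaultdict(Counter)
    for h, t_query in hashes(query_peaks):
        for track_id, t_db in index.get(h, ()):
            offsets[track_id][t_db - t_query] += 1
    if not offsets:
        return None
    best = {tid: c.most_common(1)[0] for tid, c in offsets.items()}
    tid = max(best, key=lambda k: best[k][1])
    offset, votes = best[tid]
    return tid, offset, votes


# Made-up (time, frequency_bin) peaks for two tracks.
tracks = {
    "A": [(0, 10), (3, 25), (5, 40), (8, 12), (11, 33), (14, 50), (17, 21), (20, 45)],
    "B": [(0, 11), (2, 25), (6, 41), (9, 12), (12, 30), (15, 52), (18, 20), (21, 44)],
}
index = build_index(tracks)

# Snippet of track A starting at t=5, re-based to t=0, plus one spurious noise peak.
query = [(0, 40), (3, 12), (4, 99), (6, 33), (9, 50), (12, 21)]
print(best_match(index, query))  # ('A', 5, 5)
```

The noise peak produces a few hashes that match nothing, while the five surviving hashes all agree that the snippet starts 5 time steps into track A.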

I probably got some of this wrong, but I hope it's a helpful summary of the paper. I remember struggling to understand parts of it, so please let me know if anything I said is egregiously wrong!

sdwr · 3 years ago
I had a similar experience looking for a background track in an episode of This American Life. I couldn't remember which episode it was, and none of the lyrics were in English. Pretty sure I went backwards through the episodes and listened to all the credited songs to find it. The song was 69 Police by David Holmes, which still feels perfect to me. https://www.youtube.com/watch?v=IWissIWxqKk

On the topic of background music, tons of original background music copies/imitates famous stuff. Sometimes it's "I wanted the sound of X but couldn't afford it", but there are some in-jokes in there too. Wish I could remember some examples.

goldcd · 3 years ago
I think all (so simple) you have to do is parse all the tracks ever made, and say generate a sequence of snapshots of what the tune sounds like and the delta. e.g. if it was notes (for simplicity) E,D,C,D,E,E,E,D,D,D,E,E,E is the start of "Mary had a little Lamb". Millions of tracks contain the note E. Many hundreds of thousands probably have the note D next - and as you work through the sequence, you're pruning down that list until you know what it is. The bit that makes my mind hurt, though, is the data structure you put those sequences into to make it quickly searchable. Users can start recording at any point in the song - so you can't just prune a tree down from a known starting point. There's going to be background noise - so you need some way of handling "when you have no choice left", I presume by sticking wildcards into the previous decisions, to see if you end up back on a known track.

Yeah - I think it's magic as well.

Other thoughts: I used it back in the UK when it launched, and the first track I ever used it on dialling (2580 - the numbers down the middle of your keypad) was also a French track (MC Solaar – La Vie Est Belle)

I always felt they missed a trick, just identifying music (and then trying to sell you stuff). Surely they could have used the same tech to seamlessly mix all music together. (i.e. take the sequences within tracks they find hard to differentiate, and then use these points to allow two tracks to be mixed together). What's the minimum number of tracks it would say take to seamlessly mix from Megadeth to Mozart?

zelos · 3 years ago
They used to have a paper on their website describing their algorithm in simplified form but I can't find it any more. Wikipedia has some details: https://en.wikipedia.org/wiki/Acoustic_fingerprint

I believe it's very sensitive to changes in timing, so it doesn't work on live performances etc.

(based on reading I did 13 years ago before an interview at Shazam, which to this day still remains my worst interview performance)

xhevahir · 3 years ago
>if it was notes (for simplicity) E,D,C,D,E,E,E,D,D,D,E,E,E is the start of "Mary had a little Lamb"

As far as I can tell these operate on audio, not symbolic music.

saghm · 3 years ago
My instinct is that it probably isn't as simple as you describe because not only are there multiple notes at a time in a given track (i.e. chords), but there are also several tracks playing at once! It's possible that they're literally generating data like {guitar 1: C chord, guitar 2: single note E, bass: single note E} for every point in time, but even then each instrument isn't playing the exact same rhythm most of the time, so the notes won't exactly line up. I guess I don't think it's completely computationally infeasible to do it this way, but it seems more likely that they're just trying to separate the music from the background noise and then find the closest match to the music audio as a whole, rather than trying to separate it into components.
nibbleshifter · 3 years ago
> Surely they could have used the same tech to seamlessly mix all music together. (i.e. take the sequences within tracks they find hard to differentiate, and then use these points to allow two tracks to be mixed together). What's the minimum number of tracks it would say take to seamlessly mix from Megadeth to Mozart?

I noodled around with this idea in my free time a few years ago, got absolutely nowhere really usable with it (I probably put in a couple hundred hours).

I knew I was limited by my dataset (small), code quality (terrible) and understanding of musical theory (virtually nil).

Maybe I'll pick up that idea again - even doing beat matching would be kind of neat.

bambataa · 3 years ago
Shazam as a product feels a bit odd. Almost as if they’ve never quite outgrown their slightly sketchy “advertised on MTV2 alongside the Crazy Frog” origins.

They must have loads of data on songs people actually want to know yet never really managed to turn themselves into anything more sophisticated.

Deleted Comment

WalterBright · 3 years ago
> I have no idea how the tech works

It does a Fourier analysis of sections of the song, and puts the results in a database. A Fourier analysis yields what frequencies make up a waveform along with their amplitudes, so it is very compact.

duped · 3 years ago
Taking the DTFT of a signal yields exactly the same amount of information, so it's not really more compact. Shazam used a spectrogram (which is more information than the original signal) and searched for peaks to create a fingerprint.

It's not the analysis that is compact, but the fingerprint derived from it.
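
A small stdlib-only illustration of that point, under assumed toy sizes (64 samples, 16-sample frames, a hop of 4): a full DFT is invertible, so it carries the same information as the samples; an overlapping spectrogram holds more numbers than the signal; and keeping one peak per frame (an assumed, simplified fingerprint) is far smaller than either.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT; returns the real part since the input signal was real."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def spectrogram(x, frame_len=16, hop=4):
    """Magnitude spectrogram from overlapping frames (no window, for brevity)."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    return [[abs(c) for c in dft(f)[:frame_len // 2 + 1]] for f in frames]


signal = [math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.sin(2 * math.pi * 12 * t / 64)
          for t in range(64)]

# 1. The DFT loses nothing: transforming back recovers the samples.
roundtrip_error = max(abs(a - b) for a, b in zip(signal, idft(dft(signal))))

# 2. The overlapping spectrogram has more values than the signal itself.
spec = spectrogram(signal)
spectrogram_size = len(spec) * len(spec[0])

# 3. A fingerprint that keeps only the loudest bin per frame is tiny.
peaks = [max(range(len(row)), key=row.__getitem__) for row in spec]

print(len(signal), spectrogram_size, len(peaks))  # 64 117 13
```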

BrandoElFollito · 3 years ago
And now Chapeaumelon is wondering why the sudden surge in YouTube views. Comments are disabled so we cannot even help them to understand :)
terramex · 3 years ago
Shazam is great but a similar app that really "wowed" me around 2007 was Midomi - it could recognise humming with good results, even though I'm really bad at hitting the right notes and key. It still exists but is not really talked about anymore; Shazam seems to have dominated that market.
oDot · 3 years ago
Don't skip the credits next time :)
bluetidepro · 3 years ago
haha wasn't in there! Def looked :)
jb3689 · 3 years ago
Shazam is not particularly complex, however it is a very clever solution and a great example of applying a simple engineering concept broadly. I still hold it as one of the best examples of clever engineering in the app world
rrrrrrrrrrrryan · 3 years ago
It actually wasn't an app in the beginning - it was a phone number that you dialed.
caseyf7 · 3 years ago
Shazam is probably the only Apple watch app I ever use. Very convenient to have this on the watch.
dec0dedab0de · 3 years ago
which episode? and is it still in there? DVD, streaming, and syndication have some different songs because of rights issues.
bluetidepro · 3 years ago
https://scrubs.fandom.com/wiki/My_Ocardial_Infarction - In the Netflix version. It does now show the song in there, which is cool. But for like 10 years, it was unknown.
adnanaga · 3 years ago
What was the song ??
sexy_panda · 3 years ago
You will get the result once $commenter is online again
bluetidepro · 3 years ago
Updated the post, sorry! haha
cooperadymas · 3 years ago
Well?
Waterluvian · 3 years ago
It’s all just Fourier analysis I’m guessing?

Which I always find to be simultaneously simple and obvious as well as total magic.

tasty_freeze · 3 years ago
I have a relevant story here. The inventor of the Shazam algorithm, Avery Wang, gave me a demo of it within a couple weeks of it being created. Here is the backstory (partly from personal knowledge, partly relayed by Avery).

Avery had gotten his PhD from CCRMA at Stanford under Julius O Smith. His PhD had been on the topic of automatic ("blind") recovery of individual vocal/instrument tracks from a final mix. From there he joined a startup, Chromatic Research, where I also worked. He created lots of code and patented some algorithms related to resampling and MIDI synthesis, stuff like that. Avery was (and is) a super nice guy, humble but incredibly smart -- he could work not only with the high-level mathematics but was also equally excited to fine-tune assembly code.

After Chromatic folded, Avery had been struggling to get his own startup off the ground. About the same time, the Shazam guys had the idea for the product but didn't know how to create the algorithm. They approached Smith at CCRMA looking for someone capable of creating something that worked. Smith suggested they try Avery Wang.

At first Avery said, "Hmm, that seems difficult, but let me think about it." Within a week he had a demo running on a few thousand songs he had gathered from CDs. I'm sure a lot of refinements went into it after that, but the core idea took him a weekend.

[ All factual mistakes above are due to me and 20 year old memories -- if I misrepresented something it certainly wasn't due to Avery telling me something that wasn't true ]

EDIT: A blurb about Avery https://www.seti.org/avery-wang

damagednoob · 3 years ago
At the beginning of this video[1], Avery talks about the beginning of Shazam.

[1]: https://youtu.be/YVTnj3OIhwI

nomilk · 3 years ago
> August 2002: Shazam launches as a text message service based in the UK. At the time, users could identify songs by dialing “2580” on their phone and holding it up as a song played. They were then sent an SMS message telling them the song title and the name of the artist.

Incredible! Curious to know what exactly happened on the backend after it listened to the audio, and what hardware it ran on.

mortenjorck · 3 years ago
Just as amazing to me is that the algorithms could identify a song through the extremely limited bandwidth and spectrum of an early-2000s CDMA stream and a cheap Kyocera microphone.
KMnO4 · 3 years ago
Low bandwidth is perfectly suitable for low frequency data (i.e. melody). You lose some of the high frequency details (i.e. timbre), but it’s still very easy to recognize songs.

It’s the same as recognizing objects in a 256x256px image.

Try resampling a song from 44kHz to 4kHz and you’ll still have no trouble recognizing it.
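
A back-of-the-envelope version of that experiment with a synthetic signal instead of a song, using only the standard library. The assumptions are all mine: a 440 Hz tone stands in for "melody", a 9 kHz overtone for "timbre", the source rate is rounded to 44,000 Hz so the ratio is a whole number, and block averaging is a deliberately crude resampler.

```python
import cmath
import math


def tone_amplitude(samples, freq, rate):
    """Amplitude of the `freq` Hz component, via a single-frequency DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / rate) for n, x in enumerate(samples))
    return 2 * abs(acc) / len(samples)


def downsample(samples, factor):
    """Crude resampler: average each block of `factor` samples into one."""
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]


HIGH_RATE, LOW_RATE = 44_000, 4_000

# One second of audio: a 440 Hz "melody" plus a quieter 9 kHz "timbre" overtone.
one_second = [
    math.sin(2 * math.pi * 440 * n / HIGH_RATE)
    + 0.5 * math.sin(2 * math.pi * 9_000 * n / HIGH_RATE)
    for n in range(HIGH_RATE)
]
low = downsample(one_second, HIGH_RATE // LOW_RATE)

melody_before = tone_amplitude(one_second, 440, HIGH_RATE)
melody_after = tone_amplitude(low, 440, LOW_RATE)
print(round(melody_before, 2), round(melody_after, 2))  # 1.0 0.98
```

At 4 kHz the highest representable frequency is 2 kHz, so the 9 kHz overtone cannot survive as itself, while the 440 Hz tone keeps about 98% of its amplitude even through this rough filter.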

LeoPanthera · 3 years ago
The UK was one of the first countries to introduce GSM-EFR which used the ACELP codec at 12.2 kbit/s for phone calls. The quality was actually pretty good.

I don't really understand why phone call fidelity hasn't improved since then. Sometimes it seems like it's even worse!

manderley · 3 years ago
GSM, not CDMA.
ID1452319 · 3 years ago
Yep, I remember dialling that number and holding my phone up to the telly!
cannam · 3 years ago
It's obviously a cracking algorithm, but what made Shazam doubly remarkable was how efficiently they turned it into a working product.

It wasn't just a case of developing an algorithm that could in theory be used to match an audio signal against all the world's pop songs. They presumably also had to get hold of a substantial number of those songs, fingerprint them, and roll out the search robustly against generally very poor audio hardware using simple telephony services at (for the time) quite considerable scale. They did it very quickly, it worked super well from launch, and it's been running continuously ever since.

I've read the paper about the method, but I would love to know more about the original development and deployment.

mkarliner · 3 years ago
Well, I was CTO at the time, AMA...
schoen · 3 years ago
How did you get the underlying corpus of audio for all commercially recorded music?
ThrowawayTestr · 3 years ago
How'd y'all monetise back then?

Deleted Comment

duped · 3 years ago
The paper giving an overview of how shazam works (1) is one of my favorites

(1) https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf

xtracto · 3 years ago
Reminds me a lot of MusicBrainz Tagger, I remember being fascinated by it in 2001 ( https://web.archive.org/web/20010107213100/http://musicbrain... ) because it was able to "identify" the song in the mp3/wma/ogg file and download the correct tags.
rmnclmnt · 3 years ago
I remember studying this paper as a student. It was completely amazing, a bit mysterious, and not so difficult to understand, all at the same time.

And most of all: no ML involved! All hail the heuristics!

grensley · 3 years ago
I'm just imagining the founders as every stoned business school student at the time like "Bro, wouldn't it be cool if there was an app where you just press a button and it tells you what song is playing" and they actually managed to find the guy capable of building it.

To this day, Shazam still has that aesthetic of "press big button, do magic".

monkpit · 3 years ago
Actually, the first version in 2002 worked by listening to a phone call. There were no apps really at the time, at least not in the iPhone sense (because it didn’t exist).
mattigames · 3 years ago
Maybe in a parallel dimension where internet took a few more decades to be created we call a number to listen to our Twitter feed narrated by some robotic voice, and we get a bulletin/form every week where we write down who we want to follow or unfollow, and of course a field where to write our own tweets!
llaolleh · 3 years ago
Ha. I would watch this movie.
ksala_ · 3 years ago
Shazam always blows my mind. It doesn't work 100% of the time, but when it does it feels like magic. On top of that they introduced (I don't know exactly when) the feature to see lyrics for the song, automatically synched with the music. This is also mind-blowing.

Only Google has managed to top Shazam in blowing my mind, and only ~recently, by making this whole process happen completely offline and continuously in the background on a phone. It's not as broad but still incredible. Google's paper: https://arxiv.org/abs/1711.10958

elboru · 3 years ago
Sometimes, I like to stop and think about all the amazing things that we can do with our phones and that we take for granted.

What I do is to imagine myself finding a smartphone in elementary school (90s kid). These are a few things that would blow my mind:

- Having a digital global map, with multitouch, that can show me where I am in that map. I can search anything and find reviews from virtually anywhere in the world. I can zoom and see my actual house. I can use street view.

- I have access to any song I want.

- The phone can listen to a song and it can tell me the name of it (then I can listen to it again)

- I can play video games with much better graphics than my N64

- I can watch movies and TV in there.

- I can video call

- I have a digital assistant

- I can find any answer online

- I can buy anything online

- In the future all this technology is not just for the rich, virtually anyone can buy a smartphone.

Vinnl · 3 years ago
My goto example is that I'm now able to see the text of the text message I'm replying to while typing the reply.
jedberg · 3 years ago
In 2004 I took my grandpa for a ride in my car, which had GPS. He couldn't stop staring at it. I asked him what was wrong, and he said, "When I was an insurance salesman driving door to door in LA, I had an entire file folder of maps I kept on the passenger seat to find my way. If I had this gadget back then, I could have sold twice as much insurance".
sh4rks · 3 years ago
> I can play video games with much better graphics than my N64

And - you can play N64 games on a rectangle you can carry in your pocket.