I built something similar which is a lot faster, but on a large-scale test the Google software handily outperforms my own (92% accuracy versus 87% or so, and that is a huge difference because it translates into roughly 30% fewer errors).
Wow the Onsets and Frames algorithm is insanely interesting. It's like a mixture of run-length encoding of (vertical and horizontal) strings of (0dim/time) and (1d/time) structures (onsets as points in time, activations as lines in time). But..hm.. why stop at such low dimensionality structures..! :^)
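If it helps make that concrete, here's a minimal toy sketch of the points-and-lines idea (not the actual Onsets and Frames code; the 88-key layout, the 0.5 threshold, and the ~31 fps frame rate are my own assumptions), just pairing an onset "point" with the run of frame activations that follows it:

    import numpy as np

    def decode_notes(onset_probs, frame_probs, threshold=0.5, fps=31.25):
        """Pair each onset 'point' with the run of frame activations that follows it.

        onset_probs, frame_probs: (num_frames, 88) arrays of probabilities in [0, 1].
        Returns (midi_pitch, start_sec, end_sec) tuples.
        """
        num_frames, num_pitches = frame_probs.shape
        notes = []
        for pitch in range(num_pitches):
            t = 0
            while t < num_frames:
                if onset_probs[t, pitch] >= threshold:   # a note can only start at an onset
                    start = t
                    t += 1
                    # ...and it lasts for as long as the frame head stays on (the "line")
                    while t < num_frames and frame_probs[t, pitch] >= threshold:
                        t += 1
                    notes.append((pitch + 21, start / fps, t / fps))  # 21 = MIDI number of A0
                else:
                    t += 1
        return notes

    # Fake probabilities just to exercise the decoder.
    rng = np.random.default_rng(0)
    onsets = (rng.random((100, 88)) > 0.99).astype(float)
    frames = np.maximum(onsets, rng.random((100, 88)) * 0.6)
    print(decode_notes(onsets, frames)[:5])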
Shame that it uses quadratically scaling transformers - there are many sub-quadratic transformers that work about as well or better (https://github.com/lucidrains?tab=repositories) - because that 4-second sub-sample limitation seems quite unlike how I imagine most people experience music. Interesting, though. I wonder if I could take a stab at this..
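For the 4-second limitation, the obvious workaround is to chunk and stitch. A rough sketch of what I mean - transcribe_segment here is a hypothetical stand-in for whatever model call you'd actually wrap, and the overlap/de-duplication is deliberately naive:

    SEGMENT_SEC = 4.0   # the window size mentioned above
    HOP_SEC = 3.5       # a little overlap so notes at the seams aren't chopped

    def transcribe_long(audio, sr, transcribe_segment):
        """Chunk `audio` (a 1-D sample array) into ~4 s windows and stitch the results.

        `transcribe_segment(chunk, sr)` is a hypothetical callable that returns
        (pitch, start_sec, end_sec) tuples relative to the start of the chunk.
        """
        seg_len = int(SEGMENT_SEC * sr)
        hop = int(HOP_SEC * sr)
        notes = []
        for start in range(0, len(audio), hop):
            chunk = audio[start:start + seg_len]
            offset = start / sr
            notes.extend((p, s + offset, e + offset)
                         for p, s, e in transcribe_segment(chunk, sr))
        # Very naive de-duplication of notes seen in two overlapping windows.
        return sorted(set((p, round(s, 2), round(e, 2)) for p, s, e in notes))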
Also interesting that the absolute timing of onsets worked better than relative timing - that also seems kinda bizarre to me, since, when I listen to music, it is never in absolute terms (e.g. "wow I just loved how this connects to the start of the 12th bar" vs "wow I loved that transition from what was playing 2 bars ago").
Another thing on relative timing.. when I listen to music, very nuanced, gradual, and intentional deviations of tempo have significant sentimental effects for me - which suggests you need a 'covariant' description of how the tempo changes over time: not only do you need the relative timing of events, you also need the relative timing of the relative timing of events as well (there's a tiny numeric sketch after the examples below).
Some examples:
- Jonny Greenwood's Phantom Thread II from the Phantom Thread soundtrack [0]
- the breakdown in Holy Other's amazing "Touch" [1], where the song basically grinds to a halt before releasing all the pent-up emotional potential energy.
[0] https://www.youtube.com/watch?v=ztFmXwJDkBY, especially just before the violin starts at 1:04
[1] https://www.youtube.com/watch?v=OwyXSmTk9as, around 2:20
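To make the "relative timing of the relative timing" point concrete, a tiny made-up example: inter-onset intervals give you the local beat length, and the ratio of successive intervals is what actually captures a gradual slow-down like the ones above (the onset times below are invented, not taken from either track):

    import numpy as np

    # Made-up onset times (seconds) for a phrase that gradually slows down.
    onsets = np.array([0.00, 0.50, 1.02, 1.58, 2.20, 2.90, 3.70])

    ioi = np.diff(onsets)       # relative timing: inter-onset intervals (local beat length)
    bend = ioi[1:] / ioi[:-1]   # relative timing of the relative timing: how much each beat
                                # stretches versus the previous one (>1 = slowing down)

    print(np.round(ioi, 2))     # roughly [0.5, 0.52, 0.56, 0.62, 0.7, 0.8] - beats getting longer
    print(np.round(bend, 2))    # all a bit above 1.0 - a smooth, deliberate ritardando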
For anyone interested, I've transcribed this song [1] using the Replicate link the author provided (Colab throws errors for me), using mode music-piano-v2. It spits out mp3s there instead of midis, so you can hear how it did [2].
[1] https://www.youtube.com/watch?v=h-eEZGun2PM
[2] https://replicate.com/p/qr4lfzsqafc3rbprwmvg2cw5ve
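If anyone wants to script it rather than click through the web page, something along these lines should work with the Replicate Python client - but note the model reference, version hash, and input field names below are placeholders I haven't verified, so check the page linked above for the real ones:

    import replicate  # pip install replicate; needs REPLICATE_API_TOKEN in the environment

    # NB: model ref, version hash, and input field names are placeholders --
    # check the Replicate page linked above for the real ones.
    output = replicate.run(
        "someuser/omnizart:0123456789abcdef",    # hypothetical model reference
        input={
            "audio": open("touch.mp3", "rb"),    # hypothetical input name
            "mode": "music-piano-v2",            # the mode I used above
        },
    )
    print(output)  # this deployment apparently returns rendered mp3s rather than raw midi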
I can't help but feel it is heavily impacted by the ambience of the recording as well. The midi is of course a very rigid and literal interpretation of what the model is hearing as pitches over time, but it lacks the subtlety of realizing a pitch is sustaining because of an ambient effect, or that the attack actually lands a little bit before the beginning of the pitch, etc.
If it could be enhanced to consider such things, I bet you would get much cleaner, more machine-like midis, which are generally preferable.
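For example (just a sketch of what I mean by "cleaner", using pretty_midi, which has nothing to do with this project; the grid size, tempo, and minimum note length are guesses): snap onsets to a grid and drop very short ghost notes that are probably reverb rather than real attacks.

    import pretty_midi

    GRID = 60.0 / 120 / 4   # sixteenth-note grid at an assumed 120 BPM
    MIN_LEN = 0.05          # drop "notes" shorter than 50 ms - likely reverb tails, not real attacks

    pm = pretty_midi.PrettyMIDI("transcription.mid")
    for inst in pm.instruments:
        cleaned = []
        for note in inst.notes:
            if note.end - note.start < MIN_LEN:
                continue                                   # probably ambience bleeding into the pitch estimate
            dur = note.end - note.start
            note.start = round(note.start / GRID) * GRID   # snap the onset to the grid
            note.end = note.start + dur                    # keep the original duration
            cleaned.append(note)
        inst.notes = cleaned
    pm.write("transcription_cleaned.mid")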
Music's a lot more than a collection of notes ... and the timbre of one piano is about as far away from a mixture of reeds and dulcet electronics as you can get. (The Fulero is very nice.)
I'm still looking over this to see its capabilities, but if I'm reading this right, we can turn any mp3/wav into a set of midis, which we can then import into music editing software (like Finale). If this works, this is huge. Congrats to the team.
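One wrinkle: notation programs like Finale usually import MusicXML more gracefully than raw MIDI, so an intermediate conversion step might help. A possible sketch using music21 (not part of this project; the file names are made up):

    from music21 import converter

    # Re-export the transcriber's MIDI as MusicXML, which Finale (and MuseScore,
    # Sibelius, ...) tends to import more cleanly than raw MIDI.
    score = converter.parse("transcription.mid")
    score.write("musicxml", fp="transcription.musicxml")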
I don't have absolute pitch. I once tried to "reverse engineer" a guitar solo using a tuner. Too much work, not a very good result. Hope this finally brings music transcription to less skilled/gifted musicians/hobbyists.
Sounds incredible and I'm curious how well it works. For a quick intro to the state of the art in this space, watch the Melodyne videos on YouTube. In short, and without having tried Omnizart: I would not expect it to give perfect results without manual help. If it could aid transcription in a semi-automatic way, like Melodyne does, that would already be a big victory for an open source alternative.
https://piano-scribe.glitch.me/
I found this link to be more helpful than the GitHub repo for understanding what it does:
https://music-and-culture-technology-lab.github.io/omnizart-...
The Colab notebook is full of warnings and crashes with errors in the "Transcribe" box. Replicate.com does produce something, but the results are garbage.
What am I doing wrong?