Something interesting... the first 10 seconds or so of the "Death Growl" example[1] is basically copied verbatim from "Ov Fire And The Void" by Behemoth.
More specifically, I think the part that seems copied is at 2:13 of the original[2], as it leads into a solo-ish bit which in the AI version sounds similar still, but goes on to do its own thing:
> Additionally, our memorization-effect experiments in Section 11 demonstrate that our design maintains creativity without plagiarizing, even under strong training set conditioning.
That decision was ridiculous. It's pretty obvious that the Robin Thicke song is a $1.50 Great Value version of "Got To Give It Up" because of the aesthetic similarities, but they have nothing to do with each other melodically or harmonically... "Blurred Lines" sounds like I-V with a walk at the end, whereas "Got To Give It Up" is more like I-IV-V. The vocal melodies aren't the same, nor is the bass. They have different arrangements. The percussion isn't the same.
The only things they have in common are vibes (in the contemporary sense, not vibraphones). Two dudes singing about sex in falsetto at 120bpm over prototypical R&B/funk elements isn't special. If that's the bar for copyright infringement then 99% of the popular music canon is illegally-derivative. Marvin Gaye was a singular talent but that doesn't mean that his heirs should be able to collect money every time somebody plays an electric piano bassline and sings about making whoopie in alto II.
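The harmonic point above can be made concrete: diatonic triads built on scale degrees show that a I-V progression and a I-IV-V progression share only the tonic chord. This is a minimal sketch; the key of A is an arbitrary choice for illustration, not the actual key of either song.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of the 7 scale degrees

def triad(key, degree):
    """Root-position diatonic triad on a 1-based scale degree of a major key."""
    root = NOTES.index(key)
    return tuple(
        NOTES[(root + MAJOR_SCALE[(degree - 1 + step) % 7]) % 12]
        for step in (0, 2, 4)  # root, third, fifth
    )

# Roman numerals from the comment, rendered as chords in A major:
blurred = [triad("A", d) for d in (1, 5)]      # I-V
got_to = [triad("A", d) for d in (1, 4, 5)]    # I-IV-V
print(blurred)  # [('A', 'C#', 'E'), ('E', 'G#', 'B')]
print(got_to)   # [('A', 'C#', 'E'), ('D', 'F#', 'A'), ('E', 'G#', 'B')]
```

Only the I chord overlaps; the rest of the harmonic material differs, which is the commenter's point about the two songs.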
I've had many times on Suno when, asking for something like a metal song with melodic guitar solos, it basically reproduced, almost note for note but not completely, Marty Friedman-era Megadeth, especially from around the Rust In Peace era. It's good, but why does it pick that specifically to copy?
Does it matter? If the AI "comes up" with the "Let It Be" melody on kazoo, it won't match the Beatles' "Let It Be" single either, but it will still be plagiarized.
Very nice. Anyone know of projects that aren't tackling the full-song problem but rather instrument parts/loops/stems/acapellas? I'd like something that's more like an "infinite AI Loopcloud/Splice". Most of these full-song models don't do well when asked for individual parts, in my experience (though I will have to try it with this one).
This gets discussed a lot but unfortunately there's just not much out there around this.
The closest thing I've seen is the virtual drummers in Logic Pro X, which will follow along with the structure of your song and generate a percussive accompaniment. It's no substitute for a real drummer, but it's serviceable.
Yeah. Or like, a loop that plays continuously and has style parameters exposed you can tweak with a controller like a Midi Fighter Twister and get feedback from in real-time. Then you could do something akin to DJ/live production by having two of these going in sync with each other into a mixer. (Tweak params of the cue track until you like it, transition at a phrase point, repeat).
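A minimal sketch of the two-deck idea above, with hypothetical style parameters and an equal-power crossfade between the live deck and the cue deck. There's no real audio I/O or MIDI here; a controller like the Twister would just write into the params dicts, and the buffers stand in for one block of generated audio per deck.

```python
import math

# Hypothetical per-deck style parameters a MIDI controller could tweak live.
deck_a = {"density": 0.8, "brightness": 0.5, "swing": 0.1}
deck_b = {"density": 0.3, "brightness": 0.9, "swing": 0.0}

def equal_power_crossfade(a, b, x):
    """Mix one block of samples from two decks; x=0.0 is all A, x=1.0 all B."""
    gain_a = math.cos(x * math.pi / 2)
    gain_b = math.sin(x * math.pi / 2)
    return [sa * gain_a + sb * gain_b for sa, sb in zip(a, b)]

# Stand-in audio blocks from two generated loops.
block_a = [0.5, 0.5, 0.5, 0.5]
block_b = [-0.5, -0.5, -0.5, -0.5]

mid = equal_power_crossfade(block_a, block_b, 0.5)  # transition midpoint
# At x=0.5 both gains are cos(pi/4) = sin(pi/4) ~ 0.707, so perceived
# loudness holds through the transition instead of dipping as a linear
# fade would. These opposite-sign blocks cancel to ~0 at the midpoint.
```

The "transition at a phrase point" move from the comment is then just sweeping `x` from 0 to 1 over a few bars.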
What is the use case for music generation models? I see use cases for a lot of the other foundation models like text, image, TTS, STT, but why do I want AI-generated music?
I’ve mostly used them for laughs with my friends. Sometimes generating “custom” songs with funny lyrics, but most fun so far is editing lyrics of existing songs to say ridiculous things for fun.
No real clue how someone would use them for a more serious endeavor; the only thing I could imagine would be to quickly iterate/prototype song structures on a fixed seed to generate ideas for a real composition. Consider the case of an indie game developer or filmmaker getting some placeholder music to test the experience during early throwaway iterations.
A major subset is people pretending to be musicians who don't care to make music themselves. They just want the title for free. That's the demographic that used loops off of Splice (not even crate digging) plus note generators and such and called it a day.
Another more valid subset would be something like a music bed for a video or podcast etc.
A third use is for spamming streaming platforms and making money off undiscerning suckers.
In order to make better AI tools for generating specific parts of a song, you ideally want models that understand what good music sounds like when put together. These sorts of "generate whole songs" models are a predecessor to more specific tooling. These tools are slowly moving downstream (look at the evolution of Suno) and will almost certainly end up as just one part of the music production workflow. We increasingly have better tools to break full tracks down into stems, and to convert stems to and from MIDI and lyrics.
Lots of potential musicians/producers can write a catchy tune, write lyrics, create MIDI work, etc., but maybe can't play or don't own the instruments they want to use (they could be disabled), or maybe don't have a great singing voice. These AI tools can lower the bar for more people to create music at a higher level. They can also act as an improvisational partner, to explore more of the musical space faster.
As a personal anecdote of where AI might be useful: as a hobby I occasionally participate in game jams, sometimes working on music/sound effects to stretch my legs away from my day job. One game jam game I worked on was inspired by a teammate's childhood in Poland, so I listened to a bunch of traditional Polish music and created a track inspired by it. I'm pretty happy with how it came out, but with current AI I'm sure I could have improved the results significantly. If I were making it now, I would be able to upload the tracks I wrote, see how the AI might bring them closer to something that sounds authentic, and use that to help me rewrite parts of the melody where it was lacking. Then I could have piped in my final melody with its inauthentic MIDI instruments (I neither own nor play traditional Polish stringed instruments) and used it to make something that sounds much closer to my target, with a more organic feel.
> What is the use case for music generation models?
New types of electronic instruments.
We’ve been able to use analog circuits, digital logic, and then computers to generate sounds for decades… aka synthesizers.
I would love to see synthesizers which use music generation models under the hood to create new sounds. And / or new interfaces to create different types of instruments.
There’s a lot to explore here, in the same way there was (is) lots of exploration of electronic music, starting I suppose with the theremin in the 1920s.
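The "computers generating sounds" lineage the comment points at bottoms out in something like the sketch below: a bare digital oscillator rendering a block of samples. The parameters are chosen arbitrarily for illustration; a model-backed synth of the kind imagined above would replace the sine formula with generated sample data while keeping this same render-a-buffer shape.

```python
import math

SAMPLE_RATE = 44100  # samples per second, CD-quality

def sine_block(freq_hz, duration_s, amplitude=0.8):
    """Render a block of float samples for a single sine oscillator."""
    n = int(SAMPLE_RATE * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

a4 = sine_block(440.0, 0.01)  # 10 ms of concert A: 441 samples
```

Everything from subtractive synths to wavetables is, at this level, just a different rule for filling that buffer.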
It's the only way for Spotify to turn a profit without all the human work of having to scout, sign and promote flesh-and-blood artists, the things that real music labels do.
streamers and youtubers are constantly looking for royalty free music options.
I could see something like this baked into an editing tool that allows video editors to specify a tone or style of music in plain language to serve as background music.
An actual serious answer is to help musicians brainstorm while writing. It's so good at helping me come up with ideas, or converting an idea to another genre.
I get the incentives for full-song generation models, but it looks like it's the only thing that pops up. Where are the audio models I can use with a positive effect while working on music? Style transfer for instruments, restoration (but not just trained on over-compressed mp3s, talking about bad recording conditions), audio-to-audio editing?
You'd think those would be easier to achieve than something that tries to just replace me completely.
There's a few commercial offerings but they seriously lag behind.
Some very desirable features are just not available as plugins (or I didn't find them), like enhancing recording quality: this is only available as paid services aimed at podcasters, so they work on spoken voice only.
Again, the problem is that most of the offerings are trying to leverage the neural network for some complete solution, replacing steps that professionals are perfectly able to take themselves (and need to decide on). I'm constantly looking for specialized solutions that do the jobs that are impossible to do manually. The best example is Demucs for stem-splitting: it does one job and leaves me to work on the rest.
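For reference, the Demucs "one job" workflow is a single CLI call. The sketch below only composes the command rather than running it; `song.wav` is a placeholder path, and the flags (`-o` for output directory, `--two-stems` to split into vocals plus accompaniment instead of the default four stems) are from Demucs's documented command-line options.

```python
import shlex

def demucs_command(path, two_stems=None, out_dir="separated"):
    """Compose (not run) a Demucs stem-splitting invocation."""
    cmd = ["demucs", "-o", out_dir]
    if two_stems:
        cmd.append(f"--two-stems={two_stems}")
    cmd.append(path)
    return cmd

print(shlex.join(demucs_command("song.wav", two_stems="vocals")))
# demucs -o separated --two-stems=vocals song.wav
```

Running that on a real file leaves the separated stems under `separated/`, and everything after that point stays in your hands, which is exactly the division of labor the comment is asking for.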
Not using streaming services helps somewhat, but I'm not looking forward to having to vet artists I come across for whether they did any substantial original thinking or work themselves. Tired of snake oil foolishness.
Subscribing to magazines of genres I like, consulting Reddit and forums, using YouTube to get an idea, and then buying lossless files is what I've been doing.
[1] https://map-yue.github.io/music/moon.death_metal.mp3
[2] https://youtu.be/vAmnsKKrt9w?t=133
https://arxiv.org/html/2503.08638v1#S11
I kept plucking away at it until I got it to a point where it could generate sheet music and guitar tabs in the style of various artists.
Would be fun to revisit that project with fresh eyes.