Posted by u/zaptrem 6 months ago
Show HN: Sonauto API – Generative music for developers (sonauto.ai/developers)
Hello again HN,

Since our launch ten months ago, my cofounder and I have continued to improve our music model significantly. You can listen to some cool Staff Picks songs from the latest version here https://sonauto.ai/ , listen to an acapella song I made for my housemate here https://sonauto.ai/song/8a20210c-563e-491b-bb11-f8c6db92ee9b , or try the free and unlimited generations yourself.

However, since there are only two of us right now competing in the "best model and average user UI" race, we haven't had time to build some of the really neat ideas our users and pro musicians have been dreaming up (e.g., DAW plugins, live performance transition generators, etc.). The hacker musician community has a rich history of taking new tech and doing really cool and unexpected stuff with it, too.

As such, we're opening up an API that gives full access to the features of our underlying diffusion model (e.g., generation, inpainting, extensions, transition generation, inverse sampling). Here are some things our early test users are already doing with it:

- A cool singing-to-video model by our friends at Lemon Slice: https://x.com/LemonSliceAI/status/1894084856889430147 (try it yourself here https://lemonslice.com/studio)

- Open source wrapper written by one of our musician users: https://github.com/OlaFosheimGrostad/networkmusic

- You can also play with all the API features via our consumer UI here: https://sonauto.ai/create

We also have some examples written in Python here: https://github.com/Sonauto/sonauto-api-examples

- Generate a rock song: https://github.com/Sonauto/sonauto-api-examples/blob/main/ro...

- Download two songs from YouTube (e.g., Smash Mouth to Rick Astley) and generate a transition between them: https://github.com/Sonauto/sonauto-api-examples/blob/main/tr...

- Generate a singing telegram video (powered by ours and also Lemon Slice's API): https://github.com/Sonauto/sonauto-api-examples/blob/main/si...

You can check out the full docs/get your key here: https://sonauto.ai/developers

We'd love to hear what you think, and are open to answering any tech questions about our model too! It's still a latent diffusion model, but much larger and with a much better GAN decoder.

webprofusion · 6 months ago
Interesting that Suno et al. overlook the obvious fact that actual musicians need extra musicians for their own projects.

For instance, a guitarist will have a track they wish they had vocals (and lyrics) for, and if they could pay for that, they would.

Literally: if you could highlight a section in your DAW, prompt it, and have vocals + lyrics generated, possibly with different versions or harmonies for existing parts, etc. Musicians already pay for plugins, but the singing ones are awful to use so far.

zaptrem · 6 months ago
We're super interested in working on this (and melody conditioning) and even have some of the code written to generate the training data, but we want our base model to get a bit better before this becomes our main focus. Check back in a few months!
SnowingXIV · 6 months ago
Honestly, this is a good use case, and I think I'm still not a fan. It's an extra step away from a drum machine, so maybe I can stomach it eventually. As a guitarist I love writing riffs and songs, but I just don't have the time and patience to put together decent-sounding drum tracks against them. GarageBand/Logic and others have added an AI drummer, but it still doesn't feel great.

I probably would be happy paying for a service I could drop a riff into and get a decent drum track that goes with it. Even better would be if it modified and adapted while I record or play, and could be recorded and clipped. Something that fits a clean workflow. If anyone makes this, please don't make it as much of a pain as most VSTs and plugin systems, where there are like 4 different installers and licensing software layers.

mco · 6 months ago
On one hand this is impressive, and I've been wondering when something like this would appear. On the other hand, I am -- like others here have expressed -- saddened by the impact this has on real musicians. Music is human, music theory is deeply mathematical and fascinating -- "solving" it with a big hammer like generative AI is rather unsatisfying.

The other very real aspect here is that "training data" has to come from somewhere, and the copyright implications of this are far from solved.

In the past I worked on real algorithmic music composition: an algorithmic sequencer paired with hardware or software synthesizers. I could give it feedback and it'd evolve the composition, all without training data. It was computationally cheap, didn't infringe anyone's copyright, and a human still had very real creative influence (which instruments, scale, tempo, etc.). Message me if anyone's still interested in "dumb" AI like that. :-)
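A toy version of that kind of feedback loop (purely illustrative; not the actual system described above) fits in a few lines: a sequence of scale notes is mutated each round, and "feedback" simply decides which notes survive.

```python
import random

# MIDI note numbers for one octave of A natural minor
A_MINOR = [57, 59, 60, 62, 64, 65, 67]


def evolve(sequence: list[int], liked: set[int], rng: random.Random) -> list[int]:
    """One feedback round: keep the notes the listener liked, re-roll the rest."""
    return [n if n in liked else rng.choice(A_MINOR) for n in sequence]


rng = random.Random(42)
seq = [rng.choice(A_MINOR) for _ in range(8)]  # random 8-note seed phrase
for _ in range(3):  # three rounds of feedback favouring the tonic (57) and fifth (64)
    seq = evolve(seq, liked={57, 64}, rng=rng)
```

No training data, no copyright questions: the "taste" lives entirely in the human feedback, and the computation is trivial.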

Computer-assisted music is nothing new, but taking away the creativity completely is turning music into noise -- noise that sounds like music.

tgv · 6 months ago
> "solving" it with a big hammer like generative AI is rather unsatisfying.

The reason is greed. They jump on the bandwagon to get rich, not to bring art. They don't care about long term effects on creativity. If it means that it kills motivation to create new music, or even learn how to play an instrument, that's fine by these people. As long as they get their money.

zaptrem · 6 months ago
If our sole goal was to get rich we would have pivoted to some B2B SaaS thing, as many suggested to us. What we've actually seen is so much new creativity from people who otherwise would never have made music.
karpierz · 6 months ago
> Message me if anyone's still interested in "dumb" AI like that. :-)

Not sure how to reach out, but I'm definitely interested in reading about procedural methods in music synthesis. Any links describing your approach?

mco · 6 months ago
Added a link in my profile that leads to a brief demo and description. Not posting here as it'd crumble under too much load. :-/
bambinella · 6 months ago
You mean like how real pianists suffered when the automated piano came, or how live music died when the record player came?

Actually, noise that sounds like music is some of the best music there is: electroacoustic music.

A lot better than most music on the radio. ;-)

0xEF · 6 months ago
> Message me

I don't see any contact info in your profile, but I have an email in mine. I am interested in hearing more about your process and if you have music for sale anywhere, I like to support electronic artists doing interesting stuff.

dudefeliciano · 6 months ago
same here
unraveller · 6 months ago
Anyone with ears can find music satisfying. You don't need an artist's backstory or blessing for that. By all means use slow AI to get the same point fast AI can get to, but don't ask me to value it differently.
tbossanova · 6 months ago
I have many times watched guitar players at their work, and gone home to try and do the same thing. I definitely value that differently.
6stringmerc · 6 months ago
And AI doesn’t make satisfying music. Human music is partially derivative; AI music is purely derivative. That’s why it sounds like shit to reasonable ears.

It’s less than worthless.

akrymski · 6 months ago
I really wish this trend of prompting gen AI models with text would stop. It's really meaningless. Musicians need gen AI they can prompt with a melody on their keyboard. Or a bit of whistling into the microphone. Or a beat they can tap on the table. That is what allows humans to unleash their creativity. Not AI generating random bits that fit a distribution of training data. English language is not the right input for anything except for information retrieval tasks.
zaptrem · 6 months ago
Agreed! Those will be much more fun and we plan to support that. However, right now we're focused on making the base model slightly better, then we can easily add all of those controls (a-la ControlNets with Stable Diffusion).
akrymski · 6 months ago
But this is not easy; it's the real challenge here, as there are lots of text-to-audio models out there. It is far from solved for Stable Diffusion as well: ControlNet is pretty bad. Just try taking a photo of an empty room and asking an image model to add furniture, or to change a wall colour, or to restyle an existing photo in the style of another, and so on. We are very far from being able to truly control the output of AI models, which is something a DAW excels at. I'd start with an AI-powered DAW rather than text-to-audio and try to add controls to it. It's like Cursor vs Lovable, if you get my drift.
vasco · 6 months ago
> Not AI generating random bits that fit a distribution of training data

How is that specific to text prompting? If you tap your fingers to a model and it generates a song from your tapping, it's still just fitting the training data as you say.


8474_s · 6 months ago
The current AI music apps have a chunking problem: they force you to extend the song with segments that may or may not fit, and users accept whatever is "good enough". The result is Frankenstein mash-up songs with no coherent "flow" or "progression", because they are really chunks of similar-sounding songs merged together by editing, not a coherent full-song generation by the AI.
bambinella · 6 months ago
I don't think that is a problem for Sonauto V2; on the contrary, the challenge is more that the model is too consistent with preceding content.

Here are a few of my songs, I think they are fairly consistent?

https://sonauto.ai/song/e2e3d210-69b4-4ad7-96d1-fb5744d0c648

https://sonauto.ai/song/a94e04a9-7b74-4b87-b5ed-ca3e8d2798d0

https://sonauto.ai/song/55a36595-c60a-4346-81d8-6f03ebe690ff

zaptrem · 6 months ago
One thing I've been thinking about is how to do a better hobbyist plan system. It would be cool to do a flat rate unlimited plan, but we wouldn't want that to then be abused by larger customers/companies. Are there existing API providers you think solve this particularly well?
zoogeny · 6 months ago
I don't think it meets your ask of "solve this particularly well", but the unlimited plans in video that I am familiar with have a fast/slow queue system, which effectively limits the plan. It seems, as well, that these kinds of queue systems are tiered: you can have N fast-queued items, X items in the tier-one slow queue, Y items in the tier-two slow queue, etc. On the backend this is probably just some kind of weighted priority queue, where the number of requests in some time window determines a weight scaling factor.
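A minimal sketch of that tiering idea (the quota number and two-tier layout are made up for illustration): once a user exhausts their fast quota within a window, their jobs sink to a slow tier, and the scheduler always drains the fast tier first, FIFO within each tier.

```python
import heapq
import itertools
from collections import defaultdict

FAST_QUOTA = 10  # fast-tier requests allowed per window (made-up number)


class TieredQueue:
    """Priority queue where users past their fast quota sink to a slow tier."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-breaker within a tier
        self._used = defaultdict(int)      # requests per user this window

    def submit(self, user: str, job: str) -> int:
        """Enqueue a job; returns the tier it landed in (0 = fast, 1 = slow)."""
        self._used[user] += 1
        tier = 0 if self._used[user] <= FAST_QUOTA else 1
        heapq.heappush(self._heap, (tier, next(self._counter), user, job))
        return tier

    def next_job(self) -> tuple[str, str]:
        """Pop the next job: fast tier first, FIFO within each tier."""
        _, _, user, job = heapq.heappop(self._heap)
        return user, job


q = TieredQueue()
for i in range(12):
    q.submit("heavy_user", f"job{i}")  # jobs 10 and 11 exceed the quota
q.submit("casual_user", "one_song")   # fast tier, served before slow jobs
```

A production version would also reset `_used` per time window and could add more tiers or per-user weights, but the heap ordering trick is the core of it.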
zaptrem · 6 months ago
I think this is a good start, X high speed queries per hour then unlimited low-priority ones after. Do you know of any specific companies that do this we could take a look at?
mvdtnz · 6 months ago
Why would a hobbyist need an unlimited plan?
zaptrem · 6 months ago
E.g., in the case of a future "LibreMusic" open source UI or an integration into their DAW they work with on the weekends. I'd get pretty annoyed if I had to keep putting a coin in the machine to adjust Logic Pro effects.
weberer · 6 months ago
So if I make a song using this API, who owns the copyright? Is it me or Sonauto?
zaptrem · 6 months ago
I'm not sure to what extent AI music is copyrightable (I think it depends on a case-by-case amount of human influence) but our TOS assigns any rights we may have to the user.
magicmicah85 · 6 months ago
From their terms (https://sonauto.ai/tos):

8. OUTPUT As between You and the Services, and to the extent permitted by applicable law, You own any right, title, or interest that may exist in the musical and/or audio content that You generate using the Services ("Outputs"). We hereby assign to You all our right, title, and interest, if any, in and to Your Outputs. This assignment does not extend to other users' Outputs, regardless of similarity between Your Outputs and their Outputs. You grant to us an unrestricted, unlimited, irrevocable, perpetual, non-exclusive, transferable, royalty-free, fully-paid, worldwide license to use Your Output to provide, maintain, develop, and improve the Services, to comply with applicable law, and/or to enforce our terms and policies. You are solely responsible for Outputs and Your use of Outputs, including ensuring that Outputs and Your use thereof do not violate any applicable law or these terms of service. We make no warranties or representations regarding the Outputs, including as to their copyrightability or legality. By using the Services, You warrant that You will use Outputs only for legal purposes.

You own the rights, but Sonauto is granted the rights to use it as well.

fuhsnn · 6 months ago
>You own any right, title, or interest that may exist

>We hereby assign to You all our right, title, and interest, if any

>You are solely responsible for Outputs and Your use of Outputs

I love how it clearly lays out the scenario where the rights may not exist, yet you are still responsible.

amarant · 6 months ago
This is pretty cool! It's noticeably better than any of the other similar music generation tools I've tried, kudos!
column · 6 months ago
This looks pretty cool to integrate into hobby projects. However, after creating an account via Google, clicking "Payment portal" shows this error:

Error creating billing portal Failed to create billing portal session: No configuration provided and your live mode default configuration has not been created. Provide a configuration or create your default by saving your customer portal settings in live mode at https://dashboard.stripe.com/settings/billing/portal.

Also, when trying to update my profile picture:

Failed to update image! column users.current_period_end does not exist

zaptrem · 6 months ago
The Stripe issue should be fixed. The second issue likely happens if you go to the API page at some point in your session before going to the profile page and then try to edit your picture. We'll work on that. Thanks for reporting!