Google SoundStorm had the best demo so far. It takes a few seconds of original audio and continues it with the same voices. Just hearing those examples, you won't figure out where the original ends and the generated audio begins.
Yeah, neural codecs are pretty amazing. The most incredible part is that they compress well across the temporal domain, something that has been non-trivial.
Hey everyone - I'm the founder of Fluxon. Just saw that we were on HN today - sorry for the late replies.
The app you're using right now is essentially an alpha version; even the website was only made this week. Sorry about the somewhat broken experience so far.
I'll try to get back to everything here in the next couple of hours, but if I miss something or you have other questions, please ping me at akshay@fluxon.ai
The prosody seems a little robotic and kind of jarring. Maybe I'm spoiled by Bark, even in its rough and slow state, but is this really that much of a step up from Tacotron2?
Every time I see one of those, as a big fan of TV crime dramas, I can't help but think that voice recordings as evidence are going to be a thing of the past very soon.
I think the killer feature here is supposed to be voice cloning, which IIUC Google Cloud offers only as a custom enterprise thing that takes weeks (which suggests that it's not fully automated).
Cloning is nice, but what's the point if it doesn't sound natural? Will people really pay 100-1000x (if not more) just to get their preferred voice if it won't sound anything like that person when speaking?
(Other cofounder of Fluxon)