Very cool model, but the post is a caricature of AI writing. "Okay, let's get into the nitty-gritty. What makes this little beast tick? These aren't just bullet points on a GitHub README; these are the specs that will fundamentally redefine what you thought was possible with local AI." Sure.
This is strictly true but misleading. LLMs were trained on human-written text, but they were then post-trained to generate text in a particular style, and that style does have some recognizable patterns.
Indeed the blurb is absurd and very off-putting. It's not a big deal that "It clocks in at under 25MB with just 15 million parameters", because text-to-speech is a long-solved problem; the Texas Instruments Speak & Spell from 1978 (nearly half a century ago FFS) solved it with LPC synthesis and a few hundred kilobits of ROM, a good deal less than 25MB.
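For scale, here's a back-of-the-envelope sketch of what 15 million parameters cost on disk at a few common precisions (the dtypes are assumptions; the post doesn't say which one the model uses):

    # Rough on-disk size of a 15M-parameter model at common precisions.
    # Hypothetical dtypes; real checkpoints also carry some metadata.
    params = 15_000_000
    for dtype, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
        print(f"{dtype}: {params * nbytes / 1e6:.0f} MB")
    # fp32: 60 MB, fp16: 30 MB, int8: 15 MB

"Under 25MB" lands between fp16 and int8, which suggests at least some of the weights are quantized.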
I've re-upped that thread to the same position the previous discussion (this one) was at.
Here is the link to our repo: https://github.com/KittenML/KittenTTS
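For anyone who wants to try it, here's a minimal usage sketch based on the repo's README (the model id, voice name, and sample rate are taken from there and may have changed since):

    from kittentts import KittenTTS
    import soundfile as sf

    # Model id and voice name as listed in the README at the time of posting.
    m = KittenTTS("KittenML/kitten-tts-nano-0.1")
    audio = m.generate("This high quality TTS model works without a GPU",
                       voice="expr-voice-2-f")

    # The README writes the output as a 24 kHz wav.
    sf.write("output.wav", audio, 24000)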
This is an ouroboros that will continue.
(Not saying this one is or isn't AI-written; just that these claims are rampant across a huge number of posts and seem to be growing.)
No human comments on meta formatting like that outside the deepest trenches of Apple/FB corporate stuff.