Readit News
divamgupta commented on Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model   github.com/KittenML/Kitte... · Posted by u/divamgupta
Aachen · 5 months ago
Impressive technical achievement, but in terms of whether I'd use it: oof, that male voice is like one of these fake-excited newsreaders. Like they're always at the edge of their breath. The female one is better but still someone reading out an advertisement for a product they were told they must act extra excited for. I assume this is what the majority of training data was like and not an intentional setting for the demo. Unsure whether I could get used to that

I use TTS on my phone regularly and recently also tried this new project on F-Droid called SherpaTTS, which grabs some models from Hugging Face. They're super heavy (the phone suspends other apps to disk while this runs) and sound good, but in the first news article there were already one or two mispronunciations, because it's guessing how to say uncommon or new words and no longer uses logical rules to turn text into speech

Google and Samsung each have a TTS engine pre-installed on my device and those sound and work fine. A tad monotonous, but they seem to always pronounce things the same way, so you can always work out what the text said

Espeak (or -ng) is the absolute worst, but after 30 seconds of listening closely you get used to it and can understand everything fine. I don't know if it's the best open source option (probably there are others that I should be trying) but it's at least the most reliable where you'll always get what is happening and you can install it on any device without licensing issues

divamgupta · 5 months ago
Thanks a lot for the detailed feedback. We are working on some models which do not use a phonemizer
divamgupta commented on Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model   github.com/KittenML/Kitte... · Posted by u/divamgupta
archon810 · 5 months ago
Sounds like Mort from Family Guy.
divamgupta · 5 months ago
Lol
divamgupta commented on Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model   github.com/KittenML/Kitte... · Posted by u/divamgupta
bouchard · 5 months ago
Never thought I'd see the name LimeWire again, wow
divamgupta · 5 months ago
Haha interesting pivot!
divamgupta commented on Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model   github.com/KittenML/Kitte... · Posted by u/divamgupta
WhyNotHugo · 5 months ago
Dedicated single-purpose hardware with models would be even less energy-intensive. It's theoretically possible to design chips which run neural networks and the like using just resistors (rather than transistors).

Such hardware is not general-purpose, and upgrading the model would not be possible, but there's plenty of use-cases where this is reasonable.

divamgupta · 5 months ago
The thing is that new models keep coming out every day, so it's not economically feasible to make chips for a single model
divamgupta commented on Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model   github.com/KittenML/Kitte... · Posted by u/divamgupta
nine_k · 5 months ago
I hope this is the future. Offline, small ML models, running inference on ubiquitous, inexpensive hardware. Models that are easy to integrate into other things, into devices and apps, and even to drive from other models maybe.
divamgupta · 5 months ago
This is our goal too.
divamgupta commented on Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model   github.com/KittenML/Kitte... · Posted by u/divamgupta
KaiserPro · 5 months ago
Was it cross-trained on Futurama voices?
divamgupta · 5 months ago
It was not
divamgupta commented on Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model   github.com/KittenML/Kitte... · Posted by u/divamgupta
sergiotapia · 5 months ago
https://vocaroo.com/1njz1UwwVHCF

It doesn't sound so good. Excellent technical achievement and it may just improve more and more! But for now I can't use it for consumer facing applications.

divamgupta · 5 months ago
We are still training the model. We expect the quality to go up in the next release. This is just a preview release :)
divamgupta commented on Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model   github.com/KittenML/Kitte... · Posted by u/divamgupta
peanut_merchant · 5 months ago
I ran some quick benchmarks.

Ubuntu 24, Razer Blade 16, Intel Core i9-14900HX

  Performance Results:

  Initial Latency: ~315ms for short text

  Audio Generation Speed (seconds of audio per second of processing):
  - Short text (12 chars): 3.35x realtime
  - Medium text (100 chars): 5.34x realtime
  - Long text (225 chars): 5.46x realtime
  - Very Long text (306 chars): 5.50x realtime

  Findings:
  - Model loads in ~710ms
  - Generates audio at ~5x realtime speed (excluding initial latency)
  - Performance is consistent across different voices (4.63x - 5.28x realtime)
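
For anyone who wants to reproduce numbers like these, here is a minimal sketch of how an "Nx realtime" figure can be computed: seconds of audio produced divided by seconds spent generating it. This is not the script used for the results above. The `synthesize` callable, the `measure` helper, and the 24 kHz sample rate are all assumptions for illustration, so swap in whatever your TTS call actually returns.

```python
import time
from typing import Callable, Sequence

# Assumed output sample rate; the thread does not state it.
ASSUMED_SAMPLE_RATE = 24_000


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio produced per second of processing."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def measure(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int = ASSUMED_SAMPLE_RATE,
) -> float:
    """Time one call to a hypothetical synthesize(text) -> samples function."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return realtime_factor(len(samples) / sample_rate, elapsed)
```

A value above 1.0 means the model generates audio faster than it takes to play back. Model load time and first-call latency are best timed separately, since they would otherwise skew short-text results.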

divamgupta · 5 months ago
Thanks for running the benchmarks. The models are not optimized yet. We will optimize loading etc. when we release an SDK meant for production :)
divamgupta commented on Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model   github.com/KittenML/Kitte... · Posted by u/divamgupta
nxnsxnbx · 5 months ago
Thanks, I was looking for that. While the Reddit demo sounds OK, even though on a level we reached a couple of years ago, all TTS samples I tried were barely understandable at all
divamgupta · 5 months ago
This is just an early checkpoint. We hope that the quality will improve in the future.
divamgupta commented on Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model   github.com/KittenML/Kitte... · Posted by u/divamgupta
bkyan · 5 months ago
I got an error when I tried the demo with 6 sentences, but it worked great when I reduced the text to 3 sentences. Is the length limit due to the model or just a limitation for the demo?
divamgupta · 5 months ago
We don't have chunking enabled yet. We will add it soon, which will remove the length limitation.
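
Until chunking lands, one possible workaround is to split long input on sentence boundaries and synthesize the pieces one at a time. The sketch below is only an illustration of that idea, not how the project plans to implement it. `split_into_chunks`, `synthesize_long`, the `synthesize` callable, and the 200-character limit are all hypothetical, since the thread does not say what the actual length limit is.

```python
import re
from typing import Callable, List, Sequence


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? followed by whitespace; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(
    text: str,
    synthesize: Callable[[str], Sequence[float]],
    max_chars: int = 200,
) -> List[float]:
    """Run a hypothetical synthesize(text) -> samples call per chunk and join the audio."""
    audio: List[float] = []
    for chunk in split_into_chunks(text, max_chars):
        audio.extend(synthesize(chunk))
    return audio
```

Splitting on sentence boundaries keeps each chunk's prosody self-contained. Joining the audio with a short pause between chunks may sound more natural than the plain concatenation shown here.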

u/divamgupta

Karma: 855 · Cake day: March 20, 2021