Kokoro WebGPU: Real-time text-to-speech 100% locally in the browser

It took some time, but we finally got Kokoro TTS (v1.0) running in-browser w/ WebGPU acceleration! This enables real-time text-to-speech without the need for a server. Looking forward to your feedback!

amelius · 7 months ago

Now that's what I call "server-less" computing!

deivid · 7 months ago

Amazing! I'm interested in models running locally and Kokoro seems amazing. Are you aware of similar models but for Speech to text?

xenova · 7 months ago

We have released a bunch of speech recognition demos (using whisper, moonshine, and others). For example:

- https://huggingface.co/spaces/Xenova/whisper-web

- https://huggingface.co/spaces/Xenova/whisper-webgpu

- https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu

- https://huggingface.co/spaces/webml-community/moonshine-web

Ono-Sendai · 7 months ago

whisper

sebastiennight · 7 months ago

This is brilliant. All we need now is for someone to code a frontend for it so we can input an article's URL and have this voice read it out loud... built-in local voices on MacOS are not even close to this Kokoro model

satvikpendem · 7 months ago

There are a few already, I assume MacWhisper will add it. That being said, I am also working on a (crossplatform, in Flutter) UI for this.

waynenilsen · 7 months ago

Incredible work! I have listened to several tts and to have this be free and in complete control of the customer is absolutely incredible. This will unlock new use cases

I made https://app.readaloudto.me/ as a hobby thing and now it could be enhanced with a local tts option!

reach-vb · 7 months ago

Brilliant job! Love how fast it is, I'm sure if the rapid pace of speech ML continues we'll have Speech to Speech models directly running in our browser!

dust42 · 7 months ago

It's already there, Hibiki by Kyutai.org was released yesterday with speech to speech, french to english on Iphone:

https://x.com/neilzegh/status/1887498102455869775

https://github.com/kyutai-labs/hibiki

moralestapia · 7 months ago

This is great but far from real-time.

(I get the joke that for some definition of real-time this is real-time).

The reason why I use an API is because time to first byte is the most important metric in the apps I'm working on.

That aside, kudos for the great work and I'm sure one day the latency on this will be super low as well.

itishappy · 7 months ago

Sounds terrible on Chrome with an AMD 5700XT.

Sounds great on Chrome with an Nvidia 1650Ti.

Sounds great on Chrome on a Pixel 6.

Sound like being bitcrushed. Maybe a 64 vs 32 bit error? Solid results when working.

ASalazarMX · 7 months ago

Ubuntu 24.04 LTS. Works great on Firefox, on Chromium audio files are silent, even when downloaded and opened with a media player.

Edit: Sorry, it was a problem of my specific audio setup, it works equally well on Chromium.

SubiculumCode · 7 months ago

Kokoro gives pretty good voices and is quite light...making it useful despite its lack of voice cloning capability. However, I haven't figured out how to run it in the context of a tts server without homebrewing the server...which maybe is easy? IDK.

phildougherty · 7 months ago

https://github.com/remsky/Kokoro-FastAPI

C-Loftus · 7 months ago

Fantastic work. My dream would be to use this for a browser audiobook generator for epubs. I made a cli audiobook generator with Piper [0] that got some traction and I wanted to port it to the browser, but there were too many issues. [1]

Is there source anywhere? Seems the assets/ folder is bundled js. In my opinion, there's a ton of opportunity for private, progressive web apps with this while WebGPU is still relatively newly implemented.

Would love to collaborate in some way if others are also interested in this

[0] https://github.com/C-Loftus/QuickPiperAudiobook/ [1] https://github.com/rhasspy/piper/issues/352

Asmod4n · 7 months ago

Sounds horribly in chrome with an amd gpu, why is that?

mdaniel · 7 months ago

Are you somehow implying that everyone in the AI arms race believes that only CUDA exists?! /s

But, in a more serious tone: the story that I hear about AMD GPUs is that they are, in fact, shittier because AMD themselves give fewer shits. GIGO

CyberDildonics · 7 months ago

What is this comment saying? You think the results are different just because of AMD hardware? If there is a difference it would be a software bug.