I'd accept some compromise on voice and intonation quality, as long as the TTS system doesn't frequently garble words. The AAC app runs on battery-powered tablet PCs, so the lower the CPU usage and energy draw, the better.
However, it is unmaintained and the Apple Silicon build is broken.
My app also uses whisper.cpp. It runs in real time on Apple Silicon or on fast modern CPUs such as AMD's gaming CPUs.
Do you have links to the voices you found?
https://huggingface.co/canopylabs/orpheus-3b-0.1-ft
(no affiliation)
It's English-only, AFAICS.
But I wish there were an offline, on-device, multilingual text-to-speech solution with good voices for a standard PC — one that doesn't require a GPU, tons of RAM, or max out the CPU.
In my research, I didn't find anything that fits the bill. People often mention Tortoise TTS, but I think it garbles words too often. The only plug-in solution for desktop apps I know of is the commercial and rather pricey Acapela SDK.
I hope someone can shrink those new neural network–based models to run efficiently on a typical computer. Ideally, it should run at under 50% CPU load on an average Windows laptop that’s several years old, and start speaking almost immediately (less than 400ms delay).
The same goes for speech-to-text. Whisper.cpp is fine, but last time I looked, it wasn't able to transcribe audio at real-time speed on a standard laptop.
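"Real-time speed" is usually expressed as a real-time factor (RTF): processing time divided by audio duration, where an RTF of 1.0 or less means the transcriber keeps up with live audio. The helper names below are hypothetical, and the timings in the usage note are illustrative, not measurements of whisper.cpp.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_live_audio(processing_seconds: float, audio_seconds: float) -> bool:
    """True if transcription is at least as fast as the audio plays (RTF <= 1.0)."""
    return real_time_factor(processing_seconds, audio_seconds) <= 1.0
```

For example, if a 10-second clip took 14 seconds to transcribe, the RTF would be 1.4 and the transcriber would fall further behind the longer someone speaks.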
I'd pay for something like this as long as it's less expensive than Acapela.
(My use case is an AAC app.)
In any case, HN's guidelines ask submitters to use an article's original title unless it is misleading or linkbait. I'd agree that Apple's software quality has been going down.
This takes about 5 to 10 minutes to set up. You'll find instructions on the web or from an LLM. I just looked for a suitable article, but the ones I found differ subtly from my Quick Action. I asked ChatGPT, and its instructions seem correct.
I was impressed by those high-level summaries. If I had assigned this task to several humans, I'm not sure how many would have achieved similar results.
When playing astro‑maze, the delay is noticeable, and in a 2D action game such delays are especially apparent. Games that don’t rely on tight real‑time input might perform better. (I'm connecting from Europe, though.)
If you add support for drawing from images (such as spritesheets or tilesheets) in the future, and the client stores those images and sounds locally, the entire screen could be drawn from these assets, so no pixel data would need to be transferred, only commands like "draw tile 56 at position (x, y)."
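To illustrate how little data such commands need compared with pixels, here is a minimal Python sketch. The wire format (a 1-byte opcode followed by a 16-bit tile index and 16-bit x/y coordinates) and the names `encode_draw_tile` and `decode_commands` are hypothetical, not abstra's actual protocol; it assumes the client has already cached the tilesheet.

```python
import struct

# Hypothetical wire format: opcode (uint8), tile index (uint16), x, y (int16), little-endian.
OP_DRAW_TILE = 0x01
_DRAW_TILE = struct.Struct("<BHhh")  # 7 bytes per command, no padding


def encode_draw_tile(tile: int, x: int, y: int) -> bytes:
    """Encode 'draw tile <tile> at position (x, y)' as one 7-byte command."""
    return _DRAW_TILE.pack(OP_DRAW_TILE, tile, x, y)


def decode_commands(buf: bytes):
    """Yield (tile, x, y) tuples from a buffer of concatenated draw-tile commands."""
    for offset in range(0, len(buf), _DRAW_TILE.size):
        op, tile, x, y = _DRAW_TILE.unpack_from(buf, offset)
        if op != OP_DRAW_TILE:
            raise ValueError(f"unknown opcode {op:#x} at offset {offset}")
        yield tile, x, y


# A hypothetical 320x240 screen made of 16x16 tiles: 20 columns x 15 rows.
frame = b"".join(
    encode_draw_tile(56, col * 16, row * 16) for row in range(15) for col in range(20)
)
raw_pixels = 320 * 240 * 4  # the same screen sent as 32-bit RGBA pixels
print(len(frame), raw_pixels)
```

Under these assumptions, a full screen of tiles costs 2,100 bytes of commands versus 307,200 bytes of raw pixels, and frames where only a few tiles change would be smaller still.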
(By the way, opening abstra.io in a German-language browser leads to https://www.abstra.io/deundefined which shows a 404 error.)