Readit News
karimf commented on Claude 4 System Card   simonwillison.net/2025/Ma... · Posted by u/pvg
albert_e · 3 months ago
OT

> data provided by data-labeling services and paid contractors

someone in my circle was interested in finding out how people participate in these exercises and if there are any "service providers" that do the heavy lifting of recruiting and managing this workforce for the many AI/LLM labs globally or even regionally

they are interested in remote work opportunities that could leverage their (post-graduate level) education

appreciate any pointers here - thanks!

karimf · 3 months ago
karimf commented on Show HN: Real-time AI Voice Chat at ~500ms Latency   github.com/KoljaB/Realtim... · Posted by u/koljab
koljab · 4 months ago
I built RealtimeVoiceChat because I was frustrated with the latency in most voice AI interactions. This is an open-source (MIT license) system designed for real-time, local voice conversations with LLMs.

Quick Demo Video (50s): https://www.youtube.com/watch?v=HM_IQuuuPX8

The goal is to get closer to natural conversation speed. It uses audio chunk streaming over WebSockets, RealtimeSTT (based on Whisper), and RealtimeTTS (supporting engines like Coqui XTTSv2/Kokoro) to achieve around 500ms response latency, even when running larger local models like a 24B Mistral fine-tune via Ollama.

Key aspects: Designed for local LLMs (Ollama primarily, OpenAI connector included). Interruptible conversation. Smart turn detection to avoid cutting the user off mid-thought. Dockerized setup available for easier dependency management.

It requires a decent CUDA-enabled GPU for good performance due to the STT/TTS models.

Would love to hear your feedback on the approach, performance, potential optimizations, or any features you think are essential for a good local voice AI experience.

The code is here: https://github.com/KoljaB/RealtimeVoiceChat

karimf · 4 months ago
Do you have any information on how long each step takes? Like how many ms for each step of the pipeline?

I'm curious how fast it will run if we can get this running on a Mac. Any ballpark guess?

karimf commented on Show HN: Dia, an open-weights TTS model for generating realistic dialogue   github.com/nari-labs/dia... · Posted by u/toebee
toebee · 4 months ago
Hey HN! We’re Toby and Jay, creators of Dia. Dia is a 1.6B-parameter open-weights model that generates dialogue directly from a transcript.

Unlike TTS models that generate each speaker turn and stitch them together, Dia generates the entire conversation in a single pass. This makes it faster, more natural, and easier to use for dialogue generation.

It also supports audio prompts — you can condition the output on a specific voice/emotion and it will continue in that style.

Demo page comparing it to ElevenLabs and Sesame-1B https://yummy-fir-7a4.notion.site/dia

We started this project after falling in love with NotebookLM’s podcast feature. But over time, the voices and content started to feel repetitive. We tried to replicate the podcast-feel with APIs but it did not sound like human conversations.

So we decided to train a model ourselves. We had no prior experience with speech models and had to learn everything from scratch — from large-scale training, to audio tokenization. It took us a bit over 3 months.

Our work is heavily inspired by SoundStorm and Parakeet. We plan to release a lightweight technical report to share what we learned and accelerate research.

We’d love to hear what you think! We are a tiny team, so open source contributions are extra-welcomed. Please feel free to check out the code, and share any thoughts or suggestions with us.

karimf · 4 months ago
This is super awesome. Several questions.

1. What GPU did you use to train the model? I'd love to train a model like this, but currently, I only have a 16GB MacBook. Thinking about buying a 5090 if it's worth it.

2. Is it possible to use this for real time audio generation, similar to the demo on the Sesame website?

karimf commented on Crossing the uncanny valley of conversational voice   sesame.com/research/cross... · Posted by u/monroewalker
karimf · 6 months ago
This might be a game changer for learning English.

I'm from a developing country and it's sad that most English teachers in public schools here can't speak English well. There are good English teachers, but they are expensive and not affordable for the average person.

OpenAI realtime models are good, but we can't deploy them to the masses since they're very expensive.

This model might be able to solve the issue since it's better than or on par with the OpenAI model, yet it's significantly cheaper since it's a fairly small model.

Deleted Comment

karimf commented on Ask HN: Who are your most treasured live coders?    · Posted by u/nomilk
karimf · 2 years ago
ThePrimeagen.

Sometimes I lose my spark with programming. Watching him reminds me to enjoy programming more.

karimf commented on Whisky: Wine supercharged with the power of Apple's game porting toolkit   getwhisky.app... · Posted by u/robin_reala
karimf · 2 years ago
I've been following the scene of Mac gaming for a while. Isaac, the author of this project, is someone who contributes a lot to this space. He created Whisky and contributed to Ryujinx, a Switch emulator that works on Mac, and Playcover, a way to play iOS apps and games on Mac. Also, he's still 17 years old. [0]

[0] https://isaacmarovitz.com/

u/karimf

Karma: 1387
Cake day: December 24, 2016
About
https://github.com/fikrikarim

hello@fikrikarim.com
