Quick Demo Video (50s): https://www.youtube.com/watch?v=HM_IQuuuPX8
The goal is to get closer to natural conversation speed. It uses audio chunk streaming over WebSockets, RealtimeSTT (based on Whisper), and RealtimeTTS (supporting engines like Coqui XTTSv2/Kokoro) to achieve around 500ms response latency, even when running larger local models like a 24B Mistral fine-tune via Ollama.
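To make the chunk-streaming idea concrete, here's a minimal sketch of a client that captures mic audio and pushes small PCM chunks over a WebSocket. This is my own illustration, not the repo's actual protocol: the endpoint URL, sample rate, and chunk size are all assumptions, so check the code for the real interface.

```python
# Minimal sketch: stream raw PCM chunks to a server over a WebSocket.
# The ws://localhost:8000/audio endpoint, 16 kHz mono format, and 30 ms
# chunk size are illustrative assumptions, not the project's protocol.
import asyncio
import sounddevice as sd
import websockets

SAMPLE_RATE = 16000                           # 16 kHz mono is typical for Whisper-based STT
CHUNK_MS = 30                                 # small chunks keep end-to-end latency low
CHUNK_FRAMES = SAMPLE_RATE * CHUNK_MS // 1000

async def stream_mic(url: str = "ws://localhost:8000/audio"):
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def on_audio(indata, frames, time_info, status):
        # Runs on the audio driver's thread; hand the bytes to the event loop.
        loop.call_soon_threadsafe(queue.put_nowait, bytes(indata))

    async with websockets.connect(url) as ws:
        with sd.RawInputStream(samplerate=SAMPLE_RATE, channels=1,
                               dtype="int16", blocksize=CHUNK_FRAMES,
                               callback=on_audio):
            while True:
                await ws.send(await queue.get())  # forward each chunk as it arrives

asyncio.run(stream_mic())
```

Keeping chunks this small is what lets STT start transcribing while the user is still speaking, which is where most of the latency savings come from.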
Key aspects: Designed for local LLMs (Ollama primarily, OpenAI connector included). Interruptible conversation. Smart turn detection to avoid cutting the user off mid-thought. Dockerized setup available for easier dependency management.
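For anyone curious what "smart turn detection" can mean in practice, here's a toy heuristic of my own (not the project's implementation): require only a short silence when the partial transcript already looks like a finished sentence, and a longer one mid-thought.

```python
# Toy turn-detection heuristic (illustration only, not the repo's logic):
# the silence needed before responding shrinks when the partial transcript
# already looks like a complete utterance.
import time

SILENCE_COMPLETE = 0.4    # seconds of silence if the sentence looks finished
SILENCE_INCOMPLETE = 1.2  # wait longer mid-thought to avoid cutting the user off

def looks_finished(text: str) -> bool:
    # Crude proxy for semantic completeness; real systems use a model here.
    return text.rstrip().endswith((".", "!", "?"))

class TurnDetector:
    def __init__(self):
        self.last_voice = time.monotonic()

    def on_voice(self):
        # Call whenever the VAD reports speech activity.
        self.last_voice = time.monotonic()

    def should_respond(self, partial_transcript: str) -> bool:
        silence = time.monotonic() - self.last_voice
        needed = SILENCE_COMPLETE if looks_finished(partial_transcript) else SILENCE_INCOMPLETE
        return silence >= needed
```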
It requires a decent CUDA-enabled GPU for good performance due to the STT/TTS models.
Would love to hear your feedback on the approach, performance, potential optimizations, or any features you think are essential for a good local voice AI experience.
The code is here: https://github.com/KoljaB/RealtimeVoiceChat
I'm curious how fast it will run if we can get this running on a Mac. Any ballpark guess?
> data provided by data-labeling services and paid contractors
Someone in my circle is interested in finding out how people participate in these exercises, and whether there are any "service providers" that do the heavy lifting of recruiting and managing this workforce for the many AI/LLM labs globally, or even regionally.
They are interested in remote work opportunities that could leverage their (post-graduate level) education.
Appreciate any pointers here - thanks!