Readit News logoReadit News
pstroqaty commented on Voxtral Transcribe 2   mistral.ai/news/voxtral-t... · Posted by u/meetpateltech
m1el · 6 days ago
pstroqaty · 5 days ago
Thank you for sharing! Does your implementation allow running the Nemotron model on Vulkan? Like whisper.cpp? I'm curious to try other models, but I don't have Nvidia, so my choices are limited.
pstroqaty commented on Ask HN: What Are You Working On? (December 2025)    · Posted by u/david927
mysfi · 2 months ago
I really liked wisprflow on my mac but my daily driver is Manjaro KDE. I have stitched together a bash script that copies the transcription (right now I am using the Parakeet TDT 0.6B) to my clipboard. I would give this a try on linux when it becomes available.
pstroqaty · 2 months ago
Would you be open to sharing your script? I run whisper.cpp in Linux through some stitched together scripts (https://news.ycombinator.com/item?id=44949314), but would be very curious to try Parakeet. I don't believe I can run it through whisper.cpp?
pstroqaty commented on Show HN: Whispering – Open-source, local-first dictation you can trust   github.com/epicenter-so/e... · Posted by u/braden-w
pstroqaty · 6 months ago
If anyone's interested in a janky-but-works-great dictation setup on Linux, here's mine:

On key press, start recording microphone to /tmp/dictate.mp3:

  # Save up to 10 mins. Minimize buffering. Save pid
  ffmpeg -f pulse -i default -ar 16000 -ac 1 -t 600 -y -c:a libmp3lame -q:a 2 -flush_packets 1 -avioflags direct -loglevel quiet /tmp/dictate.mp3 &
  echo $! > /tmp/dictate.pid
On key release, stop recording, transcribe with whisper.cpp, trim whitespace and print to stdout:

  # Stop recording
  kill $(cat /tmp/dictate.pid)
  # Transcribe
  whisper-cli --language en --model $HOME/.local/share/whisper/ggml-large-v3-turbo-q8_0.bin --no-prints --no-timestamps /tmp/dictate.mp3 | tr -d '\n' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
I keep these in a dictate.sh script and bind to press/release on a single key. A programmable keyboard helps here. I use https://git.sr.ht/%7Egeb/dotool to turn the transcription into keystrokes. I've also tried ydotool and wtype, but they seem to swallow keystrokes.

  bindsym XF86Launch5 exec dictate.sh start
  bindsym --release XF86Launch5 exec echo "type $(dictate.sh stop)" | dotoolc
This gives a very functional push-to-talk setup.

I'm very impressed with https://github.com/ggml-org/whisper.cpp. Transcription quality with large-v3-turbo-q8_0 is excellent IMO and a Vulkan build is very fast on my 6600XT. It takes about 1s for an average sentence to appear after I release the hotkey.

I'm keeping an eye on the NVidia models, hopefully they work on ggml soon too. E.g. https://github.com/ggml-org/whisper.cpp/issues/3118.

u/pstroqaty

KarmaCake day26July 9, 2021View Original