Even when the language is detected correctly, the transcribed text often isn't 100% accurate, and when it is accurate it arrives slower than the audio output rather than in real time. All in all, not what I would consider feature-ready to release in my app.
What I've been thinking about is switching to a full audio in --> transcribe --> send to LLM --> TTS pipeline (see the sketch below), in which case I would be able to show the exact input to the model, but that's far more work than a single OpenAI API call.
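For what it's worth, here's a minimal sketch of that pipeline with the OpenAI Python SDK. The model names, the "alloy" voice, the output path, and the voice_turn helper are just illustrative choices, not anything from the thread; swap in whatever your app actually uses:

    # Rough audio -> transcript -> LLM -> TTS flow (blocking, no streaming).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def voice_turn(audio_path: str, out_path: str = "reply.mp3") -> str:
        # 1. Transcribe the user's audio so the exact model input can be shown.
        with open(audio_path, "rb") as f:
            transcript = client.audio.transcriptions.create(
                model="gpt-4o-mini-transcribe",  # or "whisper-1"
                file=f,
                language="en",  # pin the language instead of relying on detection
            )
        user_text = transcript.text

        # 2. Send the transcript to the chat model.
        completion = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": user_text}],
        )
        reply_text = completion.choices[0].message.content

        # 3. Synthesize the reply and write it to disk for playback.
        speech = client.audio.speech.create(
            model="tts-1",
            voice="alloy",
            input=reply_text,
        )
        speech.write_to_file(out_path)

        # Surface the exact text that was sent to the LLM.
        return user_text

The obvious trade-off versus the realtime API is latency: three sequential round trips instead of one streaming session, in exchange for full visibility into the transcript.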
For transcription I would recommend trying the "gpt-4o-transcribe" or "gpt-4o-mini-transcribe" models, which will be more accurate than "whisper-1". On any model you can set the language parameter; see the docs here: https://platform.openai.com/docs/api-reference/realtime-clie.... This doesn't guarantee ordering relative to the rest of the response, but the idea is to optimize for conversational-feeling latency. Hope this is helpful.
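If it helps anyone reading along, the realtime client event for switching the transcription model and pinning the language looks roughly like this. This is a sketch assuming the input_audio_transcription fields described in the realtime client events reference linked above; check the docs for the exact shape, and the websocket plumbing is omitted:

    import json

    # Rough shape of the session.update client event that swaps the
    # transcription model and pins the input language (assumed field names;
    # verify against the realtime client events reference).
    session_update = {
        "type": "session.update",
        "session": {
            "input_audio_transcription": {
                "model": "gpt-4o-mini-transcribe",  # or "gpt-4o-transcribe"
                "language": "en",                   # ISO-639-1 code
            },
        },
    }

    # ws.send(json.dumps(session_update))  # send over the open realtime websocket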
This also enables shell-aware LLM prompting!
It's much more bare-bones than Fig but perhaps useful if you're looking for an alternative! Send me feedback!