Readit News
Posted by u/the_king 9 months ago
Show HN: Aqua Voice 2 – Fast Voice Input for Mac and Windows (withaqua.com)
Hey HN - It’s Finn and Jack from Aqua Voice (https://withaqua.com). Aqua is fast AI dictation for your desktop and our attempt to make voice a first-class input method.

Video: https://withaqua.com/watch

Try it here: https://withaqua.com/sandbox

Finn is uber dyslexic and has been using dictation software since sixth grade. For over a decade, he’s been chasing a dream that never quite worked — using your voice instead of a keyboard.

Our last post (https://news.ycombinator.com/item?id=39828686) about this seemed to resonate with the community - though it turned out that version of Aqua was a better demo than product. But it gave us (and others) a lot of good ideas about what should come next.

Since then, we’ve remade Aqua from scratch for speed and usability. It now lives on your desktop, and it lets you talk into any text field -- Cursor, Gmail, Slack, even your terminal.

It starts up in under 50ms, inserts text in about a second (sometimes as fast as 450ms), and has state-of-the-art accuracy. It does a lot more, but that’s the core. We’d love your feedback — and if you’ve got ideas for what voice should do next, let’s hear them!

idk1 · 9 months ago
I’ve been using this for some time and I have to say it is fantastic. I’m intentionally writing this by hand rather than with Aqua, and it is taking so much longer. To me this feels like what Apple Intelligence could be; it is so much better than anything the big tech companies are doing. For example, if you tell Siri voice dictation to go back and delete something, Siri just writes out “go back and delete something.” Likewise, if you tell Siri to go back and spell a name differently, all it does is type out the letters you said. Honestly, for voice dictation software it feels like travelling to another planet in terms of improvement.
niel · 9 months ago
Real-time text output à la Apple Dictation with the accuracy of Whisper is something I've been looking for recently - I'll definitely give Aqua a spin.

MacWhisper [0] (the app I settled on) is conspicuously missing from your benchmarks [1]. How does it compare?

[0]: https://goodsnooze.gumroad.com/l/macwhisper

[1]: https://withaqua.com/blog/benchmark-nov-2024

the_king · 9 months ago
We're more accurate and much faster than MacWhisper, even with its strongest model (whisper.cpp Large V3).

For that benchmarking table, you can use Whisper Large V3 as a stand-in for MacWhisper and Superwhisper accuracy.

pbowyer · 8 months ago
I've been using Superwhisper for over a year and I get nothing like the error rate your comparison table suggests. Which model were you using with it?

Aqua looks good and I will be testing it, but I do like that with superwhisper nothing leaves my computer unless I add AI integrations.

aylmao · 9 months ago
This is super impressive, great job!

Side-comment of something this made me think of (again): tech builds too much for tech. I've lived in the Bay before, so I know why this happens. When you're there, everyone around you is in tech, your girlfriend is in tech, you go to parties and everyone invariably ends up talking about work, which is tech. Your frustrations are with tech tools and so are your peers', so you're constantly thinking about tech solutions applicable to tech's problems.

This seems very much marketed to SF people doing SF things ("Cursor, Gmail, Slack, even your terminal"). I wonder how much effort has gone into making this work with code editors or the terminal, even though I doubt that would be a big use case for this software if it ever became generally popular. I'd imagine the market is much larger in education, journalism, film, accessibility, even government. Those would make much more exciting demos.

the_king · 9 months ago
thanks!

I share the same sentiment. I remember thinking in college how annoying it was that I was reading low-resolution, marked-up, skewed, b&w scans of a book using Adobe Acrobat while CS concentrators were doing everything in VS Code (then brand new).

but we do think voice is actually great with Cursor. It’s also really useful in the terminal for certain things. Checking out or creating branches, for example.

fxtentacle · 9 months ago
This looks like it'll slurp up all your data and upload it into a cloud. Thanks, no. I want privacy, offline mode and source code for something as crucial to system security as an input method.

"we also collect and process your voice inputs [..] We leverage this data for improvements and development [..] Sharing of your information [..] service providers [..] OpenAI" https://withaqua.com/privacy

FloatArtifact · 9 months ago
Local-only inference is an absolute requirement. It's not really all that accessible if it's online-only. I can say this as someone who has logged over 20,000 hours of voice dictation and computer control.
canada_dry · 9 months ago
First thing I looked for and read: the FAQ.

No mention of privacy (or on-prem), so I assumed it's 100% cloud.

Non-starter for me. Accuracy is important, but privacy is more so.

Hopefully a service with these capabilities will be available where the first step has the user complete a brief training session, sends that to the cloud to tailor the recognition parameters for their voice and mannerisms... then loads that locally.

oulipo · 8 months ago
A similar but offline tool is VoiceInk, it's also open-source so you can extend it
pokstad · 9 months ago
This should be on the FAQ. I was trying to find out if it was 100% processed locally.
jmcintire1 · 9 months ago
fair point. offline+local would be ideal, but as it stands we can't run ASR and an LLM locally at the speed required to provide the level of service we want.

given that we need the cloud, we offer zero data retention -- you can see this in the app. your concern is as much about ux and communications as it is privacy

fxtentacle · 8 months ago
The problem if you actually need the cloud is that it kind of completely destroys your business model. OpenAI is bleeding money every month because they massively subsidize the hosting cost of their models. But eventually they will have to post a profit. And then if they know that your product is completely dependent on their API, they can milk you until there's no profits left for you.

And self-hosting real-time streaming LLMs will probably also come out at around 50 cents per hour. Justifying a $120/month price for power users is probably going to be very difficult, especially if there are free open-source alternatives.

mrtesthah · 9 months ago
MacWhisper does realtime system-wide dictation on your local machine (among other things). Just a one-time fee for an app you download -- the way shareware is supposed to be. Of course it doesn't use MoE transcription with 6 models like Aqua Voice, but if you guys expect to be acquired by Apple (that is your exit strategy, right?), you're going to need better guarantees of privacy than "we don't log".
toddmorey · 9 months ago
And man it's another monthly subscription. I'm not mad at them for finding a gap in the market and putting a business around it. I'm mad at Apple for leaving that gap... hopefully built in voice dictation improves quickly.
FireBeyond · 9 months ago
Is there a gap in the market? It's being rapidly filled with the likes of MacWhisper, etc., which offer local-only, one-off pricing.
pablopeniche · 8 months ago
"hopefully built in voice dictation improves quickly." I would not hold my breath on that one lol
jackthetab · 9 months ago
Agreed.

This is where I bounce (out of this discussion).

thmsmlr · 9 months ago
I totally agree, I created BetterDictation (.com) exactly because of that. Offline was a super important requirement for me.
jrvarela56 · 8 months ago
Feedback: I use MacWhisper, and the Tiny WhisperKit model (English only) is way faster than any cloud service on my M1 MacBook Pro.

I’d say local is necessary for a delightful product experience, and the added bonus is that it ticks the privacy box.


brianjking · 8 months ago
How much RAM is in your M1?
jrvarela56 · 8 months ago
16GB
alxlu · 9 months ago
I’ve been using this for a while now and I really enjoy it. I ran into a semi-obscure bug and emailed them and they basically fixed it the same day.

I do wish there was a mobile app though (or maybe an iOS keyboard). It would also be nice to be able to have a separate hotkey you can set up to send the output to a specific app (instead of just the active one).

the_king · 9 months ago
thanks! We're working on iOS, but it's tough to get the ergonomics right given all of Apple's restrictions and neglected APIs.
polishdude20 · 8 months ago
Android app please!
pablopeniche · 8 months ago
<3
rkagerer · 9 months ago
You mentioned it "lives on your desktop". How does licensing work, and can you install and use it on a machine without internet access?
rickydroll · 8 months ago
I've been using Aqua since it was announced on HN. I've survived the teething pains by using a mixture of Aqua and Dragon, depending on what I was doing. With this new Windows app, I've given up using Dragon for anything.

Things I've learned are:

1. It works better if you're connected by Ethernet than by Wi-Fi.

2. It needs to have a longer recognition history because sometimes you hit the wrong key to end a recognition session, and it loses everything.

3. Besides the longer history, a debugging mode that records all the characters sent to the dictation box would be useful. Sometimes I see one set of words, blink, and then it's replaced with a new recognition result. Capturing that would help in describing what went wrong.

4. There should be a way to tell us when a new version is running. Occasionally, I've run into problems where I'm getting errors, and I can't tell if it's my speaking, my audio chain, my computer, the network, or the app.

5. Grammarly is a great add-on because it helps me correct mis-speakings and odd little errors, like too many spaces caused by starting and stopping recognition.

When Dragon Systems went through bankruptcy court, a public benefit corporation bid for the core technology because it recognized that Dragon was a critical tool for people with disabilities to function in a digital world.

In my opinion, Aqua has reached a similar status as an essential tool. While it doesn't fully replace Dragon for those who need command and control (yet), the recognition accuracy and smoothness are so amazing that I can't envision returning to Dragon without much pain. The only thing worse would be going back to a keyboard.

Aqua Guys, don't fuck it up.