Video: https://withaqua.com/watch
Try it here: https://withaqua.com/sandbox
Finn is uber dyslexic and has been using dictation software since sixth grade. For over a decade, he’s been chasing a dream that never quite worked: using his voice instead of a keyboard.
Our last post (https://news.ycombinator.com/item?id=39828686) about this seemed to resonate with the community - though it turned out that version of Aqua was a better demo than product. But it gave us (and others) a lot of good ideas about what should come next.
Since then, we’ve remade Aqua from scratch for speed and usability. It now lives on your desktop, and it lets you talk into any text field -- Cursor, Gmail, Slack, even your terminal.
It starts up in under 50ms, inserts text in about a second (sometimes as fast as 450ms), and has state-of-the-art accuracy. It does a lot more, but that’s the core. We’d love your feedback — and if you’ve got ideas for what voice should do next, let’s hear them!
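For anyone new to this category of tool, the core loop an app like this automates is: capture a short burst of speech, transcribe it, and type the result into whatever text field has focus. Below is a minimal, generic sketch of that loop, not Aqua's implementation; sounddevice, soundfile, openai-whisper, and pyautogui are stand-ins for the real (much faster, streaming, cloud-backed) pipeline, and the fixed five-second clip stands in for a push-to-talk hotkey.

    # Generic sketch of the dictation loop, NOT Aqua's implementation.
    # Assumes the sounddevice, soundfile, openai-whisper, and pyautogui
    # packages; the fixed 5-second clip stands in for a push-to-talk hotkey.
    import sounddevice as sd
    import soundfile as sf
    import whisper
    import pyautogui

    SAMPLE_RATE = 16_000
    SECONDS = 5

    model = whisper.load_model("base")  # local model as a stand-in for the cloud ASR

    def dictate_once() -> None:
        # 1. Record a short clip from the default microphone.
        audio = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
        sd.wait()
        sf.write("clip.wav", audio, SAMPLE_RATE)

        # 2. Transcribe it.
        text = model.transcribe("clip.wav")["text"].strip()

        # 3. Type the result into whichever text field currently has focus.
        pyautogui.write(text, interval=0.01)

    dictate_once()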
MacWhisper [0] (the app I settled on) is conspicuously missing from your benchmarks [1]. How does it compare?
[0]: https://goodsnooze.gumroad.com/l/macwhisper
[1]: https://withaqua.com/blog/benchmark-nov-2024
For that benchmarking table, you can use Whisper Large V3 as a stand-in for MacWhisper and Superwhisper accuracy.
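For anyone who wants to reproduce that comparison themselves, here is a minimal sketch of scoring a transcript against a reference with word error rate. It assumes a recent openai-whisper and the jiwer package; the audio and transcript file names are placeholders.

    # Hypothetical benchmark snippet: score Whisper Large V3 (the suggested
    # stand-in for MacWhisper / Superwhisper) against a reference transcript.
    # Assumes openai-whisper and jiwer; file names are placeholders.
    import whisper
    import jiwer

    reference = open("sample_reference.txt").read()

    model = whisper.load_model("large-v3")
    hypothesis = model.transcribe("sample.wav")["text"]

    print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")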
Aqua looks good and I will be testing it, but I do like that with superwhisper nothing leaves my computer unless I add AI integrations.
Side-comment of something this made me think of (again): tech builds too much for tech. I've lived in the Bay before, so I know why this happens. When you're there, everyone around you is in tech, your girlfriend is in tech, you go to parties and everyone invariably ends up talking about work, which is tech. Your frustrations are with tech tools and so are your peers', so you're constantly thinking about tech solutions applicable to tech's problems.
This seems very much marketed to SF people doing SF things ("Cursor, Gmail, Slack, even your terminal"). I wonder how much effort has gone into making this work with code editors or the terminal, even though I doubt this would be a big use-case for this software if it ever became generally popular. I'd imagine the market here is much larger in education, journalism, film, accessibility, even government. Those are much more exciting demos.
I share the same sentiment. I remember thinking in college how annoying it was that I was reading low-resolution, marked-up, skewed, b&w scans of a book using Adobe Acrobat while CS concentrators were doing everything in VS Code (then brand new).
but we do think voice is actually great with Cursor. It’s also really useful in the terminal for certain things. Checking out or creating branches, for example.
"we also collect and process your voice inputs [..] We leverage this data for improvements and development [..] Sharing of your information [..] service providers [..] OpenAI" https://withaqua.com/privacy
No mention of privacy (or on-prem), so I assume it's 100% cloud.
Non-starter for me. Accuracy is important, but privacy is more so.
Hopefully a service with these capabilities will be available where the first step has the user complete a brief training session, sends it to the cloud to tailor the recognition parameters to their voice and mannerisms, and then loads the result locally.
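Purely as an illustration of that wished-for flow, here is a rough sketch; the endpoint, request shape, and response field are all invented, since no such API exists today.

    # Hypothetical enroll-in-the-cloud, run-locally flow. The URL and the
    # "adapted_model_url" field are invented for illustration only.
    import requests

    ENROLL_URL = "https://example.com/v1/adapt"  # hypothetical endpoint

    def enroll_and_download(audio_path: str, out_path: str) -> None:
        # 1. Upload the brief training session.
        with open(audio_path, "rb") as f:
            resp = requests.post(ENROLL_URL, files={"audio": f})
        resp.raise_for_status()
        model_url = resp.json()["adapted_model_url"]  # hypothetical response field

        # 2. Download the speaker-adapted model for local, offline use.
        model = requests.get(model_url)
        model.raise_for_status()
        with open(out_path, "wb") as out:
            out.write(model.content)

    enroll_and_download("enrollment.wav", "adapted_model.bin")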
Given that we need the cloud, we offer zero data retention -- you can see this in the app. Your concern is as much about UX and communications as it is about privacy.
And self-hosting real-time streaming LLMs will probably also come out at around 50 cents per hour. Arguing for a $120/month price for power users is probably going to be very difficult, especially if there are free open-source alternatives.
This is where I bounce (out of this discussion).
I’d say local is necessary for a delightful product experience, and the added bonus is that it ticks the privacy box.
I do wish there was a mobile app though (or maybe an iOS keyboard). It would also be nice to be able to have a separate hotkey you can set up to send the output to a specific app (instead of just the active one).
Things I've learned are:
1. It works better if you're connected by Ethernet than by Wi-Fi.
2. It needs to have a longer recognition history because sometimes you hit the wrong key to end a recognition session, and it loses everything.
3. Besides the longer history, a debugging mode that records all the characters sent to the dictation box would be useful. Sometimes, I see one set of words, blink, and then it's replaced with a new recognition result. Capturing that would be useful for describing what went wrong.
4. There should be a way to tell us when a new version is running. Occasionally, I've run into problems where I'm getting errors, and I can't tell if it's my speaking, my audio chain, my computer, the network, or the app.
5. Grammarly is a great add-on because it helps me correct mis-speakings and odd little errors, like too many spaces caused by starting and stopping recognition.
When Dragon Systems went through bankruptcy court, a public benefits corporation bid for the core technology because it recognized that Dragon was a critical tool for people with disabilities to function in a digital world.
In my opinion, Aqua has reached a similar status as an essential tool, though it doesn't yet fully replace Dragon for those who need command and control. The recognition accuracy and smoothness are so amazing that I can't envision returning to Dragon without much pain. The only thing worse would be going back to a keyboard.
Aqua Guys, don't fuck it up.