CNNs are mentioned mainly for processing considerations, since they are computationally cheaper to work with. But given the nature of speech recognition, which is so highly temporally correlated, it shouldn't be a surprise that a recurrent neural network would be used. This is pretty much exactly what the RNN architecture was designed for.
Also, if you haven't looked into how exactly an RNN Transducer functions, I highly recommend doing so. RNN-Ts resolve a great deal of problems that traditional RNNs and CNNs are unable to deal with.
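To make the transducer's behavior concrete, here is a toy greedy decoding loop in Python. Everything in it (the random stand-in weights, the simple additive joint, the symbols-per-frame cap) is made up purely for illustration and is not Google's model; it only shows the control flow that lets an RNN-T stream: at each step the joint network either emits a label (staying on the current audio frame) or emits "blank" (advancing to the next frame).

```python
import random

random.seed(0)

# Toy greedy RNN-T decoding sketch. All weights are random stand-ins,
# purely to illustrate the control flow -- this is NOT Google's model.
# The transducer joins an acoustic encoder output f_t with a label
# predictor output g_u and, at each step, emits either a label
# (advance u, stay on the same frame) or "blank" (advance t, i.e.
# consume the next audio frame). That interleaving is what lets it
# stream, and handle output sequences shorter or longer than the input.

V, T, H = 5, 8, 4    # vocab size (index 0 = blank), frames, hidden size

def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]

enc = [rand_vec(H) for _ in range(T)]     # stand-in encoder frames f_1..f_T
pred = [rand_vec(H) for _ in range(V)]    # stand-in predictor state per label
joint = [rand_vec(V) for _ in range(H)]   # stand-in joint-network weights

def greedy_decode(enc):
    hyp = []                  # labels emitted so far
    g = [0.0] * H             # predictor state for the empty prefix
    for f in enc:             # streaming: one encoder frame at a time
        for _ in range(3):    # cap symbols per frame (real decoders do too)
            # joint(f_t, g_u): a tiny additive joint plus a projection
            logits = [sum((f[i] + g[i]) * joint[i][v] for i in range(H))
                      for v in range(V)]
            k = max(range(V), key=lambda v: logits[v])
            if k == 0:        # blank: move on to the next frame
                break
            hyp.append(k)     # non-blank: emit label, update predictor
            g = pred[k]
    return hyp

print(greedy_decode(enc))
```

Because the loop consumes one encoder frame at a time, transcription can begin before the utterance ends, which is where the latency win over server round-trips comes from.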
"But it’s sort of funny considering hardly any of Google’s other products work offline. Are you going to dictate into a shared document while you’re offline? Write an email? Ask for a conversion between liters and cups? You’re going to need a connection for that!"
While offline, you might write email drafts, your blog, or even a book: https://medium.com/@augustbirch/what-i-learned-writing-an-en...
What's missing is the ability to make edits using your phone. You can probably speak at over 100 words a minute but then you need to stop to bring up the software keyboard.
The offline aspect is hardly the main draw here, though. As mentioned earlier in the article, the latency reduction is huge. Another aspect they didn't really cover is the privacy implications. Lastly, even when you're not offline, a dodgy connection is a pain if you need a stable stream of packets going back and forth.
I refuse to put an amazon/apple/google surveillance device in my home, so I am very interested in a DIY digital assistant device. I'm aware of a few options but it seems like offline voice recognition is always a little sub-par. I am really looking forward to the day when an offline, open source digital assistant can compare in quality to a proprietary/cloud device.
>As mentioned earlier in the article, the latency reduction is huge.
Well, on macOS offline voice recognition is actually much slower than online. Not to mention the vocabulary and choice of words is quite limited. I'd love to get an offline version, but so far every online version seems to be better.
FWIW, Google Translate (including the "translate from picture" feature) is an example of a product that has had an offline option for quite some time. You have to tell it to download a pack for each language pair, IIRC.
For the record, it wasn't always this way; over the last couple of years, though, they have made a lot of improvements on this front. I think it may have something to do with Google's "next billion devices" being in countries with bad connectivity.
With that said I especially like the Google Maps offline features which have been added recently. You can even have it calculate driving directions completely offline if you have the starting and ending addresses.
I just switched my Pixel 1 to airplane mode and tried voice input. Sure enough, it worked offline and it was fast! Very impressive work. (I've tried that before, but in the past it could only understand a few special phrases.) I suppose this new feature came with the security update my phone downloaded a few days ago.
There are lots of ways to spin this, but I see it as a significant improvement for any app that could benefit from voice input. It's immediate and not susceptible to network glitches. The benefit for Google, IMHO, is primarily more sales of updated Android devices.
Unless you very recently (meaning today) accepted a download of a new language pack for English, it's likely just the old model, which is perfectly functional, though not as accurate as the online version.
> But it’s sort of funny considering hardly any of Google’s other products work offline.
I dunno, Android and a lot of Google's mobile apps that aren't about online communication work fine offline. Actually, a lot of the online-communication ones do too, as much as is even conceivable; they just don't transmit and receive offline, because how would they?
On the other hand, when you can transcribe locally, uploading whole days' worth of eavesdropping would not cause a noticeable spike in traffic. I'd consider it more a lateral change than an improvement.
I'm unclear on whether this moves the privacy needle. It says they do offline transcription, but they may still attempt to send the audio clip to compare against the resulting text.
It could be used to improve privacy, I just don't know if it will be used that way.
I generally think of Google the same way I think of the NSA. If they stop doing something invasive, either it didn't work, they found a better way of doing it, or it was transferred to a legally distinct category, and we only hear about it because of PR considerations.
Gboard is governed by Google's catch-all privacy policy, which allows them to gather all data and mine everything.
If you have an Android device with Google services and a firewall, you'll see that the device is constantly phoning home, which is also noted in the privacy policy.
This does nothing for privacy; rather, it provides the illusion of privacy.
Does the Pixel have some specific hardware that this uses, or is it simply limited to Pixels to limit the rollout? I'm curious whether I should get my hopes up about seeing this in Gboard on non-Pixel Android devices.
The Pixel 2 and later do have a coprocessor for compute workloads (the Pixel Visual Core). However, users here have reported this working on a Pixel 1, which doesn't have that chip.
The Verge says it may reach other devices later.
It sounds like it's both better than the old dictation model, and significantly smaller.
Even so, offline AI solutions have been piss-poor, and Google moving the state of the art forward, in spite of their vested interest in keeping people online, is a good thing.
Yes, we want an open source solution, but I'm not going to work on it. So who's going to work on it? Are you?
In absence of resources working towards the ideal, I'll applaud any step in the right direction.
Didn't they advertise something like this a few years ago? I seem to remember trying it and finding that it didn't really work as well as the online recognition at the time.
arXiv: https://arxiv.org/abs/1811.06621
Gboard > Voice Typing > Faster voice typing
It says it's an 85MB download for US English.
This is translating what you said after the wake word from voice to text on the local [Pixel] hardware rather than sending it into Google's Cloud.
The biggest benefits here are speed and reliability. It could also handle some actions offline.
The thought that every interaction with my phone is being streamed in realtime to a third party server freaks me out.
Kudos to Google for working on this.
You want an open source solution, not just an offline solution.
EDIT: Looks like something was added in Jelly Bean: https://stackoverflow.com/questions/17616994/offline-speech-...