RNN-Based Handwriting Recognition in Gboard

The "Making it Work, On-device" paragraph makes it seem like TensorFlow Lite will easily get your model running fast on-device, but in reality RNNs aren't currently supported by the TFLite Converter and the TFLiteLSTMCell example is super slow for training, so this is actually based on proprietary code not available to mere mortals using open source TensorFlow. If you were to actually try reproducing this work, you'd have to use several workarounds, dig deep into the TensorFlow source code, and possibly still end up with a suboptimal TFLite model.

Don't get me wrong, in terms of deployability and flexibility for production usage, TensorFlow/TFLite is really good, specially compared to other frameworks, but Google tends to oversell the abilities of open-source TensorFlow significantly in their marketing material, and you only find out when you go and try doing it yourself.

lawrenceyan · 6 years ago

For industry/real world work, TensorFlow is best in class. It is far superior to any other existing framework. I agree that there are always areas for improvement, but the way you worded your comment makes it almost sound like TF is pretty subpar compared to other offerings.

The reality is more, TensorFlow is really the only option you have if you don’t want to build everything from scratch again. Whether that’s a good or bad thing, well at least it’s because TensorFlow is actually a good product and not because Google is preventing others from building their own / pushing others down.

ru999gol · 6 years ago

what are you even talking about, tf is a mess. pytorch, mxnet, caffee2, etc. are all superior fameworks

perone · 6 years ago

That is an important point, Google is the master of releasing things in half, this was a common practice in Tensorflow since the initial release, they basically removed a lot of things to release it and it became a Frankenstein base of code. Bazel is another example, inside Google it works amazingly, but the open source project is a pain in the ass.

this is so awesome!

but how is it that we have RNN solutions for handwriting when we don't even have a standard, canned RNN for OCR?

I know tesseract and related projects exist, but when I've tried them they have been fairly brittle with lower accuracy than I was expecting. Accuracy was especially problematic for letter combinations like "-ing" that would consistently be recognized as "-mg".

Is there a good ML OCR library I'm missing?

pfortuny · 6 years ago

Just a side comment: take into account that (as per the paper) there is temporal input in Gboard (i.e. the timestamp of each stroke is important).

You do not have that for ing, so the software does not know that the dot is “independent”).

ocrcustomserver · 6 years ago

The reason is that online OCR (this particular case) is entirely different from offline OCR.

Online OCR is when you input the strokes directly on the tablet/phone, so it becomes a sequence of XY coordinates with an associated timestamp. It takes into account where you start and where you end the stroke on the canvas, along with the intermediate points (information galore).

Offline OCR is when you take a photo of your handwriting in your notebook, so you just get the raw pixels of a image. In offline OCR, you'd also have to properly segment and binarize the image before the OCR step.

With that being said, tesseract (version 4) uses an LSTM.

reubenmorais · 6 years ago

modeless · 6 years ago

Wow, it is really surprising to me that bezier curve control points produced by an optimization process would be good inputs to a neural net model. Small perturbations to the inputs could produce radically different bezier control points depending on the decisions made by the curve optimizer, so this forces the neural network to learn about the characteristics of the optimizer as well as the input.

Neural nets usually thrive on raw high dimensional inputs, so dramatically reducing the dimensionality of the input seems like a strange decision. I'm sure it improves speed, but I would expect higher accuracy by processing the raw input.

nielsbot · 6 years ago

I don't know ML, but I was surprised too, since converting a series of points into Bezier paths of a certain degree seems arbitrary...

appleflaxen · 6 years ago

AlphaWeaver · 6 years ago

Really cool stuff! My phone isn't big enough to do handwriting on, so I'm not really sure where this is supposed to be used? On a tablet I guess?

yorwba · 6 years ago

I just tried tout for refirsttime, and although the keyboardspace on my phone is barely large enough to cran five characters in there, the input scrolls sideraysautomatically if you litt your finger long enoy gh. So longer words can be entered as well. It doesn't seem to reevaluate previously decoded segments baselon what follows, though, so you can end up with weird misspellings at the beyinning of words. I dont think! Im going to use it from now on, beaurthere cognition is balenow ghto requiresigniti cant editing and the frictonisatittooncomfortable vihout a stylus.

Edited with the QWERTY keyboard:

I just tried it out for the first time, and although the keyboard space on my phone is barely large enough to cram five characters in there, the input scrolls sideways automatically if you lift your finger long enough. So longer words can be entered as well. It doesn't seem to reevaluate previously decoded segments based on what follows, though, so you can end up with weird misspellings at the beginning of words. I don't think I'm going to use it from now on, because the recognition is bad enough to require significant editing and the friction is a bit too uncomfortable without a stylus.

arbie · 6 years ago

How can I train an RNN to OCR my scribbles? It would be the perfect mix of physical paper and digital notes.

geophertz · 6 years ago

I miss the time when in Gboard you could use the slide typing to type multiple words at once. This was so useful and made people like me who are unable to type quickly on a virtual keyboard (touch) to type very fast.

dvh · 6 years ago

Isn't swiping inherently faster? With sweeping you need 1 angle (corner) for letter. Typical letter uses much more than 1 corner.

cbhl · 6 years ago

Only if you're writing words in the dictionary, and you're using the English alphabet.

Handwriting recognition is way more impactful for users in, say, Chinese.

dajohnson89 · 6 years ago

I just switched from Android to iPhone, and Gboard on iPhone doesn't have the translation function. It also doesn't have multiple languages -- if I want to switch languages I have to exit out of Gboard and use the default iOS keyboard. Anyone know why these features for Gboard are missing on iOS?

colde · 6 years ago

It does have multiple languages. Clicking the settings icon right between the button to switch to numbers and the emoji button switches languages. You can also hold it down to go to settings.

jjtheblunt · 6 years ago

these features are not missing: check settings