jlarocco · 6 years ago
"Machine Learning" is the wrong tool for the job here.

Tesseract OCR can do this, using only the Raspberry Pi, at a "good enough" framerate for any real driving situation.
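
To be concrete, a minimal sketch of the kind of call I mean (pytesseract; this assumes the plate has already been located and cropped out of the frame, and the config flags are just the usual single-text-line settings):

    import cv2
    import pytesseract

    # assumes "plate_crop.jpg" is an already-cropped, reasonably clean plate image
    img = cv2.imread("plate_crop.jpg")

    # --psm 7 = treat the image as a single line of text; whitelist plate characters
    text = pytesseract.image_to_string(
        img,
        config="--psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
    print(text.strip())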

beagle3 · 6 years ago
Tesseract only does well if the input is a clean, aligned, two-tone image that's already cropped down -- but the LPR images you get from a moving camera are anything but.
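
For illustration, the kind of cleanup a plate crop needs before Tesseract stands a chance -- a rough OpenCV sketch (purely illustrative; a real moving-camera pipeline also needs perspective correction and deskewing, which this skips):

    import cv2

    def clean_plate(crop):
        # the bare minimum Tesseract wants: grayscale, upscaled, binarized
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        return binary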

Machine learning is a fine tool for this job. The specific machine learning setup is overkill, though.

heavenlyblue · 6 years ago
Most of these problems are much easier to solve by simply picking a better camera. Moreover, license plates are specifically designed to be easy to read (i.e. reflective).
nine_k · 6 years ago
It's more like a "learnèd machine" approach. No need to train the model, but a pre-trained model may be (or not be, depending on circumstances!) more efficient than "hand-written" OCR approaches.
jlarocco · 6 years ago
The article mentions using a bunch of GPUs in AWS to do the processing, so I'm guessing that's a pretty firm "no" in this case...
mister_hn · 6 years ago
OCR itself is machine learning?
mrweasel · 6 years ago
No, not necessarily.
alexcnwy · 6 years ago
Tesseract really baaaad imo
xFlynnRider · 6 years ago
Why is it bad?
semi-extrinsic · 6 years ago
The solution described in the blog uses 12 Nvidia T4s, which are basically specialized RTX 2080s. Depending on which vCPUs etc. get used, that's 1000-1500 W of compute running continuously.

If their car was electric, this project would be increasing the power consumption of the vehicle by around 10%.
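
Back-of-envelope, with every figure assumed rather than measured (T4 TDP, host overhead, and EV draw are all rough guesses):

    t4_tdp = 70                          # W per T4 (nominal TDP)
    gpu_watts = 12 * t4_tdp              # ~840 W for the GPUs alone
    total_watts = 1200                   # plus vCPU/host overhead -> roughly 1.0-1.5 kW
    ev_cruise_draw = 12_000              # W, a typical EV at moderate speed
    print(total_watts / ev_cruise_draw)  # ~0.1, i.e. the ~10% figure above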

figo22 · 6 years ago
I may have missed something, but I believe the T4s are running in the cloud, so it wouldn't have any effect on the car's power consumption.
semi-extrinsic · 6 years ago
To clarify, I mean the power consumption indirectly. I guess this will become an important point of discussion soon-ish, as we get more and more electric cars with ML stuff in them.

My view is, if your car uses electrons, and some of those are for compute, and you just offload the compute to the cloud, you haven't actually reduced the total electricity consumption.

Similarly here, the total "footprint" of the car is increased by adding this feature.

Piskvorrr · 6 years ago
For a project that's intended to be a part of a car, cloud only makes sense for prototyping. 900 ms latency? Uh-oh.
dheera · 6 years ago
The video referenced in the blog (https://www.youtube.com/watch?v=gsYEZtecXlA) mentions that they used 20 K80 instances to get real-time inference. That seems a bit excessive IMO for an object detector.

xFlynnRider · 6 years ago
I'm the creator here. Yeah, 20 K80s is a bit excessive. That's because Cortex (cortexlabs), the ML-model-deployment platform I'm using, didn't initially support multiprocessing on each of its replicas, so I was limited to just one CPU per GPU. AWS has instances with 4, 8, 16 vCPUs and so on.

Once Cortex started supporting gunicorn (still unreleased, but present on their master branch), I was able to reduce the number of GPUs significantly. In the end, the detector only needs 2 T4 GPUs, and the identification part is the most expensive one (10 T4 GPUs), for a grand total of 12.

Converting the models to single precision could further reduce the requirement to about 1.5 T4 GPUs, or just 1.2 V100 GPUs - which in both cases would still mean provisioning 2 GPUs.
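
A rough sketch of what such a precision conversion could look like, assuming the models are TensorFlow SavedModels and TF-TRT is available (FP16 shown; the paths are illustrative, not the actual project files):

    # assumes TensorFlow 2.4+ with TensorRT available; paths are illustrative
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.TrtConversionParams(precision_mode="FP16")
    converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model",
                                        conversion_params=params)
    converter.convert()
    converter.save("saved_model_fp16")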

dheera · 6 years ago
Consider using a smaller, lighter-weight network (e.g. TinyYOLO) and object tracking (instead of running inference on every camera frame) for faster throughput -- I imagine you should be able to get by with <1/4 of a V100 and still be real-time for all practical cases.

You can also customize the network to your use case, e.g. you don't need YOLO's default 5 anchor box sizes if you know the thing you're detecting is a license plate.

Also, profile your code and see where your bottleneck is. If your bottleneck is NMS, for example, there are things you can do to speed it up. I've seen a lot of cases where the neural network runs fast but there's a lot of Python bloat in the pre/post-processing -- not sure about yours without seeing the code.

You really should be able to run a license plate detector/reader on something a lot smaller than a V100. A Xavier or quite possibly even a Jetson Nano would very likely be good enough if you use it well.
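
Rough sketch of the detect-every-N-frames-plus-tracking idea (opencv-contrib tracker API; detect_plates is a stand-in for whatever model call you end up with):

    import cv2

    def detect_plates(frame):
        # stand-in for your (Tiny)YOLO call; returns [(x, y, w, h), ...]
        ...

    DETECT_EVERY = 10   # run full detection every N frames; track in between
    trackers = []

    cap = cv2.VideoCapture(0)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % DETECT_EVERY == 0:
            trackers = []
            for box in detect_plates(frame):
                t = cv2.TrackerKCF_create()  # cv2.legacy.* on newer OpenCV builds
                t.init(frame, box)
                trackers.append(t)
        else:
            for t in trackers:
                ok, box = t.update(frame)    # cheap per-frame update, no GPU inference
                # only send crops to the OCR stage when a tracked box is (re)confirmed
        frame_idx += 1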

SergeAx · 6 years ago
I believe you could achieve the same result locally using https://github.com/openalpr/openalpr and cut your AWS and cell bills to exactly zero. It has Tesseract and OpenCV inside. Would love to see it as part two of the article!
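
Roughly, the Python bindings look like this (the config/runtime paths are the stock install defaults and may differ on a Pi):

    from openalpr import Alpr

    alpr = Alpr("us", "/etc/openalpr/openalpr.conf",
                "/usr/share/openalpr/runtime_data")
    if not alpr.is_loaded():
        raise RuntimeError("Error loading OpenALPR")

    alpr.set_top_n(5)
    results = alpr.recognize_file("frame.jpg")
    for plate in results["results"]:
        for candidate in plate["candidates"]:
            print(candidate["plate"], candidate["confidence"])
    alpr.unload()
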
Aspos · 6 years ago
Plenty of room for optimization. A Raspberry Pi has enough power to detect license plate numbers on its own, offline.
joshvm · 6 years ago
The obvious question is why not use a local accelerator?

Both the Neural Compute Stick and the Google Coral have more than enough grunt to run real-time object detection models. Both will run on USB2 power. I don't know the overhead of good OCR, but license plates are a very standard format, so perhaps you could train a second detector to extract the letters?

Even if you do OCR in the cloud, local bounding box extraction would save a huge amount of bandwidth.
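
Even a naive version of that helps -- something like this (the endpoint URL and transport are placeholders, just to show the idea of shipping crops instead of whole frames):

    import cv2
    import requests  # however you ship crops upstream; the URL below is a placeholder

    def send_crops(frame, boxes, url="https://example.invalid/ocr"):
        # send only the plate crops upstream instead of the whole frame
        for (x, y, w, h) in boxes:
            crop = frame[y:y + h, x:x + w]
            ok, jpg = cv2.imencode(".jpg", crop, [cv2.IMWRITE_JPEG_QUALITY, 80])
            if ok:
                requests.post(url, data=jpg.tobytes(),
                              headers={"Content-Type": "image/jpeg"})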

siftrics · 6 years ago
Hey, founder of a relevant startup here. Just wanna chime in on OCR + bounding box performance. We offer text recognition with bounding boxes as a service. Our average processing duration, between reading bytes off the wire and writing the JSON response, is just under 3 seconds. Obviously that rules it out for frame-by-frame applications, but I think it's still worth mentioning. The recognition is just as accurate as Google Cloud Vision -- it can handle human handwriting and even cursive, in most cases.

If you’re interested in trying it out: https://siftrics.com/

amenod · 6 years ago
Off topic: great video, love how you explain what it's about. Good luck!
01100011 · 6 years ago
I'd think a Jetson Nano would be more than enough as well. No need to invoke the cloud.
xFlynnRider · 6 years ago
Probably yeah, but the potential of the cloud was much more appealing to me.

Detecting the license plates is computationally cheap, but not on the RPi. The most expensive part computationally was identifying the words (letters) - that's because recognizing the text within the bounding boxes obtained from YOLOv3 relies on a VGG-16 based model. Running that multiple times on a single frame (for multiple license plates) is expensive.

Surprisingly, the bandwidth was the least of my concerns - I didn't need much at all. For YOLOv3 at 416p and 30 FPS I need about 3 Mbps, which I wouldn't consider much.
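
Quick sanity check on that figure, with assumed (not measured) per-frame sizes:

    bitrate = 3_000_000            # bits per second
    fps = 30
    per_frame = bitrate / fps / 8  # ~12.5 KB per frame
    # a 416p frame compressing to ~10-15 KB is plausible, so ~3 Mbps at 30 FPS checks out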

Now, this is a demo of what a production system could theoretically look like. I know it could be much better optimized.

joshvm · 6 years ago
Sure, it's a good exercise in infrastructure.

3 Mbps doesn't sound like much, but that's constant bandwidth. I have a 15-minute commute, which would make each journey about 340MB of data. That adds up quite fast, especially on a mobile contract.

ageofwant · 6 years ago
I really do appreciate the effort and the quality of the writeup, but this is a very inefficient and roundabout way of going about things.

You can do real-time number plate recognition on a Pi 3/4 with a Coral TPU or a Movidius 2, both of which cost ~US$80. It probably has a lot to do with hammers and nails. Not everything needs to be 'web-centric' or 'cloud-based'.
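
For reference, on-device detection with a Coral looks roughly like this (pycoral API; the .tflite file is a placeholder for a plate detector you'd train and compile for the Edge TPU yourself):

    from PIL import Image
    from pycoral.utils.edgetpu import make_interpreter
    from pycoral.adapters import common, detect

    interpreter = make_interpreter("plate_detector_edgetpu.tflite")
    interpreter.allocate_tensors()

    image = Image.open("frame.jpg").convert("RGB").resize(common.input_size(interpreter))
    common.set_input(interpreter, image)
    interpreter.invoke()

    for obj in detect.get_objects(interpreter, score_threshold=0.5):
        print(obj.bbox, obj.score)   # plate boxes, ready for a local/remote OCR pass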

jfries · 6 years ago
Surprised by the negativity in the comments. This is an extremely impressive demonstration of the ability to put current technologies together and get the thing working from start to finish.

A+, would hire.

0x0aff374668 · 6 years ago
The reason for the negativity is that this demonstrates the "Design by StackOverflow" mentality, where the solution is like swatting a fly with a sledgehammer and there's no real domain knowledge. Plus the author didn't even train the neural nets: it's just a LEGO project. I'd hire this person as a lab intern, but nothing above that. The fact that the author couldn't solve it locally and had to invoke the CLOUD is... laughable. This problem has been solved for over two decades on lesser hardware.
Piskvorrr · 6 years ago
TL;DR: reinventing wheels is a good way to learn a lot.

For trying out something in a few hours, of course you don't want to spend hundreds of hours setting it up, by definition. Yes, the result "just about works, but doesn't scale" - but that's the point of experimenting. Sure, this is a LEGO-style experiment in reinventing the wheel, but exactly for that reason it's an excellent way to start learning about this problem domain: power consumption? Latency? ML basics? Sure. That's hacking at its core - even though the project is rudimentary.

bowmessage · 6 years ago
Would be interested in the $/hr cost to run something like this, OP - 4G data & AWS bill combined.