What are the real-life, actually useful cases for this tech?
I can imagine manufacturing: detecting defects or layout mismatches - that's one.
Is there any open source project that uses an image recognition library to achieve a useful task? All I've seen from board partners are, at most, very simple demos where a box with a label is drawn around an object. Who is actually using that information, how, and for what?
I've also been a part of the Kinect craze and made 3 demos (mostly games) using their SDK, and I still have a very hard time defending this tech in the eyes of coworkers who only see it as surveillance tech.
Great question! I work for a computer vision company (Roboflow) and have seen computer vision used for everything from accident prevention on critical infrastructure to identifying defects on vehicle parts to detecting trading cards for use in video game applications.
Drawing bounding boxes is a common end point for demos, but for businesses using computer vision there is an entire world after that: on-device deployment. This can range from devices like an NVIDIA Jetson (a very common choice) to Raspberry Pis to central CUDA GPU servers for processing large volumes of data (perhaps connected to cameras over RTSP).
Note: there are many models that are faster and perform better than YOLOv5 (e.g. YOLOv8, YOLOv10, PaliGemma). Roboflow Inference, which our ML team maintains, has various guides on deploying models to the edge: https://inference.roboflow.com/#inference-pipeline
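For a concrete picture of that "world after the box", here is a minimal sketch of the common pattern: RTSP in, detections out, then whatever logic the application needs. The RTSP URL, the weights file, and the example "person" rule are placeholders, not anything from an actual production pipeline:

    # Minimal sketch: pull frames from an RTSP camera, run a detector, then hand
    # the detections to whatever comes next (counting, alerting, logging, ...).
    # The URL, weights file and the example "person" rule are placeholders.
    import cv2
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # or your own fine-tuned weights
    cap = cv2.VideoCapture("rtsp://user:pass@192.0.2.10/stream1")

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = model(frame, verbose=False)[0]
        for box in result.boxes:
            name = model.names[int(box.cls)]
            conf = float(box.conf)
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            # the "world after the box": business logic goes here
            if name == "person" and conf > 0.5:
                print(f"person at ({x1},{y1})-({x2},{y2}) conf={conf:.2f}")

    cap.release()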
I use object detection to track tardigrades in my custom motorized microscope. It's very useful for making long observations in a field much larger than the scope's field of view.
The system works quite simply: I start with an existing object detector and train it with a small (<100) number of manually labelled images. Then during inference, I move the scope's field of view using motor commands to put the center of the tardigrade at the center of the field of view.
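A rough sketch of that re-centering step, where detector() and the stage object are hypothetical stand-ins for the real model and motor controller, and the calibration values are examples only:

    # Closed-loop centering sketch: measure how far the detection is from the
    # image center and command the stage to cancel that offset.
    # detector() and stage.move_relative() are hypothetical stand-ins for the
    # real model and motor controller; the calibration values are examples only.
    UM_PER_PIXEL = 0.8   # stage microns per image pixel (calibration, made up)
    GAIN = 0.5           # correct only part of the error per step to avoid overshoot

    def recenter(frame, detector, stage):
        det = detector(frame)                 # returns (x1, y1, x2, y2) or None
        if det is None:
            return                            # lost the animal; skip this frame
        x1, y1, x2, y2 = det
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        h, w = frame.shape[:2]
        err_x = cx - w / 2.0                  # positive: target is right of center
        err_y = cy - h / 2.0                  # positive: target is below center
        stage.move_relative(dx_um=-GAIN * err_x * UM_PER_PIXEL,
                            dy_um=-GAIN * err_y * UM_PER_PIXEL)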
This technology is very useful for doing long-term observations of tardigrades (so, useful for science).
Thank you! Is the detection accurate enough, or are the observation conclusions simply not that sensitive to minor errors?
That makes me want to revisit a previous idea of mine: a boiling-soup spillage detector. I once had a Google Meet call open with a pot of cooking soup to keep an eye on it and thought, heck, that seems like a nice exercise for fine-tuning a visual detector.
This Hailo-8L is extremely useful in mobile robotic systems, particularly when paired with a Raspberry Pi 5. The main board used in this demo is.... not particularly useful, however. Generally speaking, boards like this FPGA one miss the forest for the trees. As a systems engineer, where am I supposed to put such a big board? I can also expect that, even if my team incorporated this board into a mobile system, it would become vaporware well before we deployed anything due to low production numbers, and we'd be paying eBay scalpers 2x as much because our distributors wouldn't carry it anymore, lol.
As for the Raspberry Pi 5 combination, the power draw is relatively low. The whole thing clocks in at about 14 watts on a Pi 5, which allows us to run this platform off of a battery. With 26 TOPS, this setup can contend with the Jetson Xavier NX (21 TOPS, ~$500) and the Jetson Orin Nano (40 TOPS, ~$500) for a cost of around $170. Furthermore, the CPU on the Pi 5 is generally more performant than the Xavier NX's.
Specifically, this is an excellent vision module for real-time object detection from multiple cameras if set up properly, while maintaining access to the prolific Raspberry Pi hardware ecosystem, which is typically cheaper than the Jetson ecosystem.
I'm working on a project that detects climbing holds and lets you set routes by selecting them. (The usual method is putting a bit of colored tape on each hold, where the color corresponds to a route. This works great but becomes difficult to read once more than four or five routes share a hold.) YOLO made the computer vision part of this pretty smooth sailing.
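The detection half is only a few lines. A sketch, assuming the ultralytics package and a hypothetical fine-tuned weights file (holds_yolov8.pt); in practice the route indices come from clicks in the UI rather than being hard-coded:

    # Sketch: detect holds in a wall photo, then let the route setter group them.
    # "holds_yolov8.pt" is a hypothetical fine-tuned weights file, and the route
    # indices would come from clicks in the UI rather than being hard-coded.
    from ultralytics import YOLO

    model = YOLO("holds_yolov8.pt")
    result = model("wall.jpg", verbose=False)[0]
    holds = [tuple(map(int, b.xyxy[0])) for b in result.boxes]   # (x1, y1, x2, y2)

    # a route is just an ordered list of selected hold boxes
    blue_route = [holds[i] for i in (3, 7, 12, 18)]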
As you guessed, high-speed machine vision is frequently used in manufacturing settings for sorting or various quality control tasks. Imagine picking out bad potatoes on a conveyor belt moving tens of potatoes per second, or identifying particle counts and size distributions in a stream of water to gauge water quality.
I've been very persistent over the past few months in developing a system for agriculture as a primary use case. I want to deploy features to classify crop type, height, vegetation stage, and other important metrics to achieve real-time or near real-time analytics.
Do you have any suggestions on how to proceed further? So far, I've procured a Jetson, five cameras, a stand to fix and calibrate the modules, and a camera-array HAT to connect four cameras to the Jetson.
I was also checking out VPUs, NPUs, and other hardware, but I'm struggling to identify compatible options.
How can I get ahead and build such a model to test and validate within 3 months?
I've used YOLOv5 models in robotics for object detection. While VLMs are great at describing images more generally, having a bounding box of a detection with a confidence score is very useful when paired with depth cameras for locating objects in an open environment. Especially when it can be run on-board and at framerate.
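For anyone curious how the bounding box and the depth image come together, a minimal sketch using the standard pinhole model; the intrinsics (fx, fy, cx, cy) are placeholder values, not from any particular camera:

    # Sketch: combine a 2D detection with an aligned depth frame to get a 3D
    # point in the camera frame via the standard pinhole model. The intrinsics
    # (fx, fy, cx, cy) are placeholder values, not from any particular camera.
    import numpy as np

    def bbox_to_point(bbox, depth_m, fx=615.0, fy=615.0, cx=320.0, cy=240.0):
        x1, y1, x2, y2 = bbox
        u, v = (x1 + x2) // 2, (y1 + y2) // 2   # pixel at the box center
        z = float(depth_m[v, u])                # depth (meters) at that pixel
        if z <= 0.0:
            return None                         # invalid/missing depth reading
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.array([x, y, z])              # meters, camera coordinates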
Field mice that I thought were moles have destroyed my yard. There’s so many tunnels that I can’t tell which are most active. A camera AI that can show which parts of the ground changed significantly would be nice.
At a hotel, we had a problem of luggage carts going missing. There are a few ways to deal with that. A generic one that would support other use cases would be to let the camera tell you the last room it went into. Likewise, outdoor cameras might tell you which vehicles had a customer walk into the hotel and which might be non-guests.
A good point that shouldn't be dismissed so quickly. This tracking also has little use beyond surveillance, so one has to wonder what it is the author thinks they are doing, or why they think it's interesting or useful. That they then go ahead and film a crowd without their consent says more about their position on this moral question than it does about any direct harm to the people in the video.
Oh, the horror. Holy shit was that Jason walking by? He told me he was in Alaska...I'm going to need to have a talk with him. Thank goodness I ran across this video.
It's a detailed breakdown of a technically impressive project and your main takeaway is the 5 seconds in the demo where a guy covers his face?
Kudos to the author for making something neat and sharing it.
Crazy, right?
The UltraScale is definitely pricey for non-industry applications. The FPGA design is probably larger because of the 4x camera pipeline.
Perhaps with a single camera you could port this to fit on a Zynq 7000 footprint with something like a Pynq Z1 or Numato Styx, which are around the $250 hobbyist price point, for example.
No chance does the Zynq 701x series perform capably for this task: its Artix fabric is weak and there are only ~50k LUTs on the common parts. Even fully pipelined designs, especially those that interface with the hard AXI bus, will struggle to clock above 50 MHz.
The Zynq Z7030+ chips would probably be able to get the job done but aren't as common - https://www.lcsc.com/product-detail/Programmable-Logic-Devic.... The Kintex 410T-480T are available new <$160 from reputable vendors, used <$50 from less-reputable vendors - they do have the performance (and overall IO) for this task.
https://www.lcsc.com/product-detail/Programmable-Logic-Devic...
(420T/480T are priced similarly new.)
Depending on what you're trying to do, you might not need it. A popular option for object detection that people use with their home-hosted video surveillance (using Frigate and Home Assistant) is a Coral TPU with a Raspberry Pi or some decommissioned thin client.
https://frigate.video/
I did consider the Intel Neural Compute Stick to accelerate with OpenVINO, but it's discontinued, and it turns out I can get away with doing fewer detections (birds, not people) by doing a motion-detection pre-filter (which reduces the number of frames going through OpenCV's DNN by 10x).
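The pre-filter itself can be as simple as a background subtractor gating the detector. A sketch; the threshold values are ballpark and would need tuning per scene:

    # Gate the expensive DNN behind cheap motion detection: only run it on
    # frames where enough pixels changed. Thresholds are ballpark values.
    import cv2

    bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

    def worth_running_dnn(frame, min_changed_px=1500):
        mask = bg.apply(frame)          # per-pixel foreground/motion mask
        mask = cv2.medianBlur(mask, 5)  # suppress single-pixel sensor noise
        return cv2.countNonZero(mask) > min_changed_px

    # main loop: if worth_running_dnn(frame): run the OpenCV DNN detector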
At the moment I am wondering if I could build an accelerator for George Hotz's tinygrad [1] with cheap FPGAs (I do have an Arty A7 35K; this might be too small, I guess?). According to the readme it should be "easy" to add new accelerator hardware. Sadly my knowledge is still a bit limited around the Python machine-learning ecosystem, but if I understand it correctly you "just" need an OpenCL kernel and need to be able to shove the data back and forth somehow.
Didn't have enough time to dive into it yet and I'm still working on some other project, but this still tickles the back of my head and would be cool even if I could only run MNIST on it.
[1] https://github.com/tinygrad/tinygrad/
> but if I understand it correctly you "just" need an OpenCL kernel and need to be able to shove the data back and forth somehow.
To use it with an FPGA accelerator you also have to build all the "hardware" to run said openCL kernel efficiently, manage data transfer, talk to the host, etc for the FPGA. This is very foreign if you're only used to doing software design and still very nontrivial even if you've done FPGA work, though I think there are some open hardware projects around doing this.
OpenCV, YOLO, and Darknet were some of the best tools back when I was doing research projects with this. While it is way more efficient than, say, feeding everything into a RAG LLM, it is still processing-intensive.
It's hard to handle multiple video streams, and I think I maxed out around 20 camera feeds before the computer slowed to a crawl. NVMe storage and better internet would definitely push the limit on what is possible, but unfortunately it is impossible to get good internet where I live. Also, cameras are not cheap and neither is flash storage. But I do know this stuff sees commercial use, and I bet the defense industry uses some better version of it.
I'm thinking of an external (weather-resistant) camera plus hardware. The hardware could be a small computer that connects to the camera (maybe over WiFi?) and runs the YOLO model.
That already exists. Lilin is one of several CCTV companies implementing on-camera YOLO. Axis, Hanwha, and i-Pro all have options for you to run your own software/models on camera as well.
Does anyone have a comparison between Hailo and, say, a mid or high-end GPU or a TPU?
According to [1], the manufacturer claims 2% of the TOPS of an RTX 4090, and only 0.8% of the power consumption.
$200 in prototype quantities [2], which is about 12% of the price of a 4090 - but perhaps the price drops when you order them in bulk?
They claim it compares favourably to an 'Nvidia Xavier NX' for image classification tasks, providing somewhat more FPS at significantly lower power consumption. 218 fps running YOLOv5m on 640×640 inputs.
They're completely silent about the amount of memory it has, but you can fit int8 YOLOv5m into about 20 MB so it'll certainly be an amount measured in megabytes rather than gigabytes.
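As a rough sanity check on that figure: YOLOv5m has roughly 21 million parameters, and int8 quantization stores about one byte per weight, so the weights alone come to ~21 MB; add a few more MB of activations for a 640×640 input and total memory in the tens of megabytes seems plausible.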
Their target market is "CCTV camera that tracks cars and people" rather than "run an LLM" or "train a network from scratch".
[1] https://hailo.ai/products/ai-accelerators/hailo-8-ai-acceler...
[2] https://up-shop.org/hailo-m2-key.html
From what I know, it doesn't have memory but streams everything to the chip. So there's no limit regarding the size of the neural network (unlike the Google Coral).
Hailo is one of the newer AI accelerator startups, so it's not surprising that many people haven't heard of them yet. So far, the price/performance/power consumption of the Hailo products seems to fill a rather large gap between the Amba stuff, which is very well suited for 1-4 camera streams in a typical SoC-based device, and the Jetson, which is really kind of overpriced and power-hungry for a lot of video applications (at least IMO).
It’s a competitor to Google Coral (seems abandoned) and NVIDIA Jetson. I’ve been using it for more than a year and the hardware seems to be one of the best on the market. The software (how to actually do inference on the chip) is subpar though.
I bought a Kinect but never got it working correctly with a Mac. What is the state of the art here? Is there active development anywhere?