What are the real-life, actually useful cases for this tech?
I can imagine manufacturing: detecting defects or layout mismatches - that's one.
Is there any open source project that uses an image recognition library to achieve a useful task? All I've seen from board partners are, at most, very simple demos where a box with a label is drawn around an object. Who is actually using that information, how, and for what?
I've also been a part of the Kinect craze and made 3 demos (mostly games) using their SDK, and I still have a very hard time defending this tech in the eyes of coworkers who only see it as surveillance tech.
Great question! I work for a computer vision company (Roboflow) and have seen computer vision used for everything from accident prevention on critical infrastructure to identifying defects on vehicle parts to detecting trading cards for use in video game applications.
Drawing bounding boxes is a common end point for demos, but for businesses using computer vision there is an entire world after that: on-device deployment. This can range from devices like an NVIDIA Jetson (a very common choice) to Raspberry Pis to central CUDA GPU servers for processing large volumes of data (perhaps connected to cameras over RTSP).
Note: there are many models that are faster and perform better than YOLOv5 (e.g. YOLOv8, YOLOv10, PaliGemma). Roboflow Inference, which our ML team maintains, has various guides on deploying models to the edge: https://inference.roboflow.com/#inference-pipeline
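For a concrete picture of that "world after the box", here is a minimal sketch of the common pattern: RTSP in, detections out, then whatever logic the application needs. The RTSP URL, the weights file, and the example "person" rule are placeholders, not anything from an actual production pipeline:

    # Minimal sketch: pull frames from an RTSP camera, run a detector, then hand
    # the detections to whatever comes next (counting, alerting, logging, ...).
    # The URL, weights file and the example "person" rule are placeholders.
    import cv2
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # or your own fine-tuned weights
    cap = cv2.VideoCapture("rtsp://user:pass@192.0.2.10/stream1")

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = model(frame, verbose=False)[0]
        for box in result.boxes:
            name = model.names[int(box.cls)]
            conf = float(box.conf)
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            # the "world after the box": business logic goes here
            if name == "person" and conf > 0.5:
                print(f"person at ({x1},{y1})-({x2},{y2}) conf={conf:.2f}")

    cap.release()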
I use object detection to track tardigrades in my custom motorized microscope. It's very useful for making long observations in a field much larger than the scope's field of view.
The system works quite simply: I start with an existing object detector and train it with a small (<100) number of manually labelled images. Then during inference, I move the scope's field of view using motor commands to put the center of the tardigrade at the center of the field of view.
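A rough sketch of that re-centering step, where detector() and the stage object are hypothetical stand-ins for the real model and motor controller, and the calibration values are examples only:

    # Closed-loop centering sketch: measure how far the detection is from the
    # image center and command the stage to cancel that offset.
    # detector() and stage.move_relative() are hypothetical stand-ins for the
    # real model and motor controller; the calibration values are examples only.
    UM_PER_PIXEL = 0.8   # stage microns per image pixel (calibration, made up)
    GAIN = 0.5           # correct only part of the error per step to avoid overshoot

    def recenter(frame, detector, stage):
        det = detector(frame)                 # returns (x1, y1, x2, y2) or None
        if det is None:
            return                            # lost the animal; skip this frame
        x1, y1, x2, y2 = det
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        h, w = frame.shape[:2]
        err_x = cx - w / 2.0                  # positive: target is right of center
        err_y = cy - h / 2.0                  # positive: target is below center
        stage.move_relative(dx_um=-GAIN * err_x * UM_PER_PIXEL,
                            dy_um=-GAIN * err_y * UM_PER_PIXEL)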
This technology is very useful for doing long-term observations of tardigrades (so, useful for science).
Thank you! Is the detection accurate enough, or are the observation conclusions simply not that sensitive to minor errors?
That makes me want to revisit a previous idea of mine: a boiling-soup spillage detector. I once had a Google Meet call open with a pot of cooking soup to keep an eye on it and thought, heck, that seems like a nice exercise for fine-tuning a visual detector.
This Hailo-8L is extremely useful in mobile robotic systems, particularly when paired with a Raspberry Pi 5. The main board used in this demo is.... not particularly useful, however. Generally speaking, boards like this FPGA one miss the forest for the trees. As a systems engineer, where am I supposed to put such a big board? I can also expect that, even if my team incorporated this board into a mobile system, it would become vaporware well before we deployed anything due to low production numbers, and we'd be paying eBay scalpers 2x as much because our distributors wouldn't carry it anymore, lol.
As for the Raspberry Pi 5 combination, the power draw is relatively low. The whole thing clocks in at about 14 watts on a Pi 5, which allows us to run this platform off of a battery. With 26 TOPS, this setup can contend with the Jetson Xavier NX (21 TOPS, ~$500) and the Jetson Orin Nano (40 TOPS, ~$500) for a cost of around $170. Furthermore, the CPU on the Pi 5 is generally more performant than the Xavier NX's.
Specifically, this is an excellent vision module for real-time object detection from multiple cameras if set up properly, while maintaining access to the prolific Raspberry Pi hardware ecosystem, which is typically cheaper than the Jetson ecosystem.
I'm working on a project that detects climbing holds and lets you set routes by selecting them. (The usual method is putting a bit of colored tape on each hold, where the color corresponds to a route. This works great but becomes difficult to read once more than four or five routes share a hold.) YOLO made the computer vision part of this pretty smooth sailing.
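The detection half is only a few lines. A sketch, assuming the ultralytics package and a hypothetical fine-tuned weights file (holds_yolov8.pt); in practice the route indices come from clicks in the UI rather than being hard-coded:

    # Sketch: detect holds in a wall photo, then let the route setter group them.
    # "holds_yolov8.pt" is a hypothetical fine-tuned weights file, and the route
    # indices would come from clicks in the UI rather than being hard-coded.
    from ultralytics import YOLO

    model = YOLO("holds_yolov8.pt")
    result = model("wall.jpg", verbose=False)[0]
    holds = [tuple(map(int, b.xyxy[0])) for b in result.boxes]   # (x1, y1, x2, y2)

    # a route is just an ordered list of selected hold boxes
    blue_route = [holds[i] for i in (3, 7, 12, 18)]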
As you guessed, high-speed machine vision is frequently used in manufacturing settings for sorting or various quality control tasks. Imagine picking out bad potatoes on a conveyor belt moving tens of potatoes per second, or identifying particle counts and size distributions in a stream of water to gauge water quality.
I've been very persistent over the past few months in developing a system for agriculture as a primary use case. I want to deploy features to classify crop type, height, vegetation stage, and other important metrics to achieve real-time or near real-time analytics.
Do you have any suggestions on how to proceed further? So far, I've procured a Jetson, five cameras, a stand to fix and calibrate the modules, and a camera-array HAT to connect four cameras to the Jetson.
I was also checking out VPUs, NPUs, and other hardware, but I'm struggling to identify compatible options.
How can I get ahead and build such a model to test and validate within 3 months?
I've used YOLOv5 models in robotics for object detection. While VLMs are great at describing images more generally, having a bounding box of a detection with a confidence score is very useful when paired with depth cameras for locating objects in an open environment. Especially when it can be run on-board and at framerate.
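For anyone curious how the bounding box and the depth image come together, a minimal sketch using the standard pinhole model; the intrinsics (fx, fy, cx, cy) are placeholder values, not from any particular camera:

    # Sketch: combine a 2D detection with an aligned depth frame to get a 3D
    # point in the camera frame via the standard pinhole model. The intrinsics
    # (fx, fy, cx, cy) are placeholder values, not from any particular camera.
    import numpy as np

    def bbox_to_point(bbox, depth_m, fx=615.0, fy=615.0, cx=320.0, cy=240.0):
        x1, y1, x2, y2 = bbox
        u, v = (x1 + x2) // 2, (y1 + y2) // 2   # pixel at the box center
        z = float(depth_m[v, u])                # depth (meters) at that pixel
        if z <= 0.0:
            return None                         # invalid/missing depth reading
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.array([x, y, z])              # meters, camera coordinates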
Field mice that I thought were moles have destroyed my yard. There’s so many tunnels that I can’t tell which are most active. A camera AI that can show which parts of the ground changed significantly would be nice.
At a hotel, we had a problem of luggage carts going missing. There are a few ways to deal with that. A generic one that would support other use cases would be to let the camera tell you the last room it went into. Likewise, outdoor cameras might tell you which vehicles had a customer walk into the hotel and which might be non-guests.
A good point that shouldn't be dismissed so quickly. This tracking also has little use beyond surveillance, so one has to wonder what it is the author thinks they are doing, or why they think it's interesting or useful. That they then go ahead and film a crowd without their consent says more about their position on this moral question than it does about any direct harm to the people in the video.
Oh, the horror. Holy shit was that Jason walking by? He told me he was in Alaska...I'm going to need to have a talk with him. Thank goodness I ran across this video.
It's a detailed breakdown of a technically impressive project and your main takeaway is the 5 seconds in the demo where a guy covers his face?
Kudos to the author for making something neat and sharing it.
Crazy, right?
The UltraScale is definitely pricey for non-industry applications. The FPGA design is probably larger because of the 4x camera pipeline.
Perhaps with a single camera you could port this to fit on a Zynq 7000 footprint with something like a Pynq Z1 or Numato Styx, which are around the $250 hobbyist price point, for example.
No chance does the Zynq 701x series perform capably for this task: its Artix fabric is weak and there are only ~50k LUTs on the common parts. Even fully pipelined designs, especially those that interface with the hard AXI bus, will struggle to clock above 50 MHz.
The Zynq Z7030+ chips would probably be able to get the job done but aren't as common - https://www.lcsc.com/product-detail/Programmable-Logic-Devic.... The Kintex 410T-480T are available new <$160 from reputable vendors, used <$50 from less-reputable vendors - they do have the performance (and overall IO) for this task.
https://www.lcsc.com/product-detail/Programmable-Logic-Devic...
(420T/480T are priced similarly new.)
Depending on what you're trying to do, you might not need it. A popular option for object detection that people use with their home-hosted video surveillance (using Frigate and Home Assistant) is a Coral TPU with a Raspberry Pi or some decommissioned thin client.
https://frigate.video/
I did consider the Intel Neural Compute Stick to accelerate with OpenVINO, but it's discontinued, and it turns out I can get away with doing fewer detections (birds, not people) by doing a motion-detection pre-filter (which reduces the number of frames going through OpenCV's DNN by 10x).
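The pre-filter itself can be as simple as a background subtractor gating the detector. A sketch; the threshold values are ballpark and would need tuning per scene:

    # Gate the expensive DNN behind cheap motion detection: only run it on
    # frames where enough pixels changed. Thresholds are ballpark values.
    import cv2

    bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

    def worth_running_dnn(frame, min_changed_px=1500):
        mask = bg.apply(frame)          # per-pixel foreground/motion mask
        mask = cv2.medianBlur(mask, 5)  # suppress single-pixel sensor noise
        return cv2.countNonZero(mask) > min_changed_px

    # main loop: if worth_running_dnn(frame): run the OpenCV DNN detector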
At the moment I am wondering if I could build an accelerator for George Hotz's tinygrad [1] with cheap FPGAs (I do have an Arty A7 35K; this might be too small, I guess?). According to the readme it should be "easy" to add new accelerator hardware. Sadly my knowledge is still a bit limited around the Python machine-learning ecosystem, but if I understand it correctly you "just" need an OpenCL kernel and need to be able to shove the data back and forth somehow.
Didn't have enough time to dive into it yet and I'm still working on some other project, but this still tickles the back of my head and would be cool even if I could only run MNIST on it.
[1] https://github.com/tinygrad/tinygrad/
> but if I understand it correctly you "just" need an OpenCL kernel and need to be able to shove the data back and forth somehow.
To use it with an FPGA accelerator you also have to build all the "hardware" to run said openCL kernel efficiently, manage data transfer, talk to the host, etc for the FPGA. This is very foreign if you're only used to doing software design and still very nontrivial even if you've done FPGA work, though I think there are some open hardware projects around doing this.
OpenCV, YOLO, and Darknet were some of the best tools back when I was doing research projects with this. While it is way more efficient than, say, feeding everything into a RAG LLM, it is still processing-intensive.
It's hard to handle multiple video streams, and I think I maxed out around 20 camera feeds before the computer slowed to a crawl. NVMe storage and better internet would definitely push the limit on what is possible, but unfortunately it is impossible to get good internet where I live. Also, cameras are not cheap and neither is flash storage. But I do know this stuff sees commercial use, and I bet the defense industry uses some better version of it.
I'm thinking of an external (weather-resistant) camera plus hardware. The hardware could be a small computer that connects to the camera (maybe over WiFi?) and runs the YOLO model.
That already exists. Lilin is one of several CCTV companies implementing on-camera YOLO. Axis, Hanwha, and i-Pro all have options for you to run your own software/models on camera as well.
Does anyone have a comparison between Hailo and, say, a mid or high-end GPU or a TPU?
According to [1], the manufacturer claims 2% of the TOPS of an RTX 4090, and only 0.8% of the power consumption.
$200 in prototype quantities [2], which is about 12% of the price of a 4090 - but perhaps the price drops when you order them in bulk?
They claim it compares favourably to an 'Nvidia Xavier NX' for image classification tasks, providing somewhat more FPS at significantly lower power consumption. 218 fps running YOLOv5m on 640×640 inputs.
They're completely silent about the amount of memory it has, but you can fit int8 YOLOv5m into about 20 MB so it'll certainly be an amount measured in megabytes rather than gigabytes.
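As a rough sanity check on that figure: YOLOv5m has roughly 21 million parameters, and int8 quantization stores about one byte per weight, so the weights alone come to ~21 MB; add a few more MB of activations for a 640×640 input and total memory in the tens of megabytes seems plausible.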
Their target market is "CCTV camera that tracks cars and people" rather than "run an LLM" or "train a network from scratch".
[1] https://hailo.ai/products/ai-accelerators/hailo-8-ai-acceler...
[2] https://up-shop.org/hailo-m2-key.html
From what I know, it doesn't have memory but streams everything to the chip. So there's no limit regarding the size of the neural network (unlike the Google Coral).
Hailo is one of the newer AI accelerator startups, so it's not surprising that many people haven't heard of them yet. So far, the price/performance/power consumption of the Hailo products seems to fill a rather large gap between the Amba stuff, which is very well suited for 1-4 camera streams in a typical SoC-based device, and the Jetson, which is really kind of overpriced and power-hungry for a lot of video applications (at least IMO).
It’s a competitor to Google Coral (seems abandoned) and NVIDIA Jetson. I’ve been using it for more than a year and the hardware seems to be one of the best on the market. The software (how to actually do inference on the chip) is subpar though.
I bought a Kinect but never got it working correctly with a Mac. What is the state of the art here? Is there active development anywhere?