eurekin · a year ago
After years of wondering, I have to ask.

What are real life actual useful cases for this tech?

I can imagine in manufacturing: detecting defects or layout mismatch - that's one.

Is there any open source project that uses an image recognition library to achieve a useful task? Everything I've seen from board partners amounts to very simple demos where a labelled box is drawn around an object. Who actually uses that information, how, and for what?

I've also been part of the Kinect craze and made three demos (games, mostly) using their SDK, and I still have a very hard time defending this tech in the eyes of coworkers who only see it as surveillance tech.

zerojames · a year ago
Great question! I work for a computer vision company (Roboflow) and have seen computer vision used for everything from accident prevention on critical infrastructure to identifying defects on vehicle parts to detecting trading cards for use in video game applications.

Drawing bounding boxes is a common end point for demos, but for businesses using computer vision there is an entire world after that: on-device deployment. This can mean anything from an NVIDIA Jetson (a very common choice), to Raspberry Pis, to central CUDA GPU servers processing large volumes of data (perhaps connected to cameras over RTSP).

Note: there are many models that are faster and perform better than YOLOv5 (e.g. YOLOv8, YOLOv10, PaliGemma). Roboflow Inference, which our ML team maintains, has various guides on deploying models to the edge: https://inference.roboflow.com/#inference-pipeline
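
To make that "world after the bounding box" concrete, here is a minimal sketch of consuming an RTSP camera feed with OpenCV and handing frames to a detector; run_detector is a hypothetical stand-in for whatever model you actually deploy:

    import cv2

    def run_detector(frame):
        # Hypothetical stand-in: swap in YOLOv8, a Roboflow workflow, etc.
        return []  # list of (x1, y1, x2, y2, label, confidence)

    cap = cv2.VideoCapture("rtsp://user:pass@192.168.1.10:554/stream1")
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break  # stream dropped; a real deployment would reconnect
        for x1, y1, x2, y2, label, conf in run_detector(frame):
            # The business logic lives here: alerts, counting, logging...
            print(label, conf, (x1, y1, x2, y2))
    cap.release()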

alandarev · a year ago
Can you go into some examples?
dekhn · a year ago
I use object detection to track tardigrades in my custom motorized microscope. It's very useful for making long observations in a field much larger than the scope's field of view.

The system works quite simply: I start with an existing object detector and train it with a small (<100) number of manually labelled images. Then during inference, I move the scope's field of view using motor commands to put the center of the tardigrade at the center of the field of view.
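
In rough pseudocode, one step of the loop looks like the following; detect() and move_stage() are hypothetical stand-ins for the detector and the motor controller:

    # Minimal sketch of one centering step, assuming hypothetical
    # detect() and move_stage() helpers.
    def track_step(frame, frame_w, frame_h, gain=0.5):
        detections = detect(frame)  # [(x1, y1, x2, y2, confidence), ...]
        if not detections:
            return  # nothing found this frame; hold position
        x1, y1, x2, y2, _ = max(detections, key=lambda d: d[4])
        # Pixel offset of the tardigrade's center from the image center
        dx = (x1 + x2) / 2 - frame_w / 2
        dy = (y1 + y2) / 2 - frame_h / 2
        # Proportional move toward the target; gain < 1 damps oscillation
        move_stage(gain * dx, gain * dy)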

This technology is very useful for doing long-term observations of tardigrades (so, useful for science).

eurekin · a year ago
Thank you! Is the detection accurate enough, or are the observation conclusions simply not that sensitive to minor errors?

That makes me want to revisit my previous idea: a boiling-soup spillage detector. I once had a Google Meet call open with a pot of soup cooking so I could keep an eye on it, and thought: heck, that seems like a nice exercise for fine-tuning a visual detector.

VTimofeenko · a year ago
Frigate uses models like this one for NVR:

https://frigate.video/

QueenAdrielle · a year ago
The Hailo-8L is extremely useful in mobile robotic systems, particularly when paired with an RPi 5. The main board used in this demo is... not particularly useful, however. Generally speaking, FPGA boards like this miss the forest for the trees. As a systems engineer, where am I supposed to put such a big board? I can also expect that, even if my team incorporated this board into a mobile system, it would become vaporware well before we deployed anything due to low production numbers, and we'd be paying eBay scalpers 2x as much because our distributors would no longer carry it lol.

As for the RPi 5 combination, the power draw is relatively low. The whole thing clocks in at about 14 watts on an RPi 5, which allows us to run the platform off a battery. At 26 TOPS, this setup can contend with the Jetson Xavier NX (21 TOPS, ~$500) and the Jetson Orin Nano (40 TOPS, ~$500) at a cost of around $170. Furthermore, the CPU on the RPi 5 is generally more performant than the Xavier NX's.

Specifically, this is an excellent vision module for real-time object detection from multiple cameras if set up properly, while maintaining access to the prolific Raspberry Pi hardware ecosystem, which is typically cheaper than the Jetson ecosystem.

daemonologist · a year ago
I'm working on a project that detects climbing holds and lets you set routes by selecting them. (The usual method is putting a bit of colored tape on each hold, where the color corresponds to a route. This works great but becomes difficult to read once more than four or five routes share a hold.) YOLO made the computer vision part of this pretty smooth sailing.
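
For flavor, inference with a trained model is only a few lines with the ultralytics package; "holds.pt" here is a hypothetical custom-trained weights file for hold detection:

    from ultralytics import YOLO

    model = YOLO("holds.pt")  # hypothetical weights trained on hold images
    results = model("wall.jpg")

    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"hold at ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}), "
              f"confidence {float(box.conf):.2f}")
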
adolph · a year ago
Embed the holds with an LED and IR sensor à la the Swift concert [0] and you've got the whole package.

0. https://news.ycombinator.com/item?id=40492515

mechagodzilla · a year ago
As you guessed, high-speed machine vision is frequently used in manufacturing settings for sorting and various quality-control tasks. Imagine picking out bad potatoes on a conveyor belt moving tens of potatoes per second, or identifying particle counts and size distributions in a stream of water to gauge water quality.
eurekin · a year ago
Plus, being an NN, it might be able to detect a foreign object, like a rat, with relative ease (compared to classic computer vision).
sachin9 · a year ago
I've been very persistent over the past few months in developing a system for agriculture as the primary use case. I want to deploy features that classify crop type, height, vegetation stage, and other important metrics to achieve real-time or near-real-time analytics.

Do you have any suggestions on how to proceed? So far, I've procured a Jetson, five cameras, a stand to mount and calibrate the modules, and a camera-array HAT to connect four cameras to the Jetson. I was checking out VPUs, NPUs, and other hardware as well, but I'm struggling to identify compatible options. How can I move ahead and build such a model to test and validate within three months?

mmcwilliams · a year ago
I've used YOLOv5 models in robotics for object detection. While VLMs are great at describing images more generally, having a bounding box for a detection with a confidence score is very useful when paired with depth cameras for locating objects in an open environment, especially when it can be run on-board and at frame rate.
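
To make the depth pairing concrete, here's a minimal sketch of back-projecting a box center to a 3D point in camera coordinates, assuming you have the pinhole intrinsics (fx, fy, cx, cy) and a depth image registered to the color frame:

    import numpy as np

    def bbox_to_3d(bbox, depth_image, fx, fy, cx, cy):
        """Locate a detection in camera coordinates via the pinhole model."""
        x1, y1, x2, y2 = bbox
        u, v = int((x1 + x2) / 2), int((y1 + y2) / 2)  # box center pixel
        z = depth_image[v, u]  # depth (meters) at the box center
        # Back-project through the intrinsics
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.array([x, y, z])
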
nickpsecurity · a year ago
Field mice that I thought were moles have destroyed my yard. There are so many tunnels that I can't tell which are most active. A camera AI that could show which parts of the ground changed significantly would be nice.

At a hotel, we had a problem with luggage carts going missing. There are a few ways to deal with that. A generic one that would support other use cases would be to let the camera tell you the last room each cart went into. Likewise, outdoor cameras might tell you which vehicles had a customer walk into the hotel and which might belong to non-guests.

euroderf · a year ago
> I've also been a part of the Kinect craze and made 3 demos (games mostly) using their SDK

I bought a Kinect but never got it working correctly with a Mac. What is the state of the art here? Is there active development anywhere?

yazzku · a year ago
To surveil people in the streets.
eurekin · a year ago
Yeah, that's what I'm afraid of.
globalnode · a year ago
Some of the people didn't look too happy about you filming them, and then you went and put them online for the world to see. Classy.
bigyikes · a year ago
Breaking: people in public don’t want to be seen by the public.

It’s a detailed breakdown of a technically impressive project and your main takeaway is the 5 seconds in the demo where a guy covers his face?

Kudos to the author for making something neat and sharing it.

shaky-carrousel · a year ago
Breaking: people in public don't want to be recorded and uploaded on the internet for millions to see.

Crazy, right?

yazzku · a year ago
A good point that shouldn't be dismissed so quickly. This tracking also has little use beyond surveillance, so one has to wonder what it is the author thinks they're doing, or why they think it's interesting or useful. That they then go ahead and film a crowd without consent says more about their position on this moral question than it does about any direct harm to the people in the video.
alandarev · a year ago
That's how they score gov contracts - the safest and quickest way to get rich. And oh boy, do governments love control.
IncreasePosts · a year ago
Oh, the horror. Holy shit was that Jason walking by? He told me he was in Alaska...I'm going to need to have a talk with him. Thank goodness I ran across this video.
zimpenfish · a year ago
That looked interesting as a self-hostable project until it got to the requirement for a $3200 AMD board. Maybe the price will come down one day...
DoingIsLearning · a year ago
The UltraScale is definitely pricey for non-industrial applications. The FPGA design is probably larger because of the 4x camera pipeline.

Perhaps with a single camera you could port this to fit a Zynq 7000 footprint, with something like a Pynq-Z1 or Numato Styx, which are at the ~$250 hobbyist price point, for example.

15155 · a year ago
No chance does the Zynq 701x series perform capably for this task: its Artix-class fabric is weak and there are only ~50k LUTs on the common parts. Even fully pipelined designs, especially those that interface with the hard AXI bus, will struggle to clock above 50 MHz.

The Zynq Z7030+ chips would probably be able to get the job done but aren't as common - https://www.lcsc.com/product-detail/Programmable-Logic-Devic.... The Kintex 410T-480T parts are available new for <$160 from reputable vendors, or used for <$50 from less-reputable vendors - they do have the performance (and overall IO) for this task.

sorenjan · a year ago
Depending on what you're trying to do, you might not need it. A popular option for object detection in home-hosted video surveillance (using Frigate and Home Assistant) is a Coral TPU with a Raspberry Pi or some decommissioned thin client.
zimpenfish · a year ago
I did consider the Intel Neural Compute Stick to accelerate with OpenVINO, but they're discontinued, and it turns out I can get away with doing fewer detections (birds, not people) by pre-filtering with motion detection (which reduces the number of frames going through OpenCV's DNN by 10x).
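
The pre-filter can be as simple as a background subtractor gating the DNN; a sketch (MOTION_PIXELS is a made-up threshold you'd tune per scene):

    import cv2

    # Cheap motion gate in front of the expensive DNN pass
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
    MOTION_PIXELS = 500  # hypothetical threshold; tune for your scene

    def should_run_dnn(frame):
        mask = subtractor.apply(frame)
        # Only pay for a detection pass when enough pixels changed
        return cv2.countNonZero(mask) > MOTION_PIXELS
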
muxamilian · a year ago
Google Coral seems abandoned by Google. Nothing official, but the last news on their page is from May 2022.
15155 · a year ago
~400k LUTs is very obtainable these days - if you don't need the ARM cores:

https://www.lcsc.com/product-detail/Programmable-Logic-Devic...

(420T/480T are priced similarly new.)

dailykoder · a year ago
At the moment I am wondering if I could build an accelerator for George Hotz's tinygrad [1] with cheap FPGAs (I do have an Arty A7 35K; this might be too small, I guess?). According to the readme it should be "easy" to add new accelerator hardware. Sadly my knowledge of the Python machine-learning ecosystem is still a bit limited, but if I understand it correctly you "just" need an OpenCL kernel and need to be able to shove the data back and forth somehow.

I haven't had enough time to dive into it yet and am still working on another project, but this still tickles the back of my head; it would be cool even if I could only run MNIST on it.
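
For flavor, here is what the host side of "an OpenCL kernel plus shoving data back and forth" looks like with pyopencl; this is just the generic pattern, not tinygrad's actual backend interface:

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    a = np.random.rand(1024).astype(np.float32)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)  # data in
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    prg = cl.Program(ctx, """
    __kernel void double_it(__global const float *a, __global float *out) {
        int i = get_global_id(0);
        out[i] = 2.0f * a[i];
    }
    """).build()

    prg.double_it(queue, a.shape, None, a_buf, out_buf)  # run the kernel
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)                 # data back out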

- [1] https://github.com/tinygrad/tinygrad/

gh02t · a year ago
> but if I understand it correctly you "just" need an OpenCL kernel and need to be able to shove the data back and forth somehow.

To use it with an FPGA accelerator, you also have to build all the "hardware" to run said OpenCL kernel efficiently, manage data transfer, talk to the host, etc., on the FPGA. This is very foreign if you're only used to software design, and it's still very nontrivial even if you've done FPGA work, though I think there are some open hardware projects working on this.

shrubble · a year ago
What is the ultimate deliverable after all this work? Did it correlate/track the same people across multiple video feeds, for instance?
gte525u · a year ago
It's line-speed processing of multiple cameras in hardware - it should consume less power than an equivalent GPU or Jetson.
tonetegeatinst · a year ago
OpenCV, YOLO, and Darknet were some of the best tools back when I was doing research projects with this. While it's far more efficient than, say, feeding frames into a RAG LLM, it's still processing-intensive.

It's hard to handle multiple video streams, and I think I maxed out around 20 camera feeds before the computer slowed to a crawl. NVMe storage and better internet would definitely push the limit of what's possible, but unfortunately it's impossible to get good internet where I live. Also, cameras are not cheap and neither is flash storage. But I do know this stuff sees commercial use, and I bet the defense industry uses some better version of it.

_giorgio_ · a year ago
What's a good (production ready) setup?

I'm thinking of an external camera (weather-resistant) plus hardware: a small computer that connects to the camera (maybe over Wi-Fi?) and runs the YOLO model.

brk · a year ago
That already exists. Lilin is one of several CCTV companies implementing on-camera YOLO. Axis, Hanwha, and i-Pro all have options for you to run your own software/models on camera as well.
_giorgio_ · a year ago
Ok, thanks. I'll look into that. I hope that they don't require some specific abstruse format for the models!
scottapotamas · a year ago
Great writeup. Always a treat to see more high quality FPGA project postmortems, even if they aren’t using accessible parts/toolchains.
daghamm · a year ago
I am not familiar with this NN accelerator.

Does anyone have a comparison between Hailo and, say, a mid or high-end GPU or a TPU?

michaelt · a year ago
According to [1], the manufacturer claims 2% of the TOPS of an RTX 4090 at only 0.8% of the power consumption.

It's $200 in prototype quantities [2], which is 12% of the price of a 4090 - but perhaps the price drops when you order in bulk?

They claim it compares favourably to an NVIDIA Xavier NX for image classification tasks, providing somewhat more FPS at significantly lower power consumption: 218 fps running YOLOv5m on 640×640 inputs.

They're completely silent about the amount of memory it has, but int8 YOLOv5m fits in about 20 MB (~21M parameters at one byte each), so it'll certainly be an amount measured in megabytes rather than gigabytes.

Their target market is "CCTV camera that tracks cars and people" rather than "run an LLM" or "train a network from scratch".

[1] https://hailo.ai/products/ai-accelerators/hailo-8-ai-acceler...

[2] https://up-shop.org/hailo-m2-key.html

muxamilian · a year ago
From what I know, it doesn't have onboard memory but streams everything to the chip, so there's no limit on the size of the neural network (unlike the Google Coral).
brk · a year ago
Hailo is one of the newer AI accelerator startups, so it's not surprising that many people haven't heard of them yet. So far the price/performance/power consumption of the Hailo products seems to fill a rather large gap between the Amba stuff, which is very well suited to 1-4 camera streams in a typical SoC-based device, and the Jetson, which is really kind of overpriced and power-hungry for a lot of video applications (at least IMO).
Y_Y · a year ago
Very cheap and power-efficient, if you're willing to run in int8/int4 and have unlimited time and patience for development.
muxamilian · a year ago
It’s a competitor to Google Coral (seems abandoned) and NVIDIA Jetson. I’ve been using it for more than a year and the hardware seems to be one of the best on the market. The software (how to actually do inference on the chip) is subpar though.