I wish, but tbh coffee is probably artificially cheaper than it should be: larger corporations exploit local farms and effectively maintain local monopolies, so farms have to sell to those corporations for a fraction of what the crop is actually worth.
In the CV department, I recently ordered a cheap FPGA + ARM Cortex-M3 + 64 Mbit SRAM + 32 Mbit flash that does camera input and HDMI output. Like a budget Zynq for CV.
That's definitely not our intention: the output of all Community projects is by default Apache 2.0 licensed, unless the developer specifies a different one.
> As I've been really interested in computer vision lately, I decided on writing a SIMD-accelerated implementation of the FAST feature detector for the ESP32-S3 [...]
> In the end, I was able to improve the throughput of the FAST feature detector by about 220%, from 5.1MP/s to 11.2MP/s in my testing. This is well within the acceptable range of performance for realtime computer vision tasks, enabling the ESP32-S3 to easily process a 30fps VGA stream.
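A quick back-of-the-envelope check of those numbers (mine, not from the article):

```python
# A 30fps VGA stream in megapixels per second:
vga_rate_mps = 640 * 480 * 30 / 1e6  # 9.216 MP/s

# Throughput went from 5.1 MP/s to 11.2 MP/s, i.e. roughly a 2.2x ratio:
speedup = 11.2 / 5.1

print(vga_rate_mps)       # 9.216 -- comfortably under the 11.2 MP/s achieved
print(round(speedup, 1))  # 2.2
```

So 11.2 MP/s does indeed leave headroom over the 9.2 MP/s a 30fps VGA stream requires.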
I wonder how hard it would be, presumably with some trade-off with detection windows, to use a few of these in parallel and process higher resolutions and frame rates?
Compared to the ESP8266, there's generally pretty good ESP32 support for Rust, but you'll likely need to pull in the C++ toolchain if you want to use the standard library. no-std Rust on the ESP32 isn't terrible in my experience, though, just not as fleshed out, particularly for hooking into components like wifi/networking (and probably a camera as well).
Maybe it's not the chip that's too cheap. Maybe it's the coffee that's too expensive.
Ha, well, there is a disturbing reason why computer vision with ultra-cheap hardware is possible: countries all over the world are buying these by the billions in order to keep an eye on their citizens :-(
Big brother is enabling incredible economies of scale....
https://wiki.sipeed.com/hardware/en/tang/Tang-Nano-4K/Nano-4...
https://www.aliexpress.us/item/3256806880637138.html
Would any of the "retro" game/home computer firmwares fit in that FPGA? I find comparing capacity hard for stuff like that.
The S3 variant easily justifies the slight additional cost: with SIMD and an FPU, it's faster by an order of magnitude or more.
https://github.com/espressif/esp-dl/tree/master/examples/fac...
https://edgeimpulse.com/ai-practitioners
We work directly with vendors to perform low level optimization of deep learning, computer vision, and DSP workloads for dozens of architectures of microcontrollers and CPUs, plus exotic accelerators (neuromorphic compute!) and edge GPUs. This includes ESP32:
https://docs.edgeimpulse.com/docs/edge-ai-hardware/mcu/espre...
You can upload a TensorFlow, PyTorch, or JAX model and receive an optimized C++ library direct from your notebook in a couple lines of Python. It's honestly pretty amazing.
And we also have a full Studio for training models, including architectures we've designed specifically to run well on various embedded hardware, plus hardware-aware hyperparameter optimization that will find the best model to fit your target device (in terms of latency and memory use).
The community plan does have commercial use restrictions; it's designed for education, demos, and research. We have a pretty good presence in the academic community with tons of papers, code, and projects developed using our community version.
Here's a Google Scholar search showing a bunch of papers:
https://scholar.google.com/scholar?start=0&q=%22edge+impulse...
We also have our own public sharing platform:
https://edgeimpulse.com/projects/overview
Previously you needed a crazy mixture of ML knowledge and low-level embedded engineering skills even to get started, which is not a common combination!
https://docs.edgeimpulse.com/docs/run-inference/cpp-library/...
> In the end, I was able to improve the throughput of the FAST feature detector by about 220%, from 5.1MP/s to 11.2MP/s in my testing. This is well within the acceptable range of performance for realtime computer vision tasks, enabling the ESP32-S3 to easily process a 30fps VGA stream.
What are some use cases for FAST?
Features from accelerated segment test: https://en.wikipedia.org/wiki/Features_from_accelerated_segm...
Is there TPU-like functionality in anything in this price range of chips yet?
Neon is an optional SIMD instruction-set extension for Armv7 and standard in Armv8-A; so the Pi 2 and later (including the Pi Zero 2) have SIMD extensions, though the original Pi Zero's ARMv6 core does not have Neon.
The Jetson Orin Nano has 40 TOPS, which is sufficient for Copilot+ AFAIU. "A PCIe Coral TPU Finally Works on Raspberry Pi 5" https://news.ycombinator.com/item?id=38310063
From https://phys.org/news/2024-06-infrared-visible-device-2d-mat... :
> Using this method, they were able to up-convert infrared light of wavelength around 1550 nm to 622 nm visible light. The output light wave can be detected using traditional silicon-based cameras.
> "This process is coherent—the properties of the input beam are preserved at the output. This means that if one imprints a particular pattern in the input infrared frequency, it automatically gets transferred to the new output frequency," explains Varun Raghunathan, Associate Professor in the Department of Electrical Communication Engineering (ECE) and corresponding author of the study published in Laser & Photonics Reviews.
"Show HN: PicoVGA Library – VGA/TV Display on Raspberry Pi Pico" https://news.ycombinator.com/item?id=35117847#35120403
https://news.ycombinator.com/item?id=40275530
"Designing a SIMD Algorithm from Scratch" https://news.ycombinator.com/item?id=38450374
> What are some use cases for FAST?
The FAST feature detector is an algorithm for finding regions of an image that are visually distinctive, which can be used as a first step in motion tracking and SLAM (simultaneous localization and mapping) algorithms typically seen in XR, robotics, etc.
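A minimal, unoptimized sketch of the segment test at the heart of FAST (illustrative only; the real detector adds a fast rejection pre-test and non-maximum suppression, and the SIMD version vectorizes these comparisons):

```python
# FAST segment test: a pixel is a corner if at least n contiguous pixels
# on a 16-pixel Bresenham circle of radius 3 are all brighter than
# center + t, or all darker than center - t.

# Offsets (dy, dx) of the 16 circle pixels, in clockwise order.
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]

def is_fast_corner(img, y, x, t=20, n=12):
    c = img[y][x]
    # Classify each circle pixel: +1 brighter, -1 darker, 0 similar.
    states = []
    for dy, dx in CIRCLE:
        p = img[y + dy][x + dx]
        states.append(1 if p >= c + t else (-1 if p <= c - t else 0))
    # Look for a run of n equal non-zero states, wrapping around the circle
    # (doubling the list handles runs that span the wrap-around point).
    run, prev = 0, 0
    for s in states + states:
        if s != 0 and s == prev:
            run += 1
        else:
            run, prev = (1, s) if s != 0 else (0, 0)
        if run >= n:
            return True
    return False
```

Scanning a whole frame just calls this per pixel, which is exactly the part that benefits from doing 16 comparisons per SIMD instruction.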
> Is there TPU-like functionality in anything in this price range of chips yet?
I think that in the case of the ESP32-S3, its SIMD instructions are designed to accelerate inference of quantized AI models (see: https://github.com/espressif/esp-dl), and also some signal processing like FFTs. I guess you could call the SIMD instructions TPU-like, in the sense that the chip has specific instructions that facilitate ML inference (EE.VRELU.Sx performs the ReLU operation, for example). Using these instructions still takes away CPU time, whereas TPUs are typically their own processing core, operating asynchronously. I'd say this is closer to Arm NEON.
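To illustrate, here is a conceptual sketch (not the actual ISA semantics) of what a lanewise SIMD ReLU computes: max(x, 0) over every lane of a vector register at once, where a scalar core would loop over the elements one by one:

```python
def relu_lanes(lanes):
    """What an EE.VRELU-style instruction conceptually does in one shot:
    ReLU applied independently to each int8 lane of a vector register."""
    return [x if x > 0 else 0 for x in lanes]

print(relu_lanes([-3, 7, 0, -128, 42]))  # [0, 7, 0, 0, 42]
```

With 16 int8 lanes per instruction, that's a ~16x reduction in instruction count for that step of inference, but it still runs on the CPU rather than on a separate asynchronous core.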
> Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, C, and Swift, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE
github.com/topics/simd: https://github.com/topics/simd
https://news.ycombinator.com/item?id=37805810#37808036
Is that related to ‘Energy Function’ in any way?
(I ask because a long time ago I was involved in an Automated Numberplate Reading startup that was using an FPGA to quickly find the vehicle numberplate in an image)
Email in bio and would love to chat!
Kendryte K210 supports 1x1 and 3x3 convolutions on the "TPU". It was pretty well supported in terms of software & documentation but sadly it hasn't become popular.
These days, you can easily find cheap RV1103 ("LuckFox"), BL808 ("Ox64/Pine64") and CV1800B/SG2002 ("MilkV") based dev boards, all of which have some sort of basic TPU. Unfortunately, they are designed as Linux boards, meaning that all the TPU-related stuff is heavily abstracted with zero under-the-hood documentation. So it's unclear whether their TPUs are real hardware or faked with clever code optimizations.
They all have a TPU in hardware; my team has been verifying and benchmarking them. Documentation is only available for the high-level C APIs to the libraries that a programmer is expected to use, and even that tends to be extremely lacking.
MicroPython seems pretty accessible at first glance. Would it be easy to create a WebAssembly port of its code?
It can already be used in PyScript for client-side development, and it has a JavaScript/DOM bridge as well. https://pyscript.net/tech-preview/micropython/about.html
Like the other commenter said, there's plenty of support for SIMD and asm in Rust.
You might ask around on a Rust embedded or Rust ESP32 chatroom before making the dive.
If you are on Windows, you will need to place the project folder at the top level of the drive, and there are other quirks as well, but it works.