Can somebody explain to me why they wouldn't just do this on the GPU? Isn't the GPU already designed to perform "hardware-accelerated image processing"?
For the same reason Google's TPU was an order of magnitude more efficient than Nvidia's previous "pure" GPUs, or why video codec accelerators are also faster and more efficient than GPUs for video decoding. The GPU is still pretty "general purpose" compared to a chip that only does image processing and not much else.
I hope Google will eventually reveal if the IPU shares any DNA with the TPU at all.
These developments are quite funny. At one point GPUs were just fixed function hardware. As more flexibility was needed for novel applications, programmability was added.
Now we are going back to fixed-function units, as the calculus has changed in favour of power savings and against flexibility.
I'm interested in seeing what advance in tech will change it again in favour of flexibility.
As I understand it, the main advantage of the TPU is that it is built around an 8-bit pipeline. It's counter-intuitive to me, but apparently many deep learning NNs can be quantized down so each value in a layer takes only 256 possible states, and they still perform really well.
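To make that concrete, here's a minimal sketch of symmetric 8-bit quantization in NumPy. It's a generic single-scale-factor scheme for illustration, not a claim about the TPU's actual pipeline, and the function names are mine.

    import numpy as np

    # Illustrative symmetric quantization; not the TPU's exact scheme.
    def quantize_int8(w):
        """Map float32 weights onto the 256-level int8 range with one shared scale."""
        scale = np.abs(w).max() / 127.0                  # width of one quantization step
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        """Approximate the original float32 weights from the 8-bit codes."""
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)     # stand-in layer weights
    q, scale = quantize_int8(w)
    err = np.abs(dequantize(q, scale) - w).max()
    print(f"worst-case rounding error: {err:.5f} (about scale/2 = {scale / 2:.5f})")

The point is just that each weight collapses to one of 256 states plus a shared scale factor, which is why an 8-bit multiply-accumulate pipeline can run these networks at all.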
Do you have an external source for this? A lot of analysts say the TPU is mostly there to keep negotiating leverage against Nvidia, and that it's not really especially powerful or efficient.
I'm a little bit confused too - all the operations they describe are usually quite fast on a GPU.
I wonder if it's not so much about being better at image processing, but having control and direct access to the hardware. For the Adreno, you have to go through Qualcomm's drivers and are subject to their limitations (there is freedreno which works great, but I don't think Qualcomm allows it on handsets). You're also stuck with the sizes of GPU that are available on Snapdragon SoCs - if you want a larger one, too bad.
Most SoCs have some kind of dedicated image processor these days as well as a GPU, even ARM's got in on the game with their own hardware designs. Unfortunately they tend to be pretty proprietary so not much good if you want to do your own image processing. As far as I can tell, the image processors traditionally have a more DSP-like architecture (small local buffers, instruction set optimised for efficient data transfer and processing, etc), but since it's proprietary it's hard to tell. Supposedly they're meant to be more power efficient than the alternatives.
(Which isn't surprising; GPUs are really designed for 3D rendering and they're definitely overkill for simpler tasks. The general assumption is that you're fetching small locally-contiguous groups of pixels from main RAM, doing texture mapping and computations on them, then conditionally blitting the result to other locally-contiguous areas of RAM. Most of the infrastructure used for this is going to waste if you're just using it to do 2D image processing.)
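To illustrate the kind of work in question, here's a rough NumPy sketch of a 3x3 box blur written as a row-streamed stencil. It's purely illustrative (my own toy code, not any IPU's or GPU's API): only a few contiguous rows are needed at a time, and none of the texture-mapping or rasterization hardware gets exercised.

    import numpy as np

    # Toy 2D image processing: a 3x3 box blur streamed row by row,
    # touching only small contiguous slices of memory (no texture units needed).
    def box_blur_3x3(img):
        h, w = img.shape
        padded = np.pad(img.astype(np.float32), 1, mode="edge")
        out = np.empty((h, w), dtype=np.float32)
        for y in range(h):                       # stream rows, like a line buffer
            window = padded[y:y + 3, :]          # only 3 rows resident at a time
            row = window[:, 0:w] + window[:, 1:w + 1] + window[:, 2:w + 2]
            out[y] = row.sum(axis=0) / 9.0
        return out

    img = (np.random.rand(480, 640) * 255).astype(np.uint8)   # fake greyscale frame
    blurred = box_blur_3x3(img)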
I was kinda feeling dumb reading this... SoC is "System On A Chip"... "A system on a chip or system on chip (SoC or SOC) is an integrated circuit (also known as an "IC" or "chip") that integrates all components of a computer or other electronic systems. It may contain digital, analog, mixed-signal, and often radio-frequency functions—all on a single substrate."
Just want to add my thanks to the growing list. Kind of surprising that they didn't follow the best practices for abbreviations and have "SoC (System on a Chip)" at the first instance of the abbreviation.
Thanks. I was confused as well... most articles would have started off with the full term and then used the acronym... I knew the context of what it was, but not knowing what it actually stood for was driving me nuts...
It's only useful as a persistent exploit vector if it has persistence. If I was designing this, it'd be simpler to have it boot off an image downloaded from the main SoC rather than giving it its own flash.
Good point - if it's just got firmware uploaded from the SoC, it's more of an escalation / memory protection bypass vector versus a persistence vector. I also wonder what gets sent across to it - for example, if a malicious video from the web could land on the coprocessor.
Mostly it's just fun seeing more and more co-processing devices arrive in phones as they get broken :)
> Google says the Pixel Visual Core is designed "to handle the most challenging imaging and machine learning applications"
I suspect they hope it will be the latter rather than the former. Design models at HQ, then distribute them to be trained on each user's data on their own phone. That means you need to use less bandwidth to transfer training data and models, which leaves more available to serve ads!
I suspect you are being slightly tongue-in-cheek about the bandwidth for ads (yay 5G ;-)). With that said, I think a key benefit of on-device learning is the privacy angle. The Pixel 2's always-on song identification is said to be done on the phone without your audio data being sent to Google. Similarly, the Google Clips camera apparently does its magic without sending image data to Google. With devices having less and less to differentiate on, privacy is increasingly visible in the marketing arena, particularly for devices that seem to be watching or listening at all times.
Also, what if you replace the ML model with something more nefarious? Like have the model classify things incorrectly. Or if a target face is in the picture then send a request to a server of your choosing.
So maybe Google wants to build a device that needs image processing, but the (initial) volume can't justify a full-on custom ASIC. (And for some other reason -- power, space, ... -- this hypothetical device can't use an FPGA.)
Maybe find a device that does have a custom ASIC (e.g. the Pixel 2) and add the image processing functionality to that ASIC. Then perhaps use the same custom ASIC for both devices. As long as the extra functionality doesn't increase the cost of the original ASIC, problem solved.
>If Google ever set out to compete with Qualcomm's Snapdragon line, an IPU is something it could build directly into its own designs. For now, though, it has this self-contained solution.
If Google wanted to do this they would probably have to buy out QCOM and their patents.
Or at the very least pay a boatload of money in licensing fees. Qualcomm owns a huge chunk of IP in the mobile network space - they were responsible for the initial push behind CDMA back in the early 90s.
Encoders too: hardware encoders also give much higher quality.
It was mandatory for REDCINE-X Pro but since RED has implemented GPU support it’s also not really necessary.
https://en.wikipedia.org/wiki/Dark_silicon