mratsim · 2 years ago
Years ago I started a collection of convolution optimization resources: https://github.com/mratsim/laser/wiki/Convolution-optimisati...

I also checked, and apparently Nvidia Cutlass now supports generic convolutions: https://github.com/NVIDIA/cutlass

epistasis · 2 years ago
Interesting article, thanks. IMHO it's mostly valuable for the low-level performance analysis.

When it comes to actual computation of convolutions, the fast Fourier transform should at least be mentioned, even if in passing. Early in grad school I peeked at the source for R's density() function, and was blown away that it was using FFT, and that I had not picked up that trick in my math classes (or maybe I had just forgotten it...)

For a 2d example:

https://stackoverflow.com/questions/50453981/implement-2d-co...

And a recent HN thread that was very good:

https://news.ycombinator.com/item?id=40840396
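
For anyone who hasn't seen the FFT trick mentioned above, here is a minimal 1-D sketch in plain Python. It is not what R's density() does internally; the helper names (`fft`, `fft_convolve`, `direct_convolve`) are made up for illustration, and the toy recursive FFT assumes non-empty inputs and pads to a power of two. The idea is just that multiplying the zero-padded spectra and transforming back gives the same result as the direct sum.

```python
import cmath

def fft(values, inverse=False):
    """Toy recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_convolve(signal, kernel):
    """Full linear convolution via pointwise multiplication of spectra."""
    out_len = len(signal) + len(kernel) - 1
    # Pad to a power of two >= out_len so circular wrap-around cannot occur.
    size = 1
    while size < out_len:
        size *= 2
    a = fft(list(signal) + [0.0] * (size - len(signal)))
    b = fft(list(kernel) + [0.0] * (size - len(kernel)))
    product = [x * y for x, y in zip(a, b)]
    result = fft(product, inverse=True)
    # The unnormalized inverse transform needs a 1/size scale factor.
    return [v.real / size for v in result[:out_len]]

def direct_convolve(signal, kernel):
    """Reference O(n*m) convolution used to check the FFT version."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.25, 0.5, 0.25]
fast = fft_convolve(signal, kernel)
slow = direct_convolve(signal, kernel)
```

For small kernels like this one the direct loop is usually faster in practice; the FFT route tends to pay off when both inputs are long, since its cost grows as O(n log n) rather than O(n*m).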

imtringued · 2 years ago
As cool as this is, I can't help but think how pointless the goal itself is.

XDNA 2 will have 12 TFLOPs, roughly matching the 96 core Threadripper Pro 7995WX at a much lower price point.

bee_rider · 2 years ago
These sorts of computations generally just get fed bigger inputs as compute gets better.

Also, plenty of Threadrippers already exist out there; if you get access to some cluster, it might have any type of chip in it. If I have access to a cluster with many 7995s, I don’t really care too much about what’s available on the consumer side.

toxik · 2 years ago
ILP is instruction-level parallelism, if you had a hard time remembering like me.

SkiFire13 · 2 years ago
I was thinking of Integer Linear Programming when I saw the title. Just another example of why acronyms are bad.
