jra101 commented on GPUs Go Brrr   hazyresearch.stanford.edu... · Posted by u/nmstoker
behnamoh · a year ago
tangential: When @sama talks about "Universal Basic Compute" (UBC) as a substitute for Universal Basic Income, obviously he means GPUs, right? Who's going to benefit from such a policy? Only nvidia? It just seems like such a dystopian future to live in: imagine being able to sell your UBC to others who know how to use it better, or using it to mine bitcoin or whatever, when all the compute is actually created by one company.

There are many reasons to hate nvidia, but honestly if this UBC policy is even remotely being considered in some circles, I'd join Linus Torvalds and say "nvidia, fuck you".

jra101 · a year ago
You're blaming NVIDIA for Sam Altman's dumb idea?

jra101 commented on Full screen triangle optimization   30fps.net/pages/twotris/... · Posted by u/rck
ttoinou · 3 years ago
Why didn't they ever implement a rectangle primitive to be drawn instead of a triangle? Anyway, the perf impact here is negligible.
jra101 · 3 years ago
NVIDIA has an OpenGL extension that does just that [1].

[1] https://registry.khronos.org/OpenGL/extensions/NV/NV_fill_re...
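
For reference, the extension exposes this as an extra polygon mode; a minimal sketch of how it might be used (hedged: assumes a current GL context, a bound program/VAO, driver support for GL_NV_fill_rectangle, and that GL_FILL_RECTANGLE_NV comes from glext.h or your loader):

    // With fill-rectangle mode, each triangle rasterizes as its screen-aligned
    // bounding rectangle, so one triangle spanning the corners covers the screen.
    glPolygonMode(GL_FRONT_AND_BACK, GL_FILL_RECTANGLE_NV);
    glDrawArrays(GL_TRIANGLES, 0, 3);           // one triangle -> full-screen rectangle
    glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);  // restore the default fill rule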

jra101 commented on Microbenchmarking Intel’s Arc A770   chipsandcheese.com/2022/1... · Posted by u/pantalaimon
clamchowder · 3 years ago
(Author here) See https://github.com/clamchowder/Microbenchmarks/tree/master/G...

It's very much a work in progress, as noted in the article. And some of the stuff that worked reasonably well on my cards, like the instruction rate test when trying to measure throughput across the entire card, went down the drain when run on Arc.

jra101 · 3 years ago
Have you tried reducing the register count in your FP32 FMA test by increasing the iteration count and reducing the number of values computed per loop?

Instead of computing 8 independent values, compute one with 8x more iterations:

    for (int i = 0; i < count * 8; i++) {
        v0 += acc * v0;  // single dependent FMA chain: only a handful of live registers
    }
That, plus inlining the iteration count so the compiler can unroll the loop, might help get closer to SOL (speed of light, i.e. the theoretical peak rate).
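
For contrast, a rough sketch of the wider variant this would replace (hypothetical code, not the actual benchmark source): eight independent accumulators expose more instruction-level parallelism but keep roughly 8x as many values live in registers.

    // Hypothetical 8-independent-accumulators version: more ILP, ~8x the live registers.
    float fma_throughput_8wide(float acc, float s, int count) {
        float v0 = s, v1 = s + 1, v2 = s + 2, v3 = s + 3;
        float v4 = s + 4, v5 = s + 5, v6 = s + 6, v7 = s + 7;
        for (int i = 0; i < count; i++) {
            v0 += acc * v0;  v1 += acc * v1;
            v2 += acc * v2;  v3 += acc * v3;
            v4 += acc * v4;  v5 += acc * v5;
            v6 += acc * v6;  v7 += acc * v7;
        }
        return v0 + v1 + v2 + v3 + v4 + v5 + v6 + v7;  // keep results live so the compiler can't drop the loop
    }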

jra101 commented on Tesla’s ‘phantom braking’ problem is getting worse   theverge.com/2022/6/3/231... · Posted by u/metadat
jra101 · 3 years ago
This happens to me just using adaptive cruise control, with no Autopilot or FSD enabled, and it's super annoying. It can happen on a completely empty road while driving in a straight line.
jra101 commented on Nvidia Unveils 144-Core Grace CPU Superchip   tomshardware.com/news/nvi... · Posted by u/manmal
The_rationalist · 3 years ago
There are significantly more sales of the PS4 than the Switch. Also, Nintendo lives on its laurels, but people will eventually get tired of the stagnation.
jra101 · 3 years ago
PS4: ~117M sold in 9 years
Switch: ~104M sold in 5 years

Switch is averaging 20M units/year and PS4 is averaging 13M units/year.
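
Worked out from those figures:

    104M / 5 years ≈ 20.8M units/year (Switch)
    117M / 9 years = 13.0M units/year (PS4)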

jra101 commented on Twitter makes it harder to choose the old reverse-chronological feed   theverge.com/2022/3/10/22... · Posted by u/Yaina
jra101 · 4 years ago
Thankful that TweetDeck still works and has never tried to switch me to a non-chronological feed.
jra101 commented on Apple M1 Ultra   apple.com/newsroom/2022/0... · Posted by u/davidbarker
pphysch · 4 years ago
> This enables M1 Ultra to behave and be recognized by software as one chip, so developers don’t need to rewrite code to take advantage of its performance. There’s never been anything like it.

Since when did the average developer care about how many sockets a mobo has...?

Surely you still have to carefully pin processes and reason about memory access patterns if you want maximum performance.

jra101 · 4 years ago
They are referring to the GPU part of the chip. There are two separate GPU complexes in the package, but from the software's point of view it is a single large GPU.
jra101 commented on Apple M1 support for TensorFlow 2.5 pluggable device API   developer.apple.com/metal... · Posted by u/dandiep
codelord · 4 years ago
Seems like AMD has been using Vega 20 to refer to two different things.

I was talking about the mobile GPU on MacBook Pros which is based on a 14nm chip. The full name is Radeon Pro Vega 20:

https://www.amd.com/en/graphics/radeon-pro-vega-20-pro-vega-...

https://www.techpowerup.com/gpu-specs/radeon-pro-vega-20.c32...

Vega 20 also seems to refer to a discrete GPU, which was later branded as the Radeon VII (maybe because of this confusion). The number you are quoting is for the discrete GPU.

jra101 · 4 years ago
Huh, I had no idea they used Vega 20 both as a codename and a product name. Confusing.
jra101 commented on Apple M1 support for TensorFlow 2.5 pluggable device API   developer.apple.com/metal... · Posted by u/dandiep
codelord · 4 years ago
M1 and AMD GPU support. I'm personally more interested in the latter, as I haven't yet upgraded my MacBook Pro and I expect my Vega 20 to be faster than the M1 at ML training.

The raw compute power of M1's GPU seems to be 2.6 TFLOPS (single precision) vs 3.2 TFLOPS for Vega 20. This can give you an estimate of how fast it would be for training.

Just for reference, Nvidia's flagship desktop GPU (the 3090) has an FP32 performance of 35.5 TFLOPS.

jra101 · 4 years ago
Vega 20 should be ~13.8 TFLOPS (single precision): https://www.anandtech.com/show/13923/the-amd-radeon-vii-revi...
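
That figure is just the usual shaders × 2 FLOPs/clock (FMA) × clock back-of-the-envelope, using the Radeon VII's published 3840 stream processors and ~1.8 GHz peak clock:

    3840 × 2 × 1.8 GHz ≈ 13.8 TFLOPS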

u/jra101

Karma: 420 · Cake day: August 5, 2009