scrlk · 2 years ago
> Nvidia’s Fermi architecture was ambitious and innovative, offering advances in GPU compute along with features like high tessellation performance. However Terascale 2’s more traditional approach delivered better power efficiency.

Fermi was given the nickname "Thermi" for a good reason. AMD marketing had a field day: https://www.youtube.com/watch?v=2QkyfGJgcwQ

It didn't help that the heatsink of the GTX 480 resembled the surface of a grill: https://i.imgur.com/9YfUifF.jpg

anvuong · 2 years ago
Well, AMD marketing turned out to be a joke. Remember "Poor Volta"? I still think AMD hasn't even recovered from that. Their marketing for GPUs has been terrible since.
DiabloD3 · 2 years ago
You mean Vega. Volta is an Nvidia arch.

Vega's marketing pushed Nvidia to make what ended up being the best product series they will ever make: series 10. That isn't much of a joke; it scared the shit out of Nvidia, and they blinked.

Vega was too late in the pipeline to stop, and Raja was ultimately let go for his role in the whole thing. He refused to start making more gamer-friendly cards and was obsessed with enterprise compute/jack-of-all-trades cards.

Immediately afterwards was a pivot towards a split arch, allowing multiple teams to pursue their intended markets.

It's why AMD won against Nvidia. Nvidia still has no real answer to AMD's success, other than continuing to increase card prices and making ridiculously large chips that have poor wafer yields. Nvidia won't even have working chiplets until series 60 or 70, while AMD already has them in a shipping product.

KennyBlanken · 2 years ago
...which is hilarious because later on, the RX 580 had the same TDP as the 1070 Ti, but half the performance.
dannyw · 2 years ago
On a related topic: does anyone know why NVIDIA keeps reducing the bus width on their latest gen cards?

A 2060 has a 192-bit bus.

A 3060 has a 192-bit bus.

A 4060 has a 128-bit bus!

###

A 2070 has a 256-bit bus.

A 3070 has a 256-bit bus.

A 4070 has a 192-bit bus!

wtallis · 2 years ago
At most points in the product stack, memory frequency increased by enough to compensate for the narrower bus. Dropping to a narrower bus and putting more RAM on each channel allowed for some 50% increases in memory capacity instead of having to wait for a doubling to be economical. And architecturally, the 4000 series has an order of magnitude more L2 cache than the previous two generations (went from 2–6MB to 24–72MB), so they're less sensitive to DRAM bandwidth.
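
To put rough numbers on that: peak DRAM bandwidth is just bus width times per-pin data rate. A quick sketch below; the per-pin rates (14/15/17/21 Gbps) and L2 sizes are spec-sheet figures from memory, so treat them as assumptions rather than gospel.

    // Peak DRAM bandwidth in GB/s: (bus width in bits / 8) * per-pin rate in Gbps.
    // Per-pin rates are spec-sheet values from memory -- treat as assumptions.
    #include <cstdio>

    double peak_gbs(int bus_bits, double gbps_per_pin) {
        return bus_bits / 8.0 * gbps_per_pin;
    }

    int main() {
        printf("2060: %3.0f GB/s\n", peak_gbs(192, 14.0)); // 336
        printf("3060: %3.0f GB/s\n", peak_gbs(192, 15.0)); // 360
        printf("4060: %3.0f GB/s\n", peak_gbs(128, 17.0)); // 272, but ~24 MB L2 vs the 3060's 3 MB
        printf("2070: %3.0f GB/s\n", peak_gbs(256, 14.0)); // 448
        printf("3070: %3.0f GB/s\n", peak_gbs(256, 14.0)); // 448
        printf("4070: %3.0f GB/s\n", peak_gbs(192, 21.0)); // 504, despite the narrower bus
    }

On paper the x70 parts hold the line (and then some), while the x60 parts lean on the much larger L2 to cover an outright drop in DRAM bandwidth.
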
justinclift · 2 years ago
Seems to have gone badly for the 4060 & 4060 Ti specifically though, as they're at the same performance level as the previous gen 3060.

People could buy a 2nd hand 3070 for less money.

noch · 2 years ago
With respect: Have you actually measured performance or are you merely quoting Nvidia marketing?
wmf · 2 years ago
Wafer prices have increased, and so has Nvidia's greed, so you get less hardware for your money every generation.
FirmwareBurner · 2 years ago
Pretty much. Also lack of real competition.
dogma1138 · 2 years ago
Larger caches and compression, as well as considerably higher memory clocks, enable them to reduce the bus width while still hitting the performance target.

Both the 2070 and 3070 have a memory bandwidth of 448 GB/s; the 4070, with its smaller bus, has a memory bandwidth of 504 GB/s.
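
As a toy model of how the cache factors in (ignoring compression, and with an invented hit rate, not a measured one): if a fraction h of memory requests hit the L2, DRAM only has to serve the remaining 1 − h, so the shaders effectively see peak / (1 − h).

    // Toy model: bandwidth the SMs effectively see if `hit_rate` of
    // requests are served from the L2. The 0.35 hit rate is invented.
    #include <cstdio>

    double effective_gbs(double dram_gbs, double hit_rate) {
        return dram_gbs / (1.0 - hit_rate);
    }

    int main() {
        printf("4070 effective: %.0f GB/s\n", effective_gbs(504.0, 0.35)); // ~775 GB/s
    }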

frognumber · 2 years ago
I'll be honest: I find GPUs confusing. I use Hugging Face occasionally. I have no idea what GPU will work with what. How do Fermi, Kepler, Maxwell, Pascal, Turing, Ampere, and Hopper compare? How does the consumer version of each compare to the data center version? What about AMD and Intel?

* Arc A770 seems to provide 16GB for <$300, which seems awesome. Will it work for [X]?

* Older Nvidia cards go up to 48GB for about the cost of a modern 24GB card, and some can be paired. Will it work for [X] (here, LLMs and large-resolution image generation require lots of RAM)?

I wish there was some kind of chart of compatibility and support.

throwit12 · 2 years ago
Although you wouldn't know it from the documentation, both the GK10X and GK11X silicon had serious problems with the global memory barrier instruction that had to be fixed in software after launch. All global memory barriers had to be implemented entirely as patched routines, several thousand times slower than the (broken) hardware instruction they replaced. Amusingly, that same hardware defect forced the L1 cache to be turned off on the first two Keplers. I suspect that if you ran the same benchmark on GK110 vs. the GK210 used in the article, you'd be surprised to see no effect from the L1 cache at all.
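
For context on what that instruction is for, here is a minimal sketch of the classic single-pass reduction pattern that depends on a global memory barrier (modelled on Nvidia's threadFenceReduction sample, not on the actual patched routines described above):

    // Single-pass sum reduction: the last block to finish combines all the
    // per-block partial sums. Correctness hinges on __threadfence() (PTX
    // membar.gl): each block's partial sum must be globally visible before
    // its ticket is taken. Launch with 256 threads per block.
    #include <cuda_runtime.h>

    __device__ unsigned int ticket = 0;

    __global__ void sum(const float *in, float *partials, float *out, int n) {
        __shared__ float smem[256];
        __shared__ bool amLast;

        float v = 0.0f;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            v += in[i];

        smem[threadIdx.x] = v;  // block-local tree reduction in shared memory
        __syncthreads();
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s) smem[threadIdx.x] += smem[threadIdx.x + s];
            __syncthreads();
        }

        if (threadIdx.x == 0) {
            partials[blockIdx.x] = smem[0];
            __threadfence();  // the global memory barrier Kepler had to emulate
            unsigned int t = atomicInc(&ticket, gridDim.x);
            amLast = (t == gridDim.x - 1);
        }
        __syncthreads();

        if (amLast && threadIdx.x == 0) {  // last block sums the partials
            float total = 0.0f;
            for (unsigned b = 0; b < gridDim.x; ++b) total += partials[b];
            *out = total;
            ticket = 0;  // reset for the next launch
        }
    }

If every __threadfence() here traps into a slow patched routine instead of executing in hardware, a pattern like this craters, which lines up with what's described above.
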
dist-epoch · 2 years ago
Samsung has a fab. Does anyone know why they don't want to enter the game and create an AI chip?
vGPU · 2 years ago
Create? Perhaps lack of IP/talent. The Exynos chips have been lagging a bit behind, last I checked. However, they are adding capacity to their fabs for AI chips, so it's possible they may be planning one in the future.

https://www.digitimes.com/news/a20231121VL206/samsung-electr...

amelius · 2 years ago
GPU architecture is just a simple core repeated a gazillion times plus some memory bus.
graphe · 2 years ago
They are, with Tenstorrent. https://www.reuters.com/technology/samsung-manufacture-chips...

CEO: Jim Keller.

treesciencebot · 2 years ago
This is not a good comparison. Nvidia doesn't have a fab, but they are the lead player in the AI chip space. Intel had both, and look where it got them. TSMC has a good model: you can basically take any of your designs for the same node and manufacture it in any of their plants. The same strategy can be applied to Samsung, and they already help a lot in the memory segment. The new HBM3E memory chips for H200s might even be coming from Samsung.
systemBuilder · 2 years ago
Intel was infected with marketing people who diseased the entire C-suite and drained the company for 8 years without doing anything other than make up new marketing names for the 5000-11000 series of chips and their stagnant iGPUs. That level of thievery would kill any leading company.
SoapSeller · 2 years ago
NVIDIA Ampere consumer line was manufactured in Samsung fabs:

https://en.m.wikipedia.org/wiki/GeForce_30_series

KeplerBoy · 2 years ago
They fabbed a lot of AI chips. Nvidia's Ampere chips up to the 3090 were made by Samsung on 8 nm.

Interestingly, the chips bigger than those found in the 3090 (the GA100s used in A100s) were made by TSMC on a 7 nm node.

Maybe Samsung's yield was not high enough to produce those large chips (GA100 is 826 mm² and would probably be even bigger on Samsung's node).
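
A back-of-envelope way to see why die area hurts yield is a Poisson defect model, yield ≈ e^(−D·A). The defect density below is invented for illustration, not a published fab figure.

    // Poisson yield model: probability that a die has zero random defects.
    #include <cmath>
    #include <cstdio>

    // d = defects per cm^2 (assumed), die area in mm^2.
    double yield(double d, double die_mm2) {
        return std::exp(-d * die_mm2 / 100.0);
    }

    int main() {
        double d = 0.1;  // invented defect density
        printf("GA102, 628 mm^2: %2.0f%%\n", 100 * yield(d, 628)); // ~53%
        printf("GA100, 826 mm^2: %2.0f%%\n", 100 * yield(d, 826)); // ~44%
    }

Every extra mm² compounds the losses, and the same design would be physically larger on Samsung's less dense node.
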

ClassyJacket · 2 years ago
Don't they already create the majority of the world's AI chips? Their GPUs?
m3kw9 · 2 years ago
Why are they going back to 28nm?
colechristensen · 2 years ago
This is a new article about the historical usage of 28nm.
m3kw9 · 2 years ago
I checked the title, the 28nm and the date of the article.
