> Nvidia’s Fermi architecture was ambitious and innovative, offering advances in GPU compute along with features like high tessellation performance. However, TeraScale 2’s more traditional approach delivered better power efficiency.
Well, AMD's marketing turned out to be a joke. Remember "Poor Volta"? I still think AMD hasn't recovered from that; their GPU marketing has been terrible ever since.
Vega's marketing pushed Nvidia into making what may end up being the best product series they will ever make: the 10 series. That isn't much of a joke; it scared the shit out of Nvidia, and they blinked.
Vega was too late in the pipeline to stop, and Raja was ultimately let go for his role in the whole thing. He refused to start making more gamer-friendly cards and was obsessed with enterprise-compute, jack-of-all-trades cards.
Immediately afterwards came a pivot towards a split architecture (RDNA for gaming, CDNA for compute), letting separate teams pursue their intended markets.
It's why AMD won against Nvidia. Nvidia still has no real answer to AMD's success, other than continuing to raise card prices and making ridiculously large chips with poor wafer yields. Nvidia won't even have working chiplets until the 60 or 70 series, while AMD already ships them in a product.
At most points in the product stack, memory frequency increased by enough to compensate for the narrower bus. Dropping to a narrower bus and putting more RAM on each channel allowed for some 50% increases in memory capacity instead of having to wait for a doubling to be economical. And architecturally, the 4000 series has an order of magnitude more L2 cache than the previous two generations (went from 2–6MB to 24–72MB), so they're less sensitive to DRAM bandwidth.
Larger caches and compression, together with considerably higher memory clocks, let them narrow the bus while still hitting the performance target.
Both the 2070 and 3070 have a memory bandwidth of 448 GB/s; the 4070, with its smaller bus, has 504 GB/s.
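For anyone wanting to sanity-check those figures: bandwidth is just the bus width in bytes times the per-pin data rate, and the 4070 makes up for its narrower bus with much faster GDDR6X. A quick sketch (the data rates are the published memory speeds for each card):

```python
# Memory bandwidth = bus width (bytes) x per-pin data rate (GT/s).
# Data rates below are the published GDDR6/GDDR6X speeds for each card.
cards = {
    # name: (bus width in bits, data rate in Gbps per pin)
    "RTX 2070": (256, 14.0),  # GDDR6
    "RTX 3070": (256, 14.0),  # GDDR6
    "RTX 4070": (192, 21.0),  # GDDR6X: fewer pins, much faster per pin
}

for name, (bus_bits, gbps) in cards.items():
    bandwidth = bus_bits / 8 * gbps  # GB/s
    print(f"{name}: {bus_bits}-bit x {gbps} Gbps = {bandwidth:.0f} GB/s")
```

So a 25% narrower bus combined with a 50% faster data rate still nets out to more bandwidth, before even counting the much larger L2.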
I'll be honest: I find GPUs confusing. I use Hugging Face occasionally, and I have no idea which GPU will work with what. How do Fermi, Kepler, Maxwell, Pascal, Turing, Ampere, and Hopper compare? How does the consumer version of each compare to the data-center version? And what about AMD and Intel?
* Arc A770 seems to provide 16GB for <$300, which seems awesome. Will it work for [X]?
* Older Nvidia cards go up to 48GB for about the cost of a modern 24GB card, and some can be paired. Will they work for [X] (here, LLMs and large-resolution image generation require lots of VRAM)?
I wish there was some kind of chart of compatibility and support.
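There's no single official chart, but for Nvidia cards the CUDA compute capability number is the closest proxy: Fermi is 2.x, Kepler 3.x, Maxwell 5.x, Pascal 6.x, Volta 7.0, Turing 7.5, Ampere 8.0/8.6, Ada 8.9, Hopper 9.0, and each framework release documents the minimum it supports. A minimal sketch of querying it with PyTorch:

```python
import torch

# CUDA "compute capability" (major, minor) identifies the architecture:
# 2.x Fermi, 3.x Kepler, 5.x Maxwell, 6.x Pascal, 7.0 Volta,
# 7.5 Turing, 8.0/8.6 Ampere, 8.9 Ada, 9.0 Hopper.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        vram_gib = torch.cuda.get_device_properties(i).total_memory / 2**30
        print(f"{name}: compute capability {major}.{minor}, {vram_gib:.1f} GiB VRAM")
else:
    print("No CUDA device visible to PyTorch")
```

Note this only covers the CUDA side; the Arc A770 goes through Intel's separate XPU/oneAPI backend, which is exactly the kind of fragmentation that makes a single compatibility chart hard to keep.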
Although you wouldn't know it from the documentation, both the GK10X and GK11X silicon had serious problems with the global memory barrier instruction that had to be fixed in software after launch. All global memory barriers had to be implemented entirely as patched software routines, several thousand times slower than the broken silicon instruction they replaced. Amusingly, that same hardware defect forced the L1 cache to be turned off on the first two Keplers. I suspect that if you ran the same benchmark on GK110 vs. the GK210 used in the article, you'd be surprised to see no effect from the L1 cache at all.
Create? Perhaps a lack of IP/talent. The Exynos chips have been lagging a bit behind, last I checked. However, they are adding capacity to their fabs for AI chips, so they may be planning one in the future.
This is not a good comparison. Nvidia doesn't have a fab, yet they are the lead player in the AI chip space; Intel had both, and look where it got them. TSMC has a good model: you can basically take any design for a given node and manufacture it in any of their plants. The same strategy can be applied to Samsung, and they already help a lot in the memory segment. The new HBM3E memory chips for H200s might even be coming from Samsung.
Intel was infected with marketing people who took over the entire C-suite and drained the company for eight years without doing anything other than making up new marketing names for the 5000–11000 series of chips and their stagnant iGPUs. That level of thievery would kill any leading company.
Fermi was given the nickname "Thermi" for a good reason. AMD marketing had a field day: https://www.youtube.com/watch?v=2QkyfGJgcwQ
It didn't help that the heatsink of the GTX 480 resembled the surface of a grill: https://i.imgur.com/9YfUifF.jpg
A 2060 has a 192-bit bus.
A 3060 has a 192-bit bus.
A 4060 has a 128-bit bus!
###
A 2070 has a 256-bit bus.
A 3070 has a 256-bit bus.
A 4070 has a 192-bit bus!
People could buy a second-hand 3070 for less money.
https://www.digitimes.com/news/a20231121VL206/samsung-electr...
CEO: Jim Keller.
https://en.m.wikipedia.org/wiki/GeForce_30_series
Interestingly, the chips bigger than the one in the 3090 (i.e. the GA100 used in A100s) were made by TSMC on a 7 nm node.
Maybe Samsung's yield was not high enough to produce those large chips (GA100 is 826 mm² and would probably have been even bigger on Samsung's node).
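That intuition is easy to quantify with the classic Poisson yield model, where the fraction of defect-free dies falls exponentially with die area. A rough sketch (the defect density is an illustrative assumption, not either fab's real number):

```python
import math

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects, assuming Poisson-distributed defects."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)

D0 = 0.1  # defects per cm^2 -- illustrative assumption, not a real fab's number

for name, area in [("GA102 (3090)", 628), ("GA100 (A100)", 826)]:
    print(f"{name}: {area} mm^2 -> {poisson_yield(area, D0):.0%} yield at D0={D0}/cm^2")
```

At the same defect density, the 826 mm² die loses roughly ten points of yield versus the 628 mm² GA102, and the gap widens fast if the fab's defect density is higher, which is presumably why the biggest dies went to TSMC.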