kristopolous · 2 years ago
They also have a turnkey product with 256 of these things.

1 exaflop + 144TB memory

https://nvidianews.nvidia.com/news/nvidia-announces-dgx-gh20...
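
A quick back-of-the-envelope check of those headline numbers. It assumes the per-node figures quoted elsewhere in this thread (96GB of HBM3 on the GPU plus 480GB of LPDDR5X on the Grace CPU per GH200), so treat it as a sketch rather than official math:

    # Rough check of the DGX GH200 "144TB" figure.
    # Per-node memory sizes are the ones quoted elsewhere in this thread,
    # not official spec values.
    nodes = 256
    hbm_gb = 96        # HBM3 attached to the Hopper GPU
    lpddr_gb = 480     # LPDDR5X attached to the Grace CPU

    total_gb = nodes * (hbm_gb + lpddr_gb)
    print(total_gb)         # 147456
    print(total_gb / 1024)  # 144.0 -> the advertised 144TB of unified memory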

jwr · 2 years ago
As someone who lived through the first wave of supercomputers (I worked with Cray Y-MP models), it makes me very happy to see the second wave. For a while I thought supercomputing was dead and we would just be connecting lots of PCs with a network and calling that "supercomputers".

I still remember how my mind was blown when I first learned that all of the memory in a Cray Y-MP was static RAM. Transistor-based flip-flops: extremely power hungry, but also very fast. Another way of looking at it is that all of its RAM was what we call "cache".

This, finally, looks like a supercomputer.

com2kid · 2 years ago
SRAM is so stupid fun to play with.

All of a sudden you don't care so much about the inefficiencies of walking linked lists or trees. When everything is "already in cache", you can worry less about cache efficient algorithms!

One-cycle memory access latency is one of the reasons tiny embedded MCUs can do things with a fraction of the MHz of their larger counterparts.

Nowadays, of course, it is all about tons of memory, tons of bandwidth, craptons of compute, and planning the flow of data ahead of time.

bigbillheck · 2 years ago
> the first wave of supercomputers (I worked with Cray Y-MP models)

The Y-MP came out in 1988, sixteen years after CRI was founded, which itself was several years after the CDC6600.

bigyikes · 2 years ago
I watched Jensen’s announcement for this.

He calls it the world's largest GPU. It's just one giant compute unit.

Unlike supercomputers, which are highly distributed, Nvidia says this is 144 TERABYTES of UNIFIED MEMORY.

My mind still gets blown just thinking about it. My poor desktop GPU has 4 gigabytes of memory. Heck, it only has 2 terabytes of storage!

coolspot · 2 years ago
It may be presented as seamless unified memory, but it isn't. The underlying framework still has to figure out how to place your data to minimize cross-unit traffic. Each unit has an independent CPU, GPU, and (V)RAM, but the units are interconnected via a very fast network.
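
A minimal sketch of what that placement problem looks like in practice, assuming PyTorch and a machine with at least two visible GPUs; the framework only moves data across the interconnect when you explicitly ask it to:

    import torch

    # "Unified" does not mean placement-free: each tensor still lives on a
    # specific device, and crossing the interconnect is an explicit copy.
    assert torch.cuda.device_count() >= 2, "needs at least two visible GPUs"

    a = torch.randn(4096, 4096, device="cuda:0")  # lives in GPU 0's memory
    b = torch.randn(4096, 4096, device="cuda:1")  # lives in GPU 1's memory

    # This matmul forces cross-device traffic: b is copied to cuda:0 first.
    c = a @ b.to("cuda:0")
    print(c.device)  # cuda:0
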
XCSme · 2 years ago
I think distributed computing will go away soon. As computers become more powerful, the cost of "distributing" and transferring the data would be more than simply executing everything locally. Yes, you can still split the task amongst different nodes, or give different problems to different nodes, but the use case would mostly be solving distinct problems on each node, rather than splitting the same task across multiple computers.

Also, with quantum computers, the parallelization/"distribution" of tasks will be done within the same machine, as it can try all solutions at the same time without having to use divide-and-conquer algorithms.

Also, in the future, the algorithms will be a lot simpler, and we will just have FPGA-like AI chips where there is no software: the model is implemented directly in the hardware, so each computation is instant (just the time it takes to propagate the electrons or light through the circuit).

sliken · 2 years ago
What's old is new again. This is basically an updated Arm version of the Itanium-based SGI Altix.

Keep in mind that unified does not mean uniform; the RAM is distributed across all the GPUs.

Deleted Comment

bushbaba · 2 years ago
There are use cases beyond just ML. SAP HANA could theoretically run on this with greater performance. Same goes for a database. Scaling vertically solves a lot of challenges with distributed ledgers.
markus_zhang · 2 years ago
Is it similar to the mainframe in concept?
jasonjayr · 2 years ago
I had to look twice at that image. I thought it was a 2-rack-unit device, but no, it's 24 full 42U racks!!
kristopolous · 2 years ago
It's just a rendering. I presume Nvidia wouldn't be announcing something they haven't made and confirmed, so I wonder why they chose that image.

Is it just that they haven't assembled a production installation yet? Is it possible that their internal instances might not be that presentable?

coherentpony · 2 years ago
> 1 exaflop

To be clear, this is quarter-precision floating-point operations when using the FP8 tensor core arithmetic units [1].

[1] https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-su...

pezezin · 2 years ago
Came here to post exactly the same link. Not just that, but 1 exaflops of sparse FP8.

In comparison, Frontier is 1 exaflops of dense FP64. Try to run this Nvidia system as dense FP64, and its performance will degrade by two orders of magnitude.

Don't get me wrong, the machine is really impressive, but the advertisement is quite misleading.

mirekrusin · 2 years ago
Oh my, time to upgrade my Pi 4 Model B.
weinzierl · 2 years ago
If you upgrade to a Jetson you get GPU power and you can keep the form factor, win-win.
RosanaAnaDana · 2 years ago
I want to see Linus play Doom Eternal on it.
ChuckNorris89 · 2 years ago
Is Crysis no longer a thing?
MichaelZuo · 2 years ago
That could probably all fit in a single semi-trailer. It's amazing how dense computation is getting.
fennecfoxy · 2 years ago
Doesn't this basically shoot up the TOP500 list then? I wonder, if they offered something bigger than 256, whether they could be top of the list, easy.
mk_stjames · 2 years ago
TOP500 uses FP64 performance for its ranking. Nvidia's 1 exaflop claim is ~4 petaflops of FP8/INT8 per chip * 256 chips. FP64 performance of modern Nvidia GPUs is actually far, far less. The ratio to FP32 isn't even 2:1 anymore (not since Pascal, I think), since they realize most machine learning is done with FP32 or less.

64-bit (or 'double precision') is still king in the HPC world though, as it is what you will find in large numerical solutions in fields like computational fluid dynamics, nuclear physics, etc.
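
As a back-of-the-envelope check (the per-GPU numbers below are approximate H100 spec-sheet figures, not something stated in the announcement):

    # Rough comparison of the FP8 marketing number vs. plain FP64.
    fp8_sparse_tflops = 3958   # ~H100 FP8 tensor core throughput, with sparsity
    fp64_vector_tflops = 34    # ~H100 FP64 throughput, non-tensor-core

    nodes = 256
    print(nodes * fp8_sparse_tflops / 1e6)        # ~1.01 -> the "1 exaflop" claim
    print(nodes * fp64_vector_tflops / 1e3)       # ~8.7 petaflops of plain FP64
    print(fp8_sparse_tflops / fp64_vector_tflops) # ~116x, i.e. two orders of magnitude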

NickHoff · 2 years ago
My question here is about underlying fab capacity. This chip is made on TSMC 4N, along with the H100 and 40xx series consumer GPUs. I assume Nvidia has purchased their entire production capacity. I also assume that Nvidia is using that capacity to produce the products with the highest margins, which probably means the H100 and this new GH200. So when they release this new chip, does it mean effectively fewer H100s and 4090s? Or is that not how fabrication capacity works?

I'm asking because whenever I look at ML training in the cloud, I never see any availability - either for this architecture or the A100s. AWS and GCP have quotas set to 0, lambda labs is usually sold out, paperspace has no capacity, etc. What we need isn't faster or bigger GPUs, it's _more_ GPUs.

ac29 · 2 years ago
> This chip is made on TSMC 4N, along with the H100 and 40xx series consumer GPUs. I assume Nvidia has purchased their entire production capacity.

I don't know why you would assume that. Qualcomm has been using TSMC N4 since last year [1]. I'm sure there are other customers as well.

[1] https://www.anandtech.com/show/17395/qualcomm-announces-snap...

huijzer · 2 years ago
It sounds to me like the GH200 achieves more FLOPS per transistor. So compute demand will be satisfied more quickly via the GH200 than via "smaller" chips such as the H100.

Having said that, I don't think we're anywhere near some kind of equilibrium for AI compute. If chip supply magically doubled tomorrow, the large companies would buy it for their datacenters and hit 100% utilization within a few weeks. They all want to train larger models and scale inference to more users.

rcme · 2 years ago
In addition to training larger models, I'm sure there are many use cases that AI could serve that are currently cost prohibitive due to the cost of running inference.
danielmarkbruce · 2 years ago
I'd like bigger GPUs. A trillion-parameter model at 16 bits needs 2,000GB+ for inference, more for training. All kinds of things can be done to spread it across multiple GPUs, downsize to fewer bits, etc., but it's a lot easier to just shove a model onto one GPU.

We'll likely see more efficiency from bigger GPUs and hopefully more availability as a result.
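
A rough sizing sketch behind those numbers; the 16-bytes-per-parameter training figure is a common rule of thumb for mixed-precision Adam (FP16 weights and gradients plus FP32 master weights and two optimizer moments), not something from the thread:

    params = 1e12        # one trillion parameters
    inference_bytes = 2  # FP16/BF16 weights only
    training_bytes = 16  # rule of thumb for mixed-precision Adam, before activations

    print(params * inference_bytes / 1e9)  # 2000.0 GB just to hold the weights
    print(params * training_bytes / 1e9)   # ~16000 GB once optimizer state is included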

mcbuilder · 2 years ago
TBH this is what all ML researchers/engineers have wanted for the past 10 years.
bob1029 · 2 years ago
> Or is that not how fabrication capacity works?

Fabs can run multiple complex designs on the same line simultaneously by sharing common tools. For example, photolithography tools can have their reticles swapped out automatically. Obviously, there is a cost to the context switching and most designs cannot be run on the same line as others.

Ultimately, the smallest unit of fabrication capacity is probably best measured at the grain of the lot/FOUP (<100 wafers).

ksec · 2 years ago
>Or is that not how fabrication capacity works?

The basics of supply chains and supply and demand are the same here as what you all should have witnessed during COVID with toilet rolls.

Fab capacity is not that different from any other manufacturing. You just need to book that capacity way ahead of time (6-9 months). That is also why I said 99% of news or rumours about TSMC capacity are pure BS.

So to answer your question: yes, Nvidia will likely go for the higher-margin products. That is one of the reasons you see Nvidia working with Samsung and Intel.

theincredulousk · 2 years ago
It's my understanding from friends in the business that the actual chips do not represent any capacity issue or bottleneck; it's actually manufacturing the devices that the chips go into (e.g. the finished graphics card).
NotSuspicious · 2 years ago
Why would this be the case? I would naively think that, since the chips can only be made in a fab and the rest can be made basically anywhere, that wouldn't be true.
refulgentis · 2 years ago
That's... fascinating. There's enough capacity at TSMC, but the PCB is the hard part?
Tepix · 2 years ago
I see A100 80GB cloud capacity available on both runpod.io and vast.ai currently.
RosanaAnaDana · 2 years ago
You know, I was wondering this the other day when NVDA's insane run-up happened. I went down the road of trying to figure out whether there were even enough silicon wafers, or whether there would even be enough wafers in the next five years, to justify that price.

Unless all the planet does is make silicon wafers: no.

xadhominemx · 2 years ago
Well, you figured wrong - NVDA AI GPUs are a very small % of global foundry supply, and even if volume tripled, they would still be a small % of global foundry supply. NVDA's revenue is high because their gross margins are extreme, not because their volume is high.
austinwade · 2 years ago
Can you go into more detail? So you're saying that at a 200 P/E ratio, there isn't even enough wafer supply for NVDA to grow into that valuation, even over 5 years?
fxtentacle · 2 years ago
I believe availability is low because the GPUs are too expensive, so those that need to scale up use the older and much more affordable models.
tomschwiha · 2 years ago
I'm using Runpod and Datacrunch regularly and they seem to always have some available.
samwillis · 2 years ago
This may be a naive question: in "crypto" we saw a shift from GPUs to ASICs, as it was more efficient to design and run chips specifically for hashing. Will we see the same in ML, with a shift to ASICs for training and inference of models?

Apple already has the "neural" cores; is that more or less what they are?

Could there be a theoretical LLM chip for inference that is significantly cheaper to run?

anonylizard · 2 years ago
Inference is mostly just matrix multiplications, so there are plenty of competitors.

The problem is, inference costs do not dominate training costs. Models have a very limited lifespan; they are constantly retrained or made obsolete by new generations, so training is always going on.

Training is not just matrix multiplications. Given the hundreds of experiments in model architecture, it's not even obvious what operations will dominate future training, so a more general-purpose GPU is just a way safer bet.

Also, LLM talent is in extremely short supply, and you don't want to piss them off by telling them they have to spend their time debugging some crappy FPGA because you wanted to save some hardware bucks.

conjecTech · 2 years ago
The more general the model, the longer the lifetime. And the most impactful models today are incredibly general. For things like Whisper, I wouldn't be surprised if we're already at a 100:1 ratio of compute spent on inference vs. training. BERT and related models are probably an order of magnitude or two above that. Training infra may be a bottleneck now, but it's unclear how long it will be until improvements slow and inference becomes even more dominant.

Capital outlays are tied to the derivative of compute capacity, so even if training just flatlines, hardware spend will drop significantly.

samvher · 2 years ago
What would be the set of skills that would put you in the category of LLM talent that is in extreme short supply?

Just curious what the current bar is here and which of the LLM-related skills might be worth building.

JonChesterfield · 2 years ago
Lots of companies are doing ASICs for machine learning. Off the top of my head: Graphcore, Cerebras, Tenstorrent, Wave. This site claims there are 187 of them (which seems unlikely) https://tracxn.com/d/trending-themes/Startups-in-AI-Processo.... Google's TPU counts, and there are periodic rumours about Amazon and Meta building their own (might be reality now, I haven't been watching closely).

As far as I can tell that gamble isn't working out particularly well for any of the startups, but that might be money drying up before they've hit commercial viability. I know the hardware is pretty good for Graphcore and Cerebras, with the software proving difficult.

SoapSeller · 2 years ago
Amazon have Inferentia[0] and Trainium[1]. You can use them today on AWS.

[0] https://aws.amazon.com/machine-learning/inferentia/

[1] https://aws.amazon.com/machine-learning/trainium/

bippingchip · 2 years ago
A lot of companies are indeed trying to build AI accelerator cards, but I would not necessarily call them ASICs in the narrow sense of the word; they are by necessity always quite programmable and flexible: NN workload characteristics change much, much faster than you can design and manufacture chips.

I would say they are more like GPUs or DSPs: programmable but optimised for a specific application domain, ML/AI workloads in this case. Sometimes people call these ASIPs: application-specific instruction-set processors. While maybe not a very commonly used term, it is technically more correct.

foobiekr · 2 years ago
I have experience with companies doing their own chips. As often as not, what seems like a good idea turns out not to be, because your volume is low, your ability to get to high yield dominates, and that takes both years and talent.

As a rule, companies should only do their own chips if they are certain they can solve and overcome the COGS problems that low-yield and low-volume penalties entail. If not, you are almost certainly better off just eating the vendor margin. It is very, very unlikely that you will do better.

huijzer · 2 years ago
According to Ilya Sutskever in a podcast that I heard, GPUs are already pretty close to ASIC performance for AI workloads. Nvidia can highly optimize the full stack due to their economies of scale.
josephg · 2 years ago
Right. As I understand it, training on GPUs isn't limited by the speed of matrix multiplications. It's limited by memory bandwidth. So a faster ASIC for matrix operations won't help. It'll just sit idle while the system stalls waiting for data to become available.

That's why having 96GB (with another 480GB or whatever) available via a high-speed interconnect is a big deal. It means we can train bigger models faster.
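
A rough roofline-style sketch of that point; the spec figures are approximate H100 numbers used only for scale:

    # FLOPs you must do per byte fetched just to keep the math units busy.
    peak_fp8_flops = 1.979e15  # ~H100 dense FP8 tensor-core throughput, FLOP/s
    hbm_bandwidth = 3.35e12    # ~H100 HBM3 bandwidth, bytes/s

    print(peak_fp8_flops / hbm_bandwidth)  # ~590 FLOPs per byte

    # Memory-bound steps (embedding lookups, optimizer updates, attention at
    # long context) do far fewer FLOPs per byte than that, so the tensor
    # cores stall on memory no matter how fast they are.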

BobbyJo · 2 years ago
That shift won't happen until research slows down. Nobody wants to invest significant amounts of money in hardware that will be obsolete in a year.
voxadam · 2 years ago
> Nobody wants to invest significant amounts of money in hardware that will be obsolete in a year.

When talking about the current ML industry it's more like nobody wants to invest significant amounts of money in hardware that will be obsolete before it's even taped out.

mechagodzilla · 2 years ago
What would be obsolete? The underlying operations are almost always just lots of matrix multiplies on lots of memory. Releasing a new set of weights doesn’t somehow change the math being done.
nynx · 2 years ago
These are “GPUs” in a pretty stretched sense. They have a lot of custom logic specifically for doing tensor operations.
cubefox · 2 years ago
Which isn't very informative when it isn't clear how close they are to TPUs, which are ASICs.
fxtentacle · 2 years ago
The main benefit of using modern GPUs is that they have large high-bandwidth memory. You need to put a lot of work into optimizations to reach >10% of the peak compute capability in practical use. That means an ASIC won't eliminate the performance bottleneck.
xxs · 2 years ago
ASICs did take over Bitcoin, but not Ethereum. The ASICs provide an optimized compute unit, but they do suck when it comes to memory addressing.

Effectively it'd require the entire memory controller, the cache, and the scheduling. At that point you've got most of a GPU with a stuck, non-programmable interface for a designated computation. Likely you'd never have to compete for advanced nodes, either.

58x14 · 2 years ago
I worked with an electrical engineer who disclosed to me a mid-8-figure investment made by a PE firm to develop Ethereum FPGA and ASIC hardware, and while he told me they were underwater on that deal, they did achieve (IIRC) 30%+ better performance per watt relative to the (at the time) top-of-the-line GPUs.

I wonder what they’re doing with that hardware now.

paulmd · 2 years ago
Ethereum ASICs existed - at least a few were publicly known (Antminer and other brands had a few), but others likely existed on the down-low as well. They were never about optimized compute but rather a cost-optimized way to deploy a shitload of DDR/GDDR channels reliably at minimum cost.

Ethereum is designed to bottleneck on memory bandwidth (while being uncacheable) so at the end of the day the name of the game is how many memory channels can you slap onto a minimum-cost board. You won't drastically win on perf/w - but as mentioned by a sibling, 30-100% over a fully general-purpose gaming GPU is likely possible, because you don't have to have a whole general-purpose GPU sitting there idling (and it's not a coincidence that gaming GPUs were undervolted/etc to try and bring that power down - but you can't turn everything off). "ASIC-resistance" just means an ASIC is only 1-10x more efficient than a general-purpose device, so general-purpose hardware can still stay in the game. It doesn't mean ASIC-proof, you can still make ASICs and they still have at least some perf/w advantage.

However, if your ASIC costs $100 to get the same performance as a 3060 Ti, that's a huge win even if you only beat the perf/w by 50%. Particularly since your ASIC is likely way easier and more stable to deploy at scale, and doesn't require a host rig with at least a couple hundred bucks of computer gear to even turn on.

Only plebs were buying up GPUs from retailers or sniping websites; buying from eBay was for the chumpiest of chumps. Gangsters were buying them from the board partners a truckload at a time, and true elites just paid someone to engineer an ASIC and do a small run of them. Eight figures (as mentioned by a sibling) is plenty; a $50-75m run of ASICs is quite a lot of silicon even on a fairly modern node (and some mining companies were publicly known to be using TSMC 7nm and other very modern nodes). And when you invest that kind of money, you don't flash it around and scare the marks.

saynay · 2 years ago
We are already seeing chips for inference, really. It's how these models are getting into the consumer market. A lot of the big phones have an inference chip (Tensor, neural cores, etc.), TVs are getting them, and most GPUs have some hardware dedicated to inference (DLSS and super-resolution).
lumb63 · 2 years ago
There are some industries that take advantage of FPGAs alongside CPU(s) to move some computations into hardware for speed gains while maintaining flexibility. Maybe something like that is possible. For an example, look at the Versal chip.
synthos · 2 years ago
Xilinx's Versal has a dedicated AI accelerator, so calling it taking advantage of the FPGA isn't quite accurate. It's really another chip that happens to be co-packaged with the FPGA.
bingdig · 2 years ago
> Grace™ Hopper™

Can anyone with more legal knowledge share how they trademarked the name of Grace Hopper?

adsfgiodsnrio · 2 years ago
Leaving aside the legality, I find it tacky to use the names of dead people in advertisements. Grace Hopper did not endorse this product. We have no idea what she would have thought of Nvidia. Yet the lawyers are now fighting over the right to use her name and legacy to "create shareholder value".

The worst offender is Tesla, because I'm pretty sure he would have hated that company.

gruturo · 2 years ago
It's tacky but I don't think they are in any way implying an endorsement. Tesla, Ampere, Pascal, Volta, Kelvin, Turing (and quite a few more I can't remember) are all Nvidia architecture names, and are all named after historically important scientists (well, I have my reservations about Kelvin, but that's more personal opinion)
anaganisk · 2 years ago
It's just flattery, the same as naming a road Martin Luther King Drive, Washington Boulevard, etc. They may or may not have approved all this, but it just signifies that you want them to be remembered in your own way.
jrockway · 2 years ago
It's more fun when the people are alive to complain: https://en.wikipedia.org/wiki/Litigation_involving_Apple_Inc...

Who knew that one of the most profitable companies on Earth would get there by calling Carl Sagan a "butt head astronomer"!

Deleted Comment

Symmetry · 2 years ago
Trademarks are always about use in a particular context. Apple has a trademark on computers using the name "Apple" even though that's been a word for a food for centuries. And if you want to produce a line of bulldozers and brand them "Apple Bulldozers" you can do that and get your own trademark on the use of the word "Apple" in that context.
voakbasda · 2 years ago
I really would like to see someone try this, but I do not think it will end well. Apple would find a way to squash the mark. They have sued many businesses that dared to use "Apple" in their clearly unrelated businesses, and I have no reason to believe that this case would be any different.

Practically speaking, trademarks cover whatever can be litigated successfully.

erk__ · 2 years ago
Interesting that you used Apple as an example, since they fought with Apple Corps (the Beatles-owned company) over it for years https://en.wikipedia.org/wiki/Apple_Corps_v_Apple_Computer
dathinab · 2 years ago
EDIT: To be clear I'm not a legal expert.

Trademarks are context specific, and you can trademark "common terms" IF (and at least theoretically only if) they are used in a very narrow use case which by itself isn't confusable with the generic term.

The best example here is Apple which is a generic term but trademarked in context of phones/computer/music manufacturing (and by now a bunch of other things).

Though there had been an Apple music label, with a bit of back and forth of legal cases (and some IMHO very questionable court rulings), which in the end ended with Apple buying that label.

So theoretically it's not too bad.

Practically, big companies like Apple, Nvidia and similar can just swamp smaller companies with absurd legal fees to force their win (AFAIK this is Meta's strategy, because I honestly have no idea how they think the term Meta for data processing is trademarkable). To make it worse, local courts have often been shown not to properly apply the law in such conflicts if the other party is from another country (one or two US states are infamous for very biased legal decisions in this kind of case).

So yeah, at the core this aspect of the trademark system is not a terrible idea, but the execution is sadly often fairly lacking. And even high-profile cases of trademark abuse often have no consequences if it's a "favorite big company". (For balance, negative EU examples do include Lego and its 3D trademark and absurdly biased court rulings, or Ferrero and its Kinder (German: children) trademark on chocolate.)

EDIT: also note the two TMs: Grace™ Hopper™. Both Grace and Hopper are generic terms you can under some circumstances trademark and then use together, but while probably legal, you would likely want to avoid trademarking (Grace Hopper)™.

adolph · 2 years ago
It looks like two separate trademarks.

https://en.wikipedia.org/wiki/Salami_slicing_tactics

jabl · 2 years ago
I think it's more because this device combines a "Grace" CPU and a "Hopper" GPU, thus creating a "Grace Hopper" superchip.
mk_stjames · 2 years ago
Something to do with Grace Hopper being an actual person (although she is deceased) and thus not being able to trademark the entire name?
jabl · 2 years ago
Hopper is the name of the GPU architecture, and Grace is the name of the CPU. Combining them in a device gets you a "Grace Hopper" superchip.

(And yes, I'd guess the codenames were chosen back in the day with an eye towards combining them in the same device.)

crazypython · 2 years ago
"™" has no legal meaning. "(R)" means a registered trademark.
HWR_14 · 2 years ago
That is not true. (TM) has a legal meaning. It's weaker than an (R), but it is still an enforceable trademark.

It's similar to creating a work covered by copyright vs. registering it with the copyright office.

balls187 · 2 years ago
They didn’t.

Grace is a trademark. Hopper is a trademark.

Hence each term having its own TM.

Aardwolf · 2 years ago
What kind of CPU is the CPU part? The link doesn't say. Is it something like ARM or RISC-V, and can it run a general-purpose OS like Linux?

Do you plug in DDR memory somewhere for the 480GB, or is it already on the board?

EDIT: found answer to my own question in the datasheet: "The NVIDIA Grace CPU combines 72 Neoverse V2 Armv9 cores with up to 480GB of server-class LPDDR5X memory with ECC."

tromp · 2 years ago
Quoting from https://www.nvidia.com/en-us/data-center/grace-cpu-superchip...

> The NVIDIA Grace CPU Superchip uses the NVIDIA® NVLink®-C2C technology to deliver 144 Arm® Neoverse V2 cores and 1 terabyte per second (TB/s) of memory bandwidth.

> High-performance CPU for HPC and cloud computing Superchip design with up to 144 Arm Neoverse V2 CPU cores with Scalable Vector Extensions (SVE2)

> World’s first LPDDR5X with error-correcting code (ECC) memory, 1TB/s total bandwidth

> 900 gigabyte per second (GB/s) coherent interface, 7X faster than PCIe Gen 5

> NVIDIA Scalable Coherency Fabric with 3.2TB/s of aggregate bisectional bandwidth

> 2X the packaging density of DIMM-based solutions

> 2X the performance per watt of today’s leading CPU

jabl · 2 years ago
This is about the "Grace Hopper Superchip", which has one of the Grace CPUs replaced with a GPU. Thus, 72 Neoverse V2 cores.

https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-su...

greggsy · 2 years ago
72 Neoverse V2 Armv9 cores.

Not sure how one interfaces with it, but it presumably runs an approved Linux distro, with a web server at best.

whatisyour · 2 years ago
It's a normal chip like your x64 chip. You install the ARM variant of your Linux distribution on it and run it natively.

source: I have one

chakintosh · 2 years ago
LTT posted a video a few days ago from Computex talking a bit in depth about it.
tiffanyh · 2 years ago
Dumb questions ...

- Am I wrong in understanding this is a general-purpose computer (with massive graphics capabilities)?

- And if so, what CPU is it using (an NVIDIA ARM CPU)?

- And what OS does it run?

zucker42 · 2 years ago
Correct. It's called the Grace Hopper superchip because it uses the Nvidia Grace CPU (which is ARM) and the Nvidia Hopper GPU.

For OS, it will run some form of Linux. I'm not sure if the particular recommended build has been (or will be) publicly released.

ulrikhansen54 · 2 years ago
More powerful chips are great, but NVIDIA really ought to focus some of their best folks on ironing out some of the quirks of using their CUDA software and actually getting stuff to run on their hardware in a simpler manner. Anyone who's ever fiddled with various CUDA device drivers and lining up PyTorch & Python versions will understand the pain.
IceWreck · 2 years ago
The solution is to not install CUDA on your base system, because you need multiple versions of CUDA and some of them are often incompatible with your distro-provided GCC.

Here is what works for me:

- Nvidia drivers on base linux system (rpmfusion/fedora in my case)

- Install nvidia container toolkit

- Use a cuda base container image and run all your code inside podman or docker

codethief · 2 years ago
I admit it's been a while (2 years) since I last played with Nvidia/CUDA (on Jetson) and back then running CUDA inside Docker was still somewhat arcane, but in my experience, whatever the Nvidia documentation lays out works well until you want to 1) cut down on container image size (important for caches and build pipelines) and, to this end, understand what individual deb packages and libraries do, 2) run the container on a system different from the official Nvidia Ubuntu image.

Back then the docs were just awful. Has this really changed that much in recent times?

thangngoc89 · 2 years ago
PyTorch is the most painless one because everything is bundled in the wheel. The latest stable CUDA supported by PyTorch is 11.8, and I have been running it on a CUDA 12.0 machine because CUDA is backward compatible. TensorFlow, on the other hand, requires compilation with the installed CUDA library, and it's truly a pain since I can't change the machine's CUDA version.
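
A quick sanity check for that version-alignment dance, using the standard torch attributes:

    import torch

    # Check that the installed wheel, its bundled CUDA runtime, and the
    # driver actually agree with each other.
    print(torch.__version__)          # e.g. 2.0.1+cu118
    print(torch.version.cuda)         # CUDA version the wheel was built against
    print(torch.cuda.is_available())  # False usually means a driver/runtime mismatch
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))
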
omgJustTest · 2 years ago
Hardware before software!
paulryanrogers · 2 years ago
ATI/AMD GPUs supposedly have great hardware, hamstrung by less-than-great software. In fact, it's the lack of some software features that makes me hesitate to switch despite major cost savings.
indymike · 2 years ago
Is there some connection between Nvidia and Admiral Hopper's family that makes it ok to appropriate her identity for their product?
vinay427 · 2 years ago
As far as I can tell they have frequently used the names of famous scientists, such as Kepler, Fermi, Maxwell, Pascal, Turing, Ampere, and Ada Lovelace. This has existed long before Hopper.
justinclift · 2 years ago
> appropriate her identity

Hmmm, what's the difference between homage and appropriation for things like this?

indymike · 2 years ago
Claiming a trademark.