Adding network addressing to the GPU interconnect is kind of fascinating.
Am I right in thinking the GPU-to-GPU communication is just shuttling chunks of data around for sharing inputs/outputs of computations? Or is there some other coordination going on between the GPUs directly with regards to the actual computations each is running? (Or is that still being managed wholly by the CPUs they're attached to?)
The mention of "Confidential Computing support" and the word "secure" over and over makes me strangely nervous.
I have a sneaking suspicion that somewhere in an NSA datacenter there will be racks upon racks of these things running transformers over every text message and voice call coming from certain regions...
“Confidential Computing” is an up-and-coming model in which a user (e.g. a cloud customer) trusts a hardware vendor (e.g. Intel, AMD, Nvidia), and the vendor provides a mechanism by which systems built on its hardware can run user code and remotely attest to the user that the code has not been tampered with and that the system’s owner (e.g. Amazon) is not exfiltrating user data.
This could be used for DRM or nefarious purposes, but that’s not the point.
Obviously this is no more secure than the underlying hardware. All the attacks against SGX, for example, will still apply unless they are fixed.
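To make that attestation flow concrete, here is a minimal Python sketch of the checks a user might run against an attestation report. Every field name here is a hypothetical stand-in, and the HMAC step substitutes for what real schemes (SGX, SEV-SNP, and presumably Hopper's) do with certificate chains and asymmetric signatures rooted in the vendor:

    import hashlib
    import hmac

    def verify_report(report, expected_code_hash, nonce, vendor_key):
        # The measurement must match the code the user intended to run.
        if report["measurement"] != expected_code_hash:
            return False
        # The nonce proves freshness, so an old report can't be replayed.
        if report["nonce"] != nonce:
            return False
        # The signature proves the report came from genuine vendor hardware.
        # An HMAC with a shared vendor key stands in for the real
        # certificate-chain verification.
        msg = report["measurement"] + report["nonce"]
        expected_sig = hmac.new(vendor_key, msg, hashlib.sha256).digest()
        return hmac.compare_digest(report["signature"], expected_sig)

If all three checks pass, the user releases their secrets (keys, data) to the attested environment; that release step is what makes the "owner can't exfiltrate" claim meaningful.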
While I can totally imagine that being true, I suspect this is more a response to cloud providers wanting to subdivide each physical GPU into virtual, multi-tenant GPUs.
They'll want to be able to offer strong security guarantees to customers who are renting compute on a multi-tenant GPU and pushing their sensitive data onto it.
FP8 is interesting. I wonder if training could ever be made to work in this format? I see they have two options for mantissa size (E4M3 and E5M2), which is interesting. But no unsigned option? Losing a whole bit to the sign seems unfortunate when you only have eight.
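For anyone curious how those two layouts differ, here is a quick Python sketch of decoding an FP8 value from its bit pattern. It assumes the usual IEEE-style bias of 2^(exp_bits-1) - 1 (7 for E4M3, 15 for E5M2) and leaves out the NaN/Inf special encodings for brevity:

    def decode_fp8(byte, exp_bits, man_bits):
        # Split the 8-bit pattern into sign, exponent, and mantissa fields.
        bias = (1 << (exp_bits - 1)) - 1
        sign = -1.0 if (byte >> 7) & 1 else 1.0
        exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
        man = byte & ((1 << man_bits) - 1)
        if exp == 0:
            # Subnormal: no implicit leading 1, minimum exponent.
            return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
        # Normal: implicit leading 1.
        return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

    # E4M3 (4 exponent bits, 3 mantissa bits) vs E5M2 (5 and 2):
    print(decode_fp8(0b0_0111_000, exp_bits=4, man_bits=3))  # 1.0 in E4M3
    print(decode_fp8(0b0_01111_00, exp_bits=5, man_bits=2))  # 1.0 in E5M2

The trade-off is the classic one: E4M3 buys an extra mantissa bit of precision at the cost of dynamic range, while E5M2 keeps more range with coarser steps.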
There have been a lot of papers on techniques for training in low precision, even as low as binary. But the largest models that would benefit most aren't using those techniques, which indicates to me that they don't work as well as their paper abstracts suggest.
This is a preview; the H100 hasn't started shipping yet.
There's been some speculation that if the Ethereum merge puts a lot of 3000-series cards on the used market then Nvidia might delay releasing this generation of GPU.
I wonder what's different this time.
A group at IBM has been claiming for years that you can train with as little as FP4: https://papers.nips.cc/paper/2020/file/13b919438259814cd5be8...