Adding network addressing to the GPU interconnect is kind of fascinating.
Am I right in thinking the GPU-to-GPU communication is just shuttling chunks of data around for sharing inputs/outputs of computations? Or is there some other coordination going on between the GPUs directly with regards to the actual computations each is running? (Or is that still being managed wholly by the CPUs they're attached to?)
The mention of "Confidential Computing support" and the word "secure" over and over makes me strangely nervous.
I have a sneaking suspicion that somewhere in an NSA datacenter there will be racks upon racks of these things running transformers over every text message and voice call coming from certain regions...
“Confidential Computing” is an up-and-coming model in which a user (e.g. a cloud customer) trusts a hardware vendor (e.g. Intel, AMD, Nvidia), and the vendor provides a mechanism by which systems built on its hardware can run user code and remotely attest to the user that the code has not been tampered with and that the system’s owner (e.g. Amazon) is not exfiltrating user data.
This could be used for DRM or nefarious purposes, but that’s not the point.
Obviously this is no more secure than the underlying hardware. All the attacks against SGX, for example, will still apply unless they are fixed.
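To make that attestation flow concrete, here is a minimal Python sketch of the checks a user might run against an attestation report. Every field name here is a hypothetical stand-in, and the HMAC step substitutes for what real schemes (SGX, SEV-SNP, and presumably Hopper's) do with certificate chains and asymmetric signatures rooted in the vendor:

    import hashlib
    import hmac

    def verify_report(report, expected_code_hash, nonce, vendor_key):
        # The measurement must match the code the user intended to run.
        if report["measurement"] != expected_code_hash:
            return False
        # The nonce proves freshness, so an old report can't be replayed.
        if report["nonce"] != nonce:
            return False
        # The signature proves the report came from genuine vendor hardware.
        # An HMAC with a shared vendor key stands in for the real
        # certificate-chain verification.
        msg = report["measurement"] + report["nonce"]
        expected_sig = hmac.new(vendor_key, msg, hashlib.sha256).digest()
        return hmac.compare_digest(report["signature"], expected_sig)

If all three checks pass, the user releases their secrets (keys, data) to the attested environment; that release step is what makes the "owner can't exfiltrate" claim meaningful.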
While I can totally imagine that being true, I suspect this is more a response to cloud providers wanting to subdivide each physical GPU into virtual, multi-tenant GPUs.
They'll want to be able to offer strong security guarantees to customers who are renting compute on a multi-tenant GPU and pushing their sensitive data onto it.
FP8 is interesting. I wonder if training could ever be made to work in this format? I see they have two options for mantissa size (E4M3 and E5M2), which is interesting. But no unsigned option? Losing a whole bit to the sign seems unfortunate when you only have eight.
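For anyone curious how those two layouts differ, here is a quick Python sketch of decoding an FP8 value from its bit pattern. It assumes the usual IEEE-style bias of 2^(exp_bits-1) - 1 (7 for E4M3, 15 for E5M2) and leaves out the NaN/Inf special encodings for brevity:

    def decode_fp8(byte, exp_bits, man_bits):
        # Split the 8-bit pattern into sign, exponent, and mantissa fields.
        bias = (1 << (exp_bits - 1)) - 1
        sign = -1.0 if (byte >> 7) & 1 else 1.0
        exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
        man = byte & ((1 << man_bits) - 1)
        if exp == 0:
            # Subnormal: no implicit leading 1, minimum exponent.
            return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
        # Normal: implicit leading 1.
        return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

    # E4M3 (4 exponent bits, 3 mantissa bits) vs E5M2 (5 and 2):
    print(decode_fp8(0b0_0111_000, exp_bits=4, man_bits=3))  # 1.0 in E4M3
    print(decode_fp8(0b0_01111_00, exp_bits=5, man_bits=2))  # 1.0 in E5M2

The trade-off is the classic one: E4M3 buys an extra mantissa bit of precision at the cost of dynamic range, while E5M2 keeps more range with coarser steps.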
There have been a lot of papers on techniques for training in low precision, even as low as binary. But the largest models that would benefit most aren't using those techniques, which indicates to me that they don't work as well as their paper abstracts suggest.
This is a preview; the H100 hasn't started shipping yet.
There's been some speculation that if the Ethereum merge puts a lot of 3000-series cards on the used market then Nvidia might delay releasing this generation of GPU.
I wonder what's different this time.
A group at IBM has been claiming for years that you can train with as little as FP4: https://papers.nips.cc/paper/2020/file/13b919438259814cd5be8...