PeterStuer · a month ago
I always found hammering attacks to be extremely satisfying, even from a metaphysical point of view.

You escape a closed virtual universe not by "breaking out" in the traditional sense, exploiting some bug in the VM hypervisor's boundary itself, but by directly manipulating the underlying physics of the universe on which the virtual universe is founded, just by creating a pattern inside the virtual universe itself.

No matter how many virtual digital layers there are, as long as you can impact the underlying analog substrate, this might work.

Makes you dream there could be an equivalent for our own universe?

arduinomancer · a month ago
I tried knocking on my wall 100,000 times and it did indeed cause a disturbance in the neighbouring cell of my apartment

Turns out this whole virtualized house abstraction is a sham

N_Lens · a month ago
Thanks for the sensible chuckle.
queuebert · a month ago
> Makes you dream there could be an equivalent for our own universe?

My idea to attack the simulation is psychological: make our own simulation that then makes its own simulation, and so on all the way down. That will sow doubt in the minds of the simulators that they themselves are a simulation and make them sympathetic to our plight.

nemomarx · a month ago
What if the simulators tell you they're also doing this? It could be turtles all the way up perhaps
lukan · a month ago
"I always found hammering attacks to be extremely satisfying"

On a philosophical level I somewhat agree, but on a practical level I am sad as this likely means reduced performance again.

MangoToupe · a month ago
Only for places where you need security. Many types of computation do not need security.
mrkstu · a month ago
I remember as a kid in the 70’s I first heard about most physical things being empty space.

Walking into a wall a few hundred times may have damaged my forehead almost as much as my trust in science…

Deleted Comment

sneak · a month ago
If we are in a simulation, the number of ways that could be the equivalent of SysRq or control-alt-delete are infinite.

We haven’t even tried many of the simple/basic ones like moving objects at 0.9c.

0cf8612b2e1e · a month ago
To ruin a great joke, particle accelerators do get up to 0.9999+ the speed of light.
mistercow · a month ago
> Makes you dream there could be an equivalent for our own universe?

But would we even notice? As far as we were concerned, it would just be more physics.

LeifCarrotson · a month ago
The real question is how we'd know about the physics of the parent universe.

I think this short story is interesting to think about in that way:

https://www.lesswrong.com/posts/5wMcKNAwB6X4mp9og/that-alien...

hoseja · a month ago
Well, for Earth life there are multiple, but evolution just learned to exploit them all.
amelius · a month ago
> No matter how many virtual digital layers

So you are saying that a GPU program can find exploits in physics without having access to e.g. high energy physics tools?

Sounds implausible.

SandmanDP · a month ago
> Makes you dream there could be an equivalent for our own universe?

I’ve always considered that to be what’s achieved by the LHC: smashing the fundamental building blocks of our universe together at extreme enough energies to briefly cause ripples through the substrate of said universe

whyowhy3484939 · a month ago
That's assuming there is a substrate that can be disturbed. That's where the parent's analogy breaks down.

As an example of an alternative analogy: think of how many bombs need to explode in your dreams before the "substrate" is "rippled". How big do the bombs need to be? How fast does the "matter" have to "move"? I think "reality" is more along those lines. If there is a substrate - and that's a big if - IMO it's more likely to be something pliable like "consciousness". Not in the least "disturbed" by anything moving in it.

simiones · a month ago
The LHC doesn't generate anything like the kind of energy you get when interstellar particles hit the Earth's upper atmosphere, never mind what's happening inside the sun - and any of these are many, many orders of magnitude below the energies you get in a supernova, for example.

The LHC is extremely impressive from a human engineering perspective, but it's nowhere close to pushing the boundaries of what's going on every second in the universe at large.

drcongo · a month ago
I love that we can switch out LHC for LSD and this comment would still feel perfect.
userbinator · a month ago
No one really cared about the occasional bitflips in VRAM when GPUs were only used for rendering graphics. It's odd that enabling ECC can reduce performance, unless they mean that's only in the presence of ECC errors being corrected, since AFAIK for CPUs there isn't any difference in speed even when correcting errors.

> In a proof-of-concept, we use these bit flips to tamper with a victim’s DNN models and degrade model accuracy from 80% to 0.1%, using a single bit flip

There is a certain irony in doing this to probabilistic models, designed to mimic an inherently error-prone and imprecise reality.
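
For a sense of why one flip is enough: a single high exponent bit in a float32 weight can change its magnitude by dozens of orders of magnitude, which will saturate anything downstream. A quick sketch of the IEEE-754 arithmetic (illustrative only, with a made-up weight value, not the paper's actual technique):

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of a float32 value and return the result."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits))
    return flipped

w = 0.0123                        # a typical small DNN weight
print(w, "->", flip_bit(w, 30))   # bit 30 = most significant exponent bit
# prints roughly: 0.0123 -> 4.2e+36, enough to blow up the whole layer
```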

ryao · a month ago
Nvidia implements ECC in software since they did not want to add the extra memory chip(s) to their boards that implementing it in hardware would require. The only case where they do it in hardware is when they use HBM memory.

That said, GDDR7 does on-die ECC, which gives immunity to this attack in its current form. There is no way to get information about corrected bit flips from on-die ECC, but it is better than nothing.

hamandcheese · a month ago
Available ECC DIMMs are often slower than non-ECC DIMMs: both lower MT/s and higher latency. At least for the "prosumer" ECC UDIMMs I'm familiar with.

So it doesn't seem that wild to me that turning on ECC might require running at lower bandwidth.

ryao · a month ago
This is incorrect. ECC DIMMs are no slower than regular DIMMs. Instead, they have extra memory and extra memory bandwidth. An 8GB DDR4 ECC DIMM would have 9GB of memory and 9/8 the memory bandwidth. The extra memory is used to store the ECC bits, while the extra memory bandwidth prevents performance loss when reading/writing the ECC alongside the rest of the data. The memory controller spends an extra cycle verifying the ECC, which is a negligible performance hit; in reality, there is no noticeable performance difference. However, where you would have 128 traces to a Zen 3 CPU for DDR4 without ECC, you would need 144 traces for DDR4 with ECC.

A similar situation occurs with GDDR6, except Nvidia was too cheap to implement the extra traces and pay for the extra chip, so instead, they emulate ECC using the existing memory and memory bandwidth, rather than adding more memory and memory bandwidth like CPU vendors do. This causes the performance hit when you turn on ECC on most Nvidia cards. The only exception should be the HBM cards, where the HBM includes ECC in the same way it is done on CPU memory, so there should be no real performance difference.
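
To make the ratios above concrete, here is the back-of-the-envelope arithmetic for side-band ECC on DDR4, using the same numbers as above (a sketch, not a spec quote):

```python
# Side-band ECC on DDR4: 64 data bits plus 8 ECC bits per transfer.
data_bits = 64
ecc_bits = 8
bus_width = data_bits + ecc_bits            # 72-bit wide ECC DIMM

overhead = bus_width / data_bits            # 9/8 = 1.125
print(f"extra memory and bandwidth: {overhead:.3f}x")

# An "8 GB" ECC DIMM therefore carries 9 GB of DRAM chips,
# with the extra 1 GB holding the ECC bits.
print(f"8 GB usable -> {8 * overhead:.0f} GB of chips on the DIMM")

# Dual-channel DDR4 on something like Zen 3:
channels = 2
print(f"data traces: {channels * data_bits} without ECC, "
      f"{channels * bus_width} with ECC")   # 128 vs 144
```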

Deleted Comment

privatelypublic · a month ago
This seems predicated on there being significant workloads that split GPUs between tenants for compute purposes.

Anybody have sizable examples? Everything I can think of results in dedicated GPUs.

vlovich123 · a month ago
Many of the GPU rental companies charge less for shared GPU workloads. So it's a cost/compute tradeoff. It's usually not about the workload itself needing the full GPU unless you really need all the RAM on a single instance.
privatelypublic · a month ago
Any examples to check out? The only one I know of is vastai... and there's already a list of security issues a mile long there.
huntaub · a month ago
My (limited) understanding was that the industry previously knew that it was unsafe to share GPUs between tenants, which is why the major cloud providers only sell dedicated GPUs.
bluedino · a month ago
NVIDIA GPUs can run in MIG (Multi-Instance GPU) mode, allowing you to pack more jobs onto a node than you have GPUs. Very common in HPC, but I don't know about the cloud.
privatelypublic · a month ago
I thought about splitting the GPU between workloads, as well as terminal server/virtualized desktop situations.

I'd expect all code to be strongly controlled in the former, and the latter to be reasonably secured, given that software/driver-level mitigations are possible and that corrupting somebody else's desktop with rowhammer doesn't seem like a good investment.

As another person mentioned (and maybe it is a wider usage than I thought), cloud GPU compute running custom code seems to be the only useful case. But I'm having a hard time coming up with a useful scenario. Maybe corrupting a SIEM's analysis and alerting of an ongoing attack?

cyberax · a month ago
No large cloud hoster (AWS, Google, Azure) shares GPUs between tenants.
privatelypublic · a month ago
Update: I thought for a second I had one: Jupyter notebook services with GPUs. But looking at Google Colab,* even there it's a dedicated GPU for that session.

* Random aside: how is it legal for Colab compute credits to have a 90-day expiration? I thought California outlawed company currency expiring (à la gift cards)?

dogma1138 · a month ago
Colab credits are likely not a currency equivalent but a service equivalent, which is still legal to expire, AFAIK.

Basically, Google Colab credits are like buying a seasonal bus pass with X trips or a monthly parking pass with X hours, rather than getting store cash which can be used for anything.

Deleted Comment

SnowflakeOnIce · a month ago
Example: A workstation or consumer GPU used both for rendering the desktop and running some GPGPU thing (like a deep neural network)
privatelypublic · a month ago
Not an issue, that's a single tenant.

Which is my point.

haiku2077 · a month ago
GKE can share a single GPU between multiple containers in a partitioned or timeshared scheme: https://cloud.google.com/kubernetes-engine/docs/concepts/tim...
privatelypublic · a month ago
That's the thing... they're all the same tenant. A GKE node is a VM instance, and GCE doesn't have shared GPUs that I can see.
im3w1l · a month ago
WebGPU API taking a screenshot of the full desktop, maybe?
Buttons840 · a month ago
Do you think WebGPU would be any more of an attack vector than WebGL?
privatelypublic · a month ago
Rowhammer itself is a write-only primitive: it can flip bits in a victim row but can't read them. It can, however, potentially be chained, e.g. by flipping bits in a page table or pointer so that subsequent writes land in a region they shouldn't. I haven't dived into the details.
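
For anyone unfamiliar, the classic double-sided hammering pattern looks roughly like the toy sketch below. From Python you can't flush caches, see physical addresses, or guarantee the rows are actually adjacent in the same DRAM bank, so this will never flip anything; it only shows the shape of the access pattern (the row size is an assumption):

```python
import numpy as np

ROW_BYTES = 8192                       # assumed DRAM row size, varies by part
buf = np.zeros(64 * ROW_BYTES, dtype=np.uint8)

victim = 32 * ROW_BYTES                # hypothetical victim row offset
aggr_a = victim - ROW_BYTES            # would-be adjacent rows above/below
aggr_b = victim + ROW_BYTES

buf[victim:victim + ROW_BYTES] = 0xFF  # fill the victim row with known data

for _ in range(1_000_000):             # "hammer": alternate the aggressors
    _ = buf[aggr_a]                    # real attacks use uncached accesses,
    _ = buf[aggr_b]                    # e.g. clflush between reads on x86

flips = np.flatnonzero(buf[victim:victim + ROW_BYTES] != 0xFF)
print("bit flips detected in victim row:", flips.size)   # always 0 here
```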
perching_aix · a month ago
HW noob here: does anyone have insight into how an issue like this passes EM simulation during development? I understand that modern chips are way too complex for full formal verification, but I'd have thought memory modules would be so highly structurally regular that it might be possible there despite it.
andyferris · a month ago
I am no expert in the field, but my reading of the original rowhammer issue (and later partial hardware mitigations) was that it was seen as better to design RAM that works fast and is dense and get that to market, than to engineer something provably untamperable with greater tolerances / die size / latency.

GPUs have always been squarely in the "get stuff to consumers ASAP" camp, rather than NASA-like engineering that can withstand cosmic rays and such.

I also presume an EM simulation would be able to spot it, but prior to rowhammer it is also possible no-one ever thought to check for it (or more likely that they'd check the simulation with random or typical data inputs, not a hitherto-unthought-of attack vector, but that doesn't explain more modern hardware).

privatelypublic · a month ago
I seem to recall that rowhammer was known, but thought impossible for userland code to exploit.

This is a huge theme for vulnerabilities. I almost said "modern", but looking back I've seen the cycle (dismiss attacks as strictly hypothetical, then get caught unprepared when somebody publishes something making them practical) happen more than a few times.

userbinator · a month ago
> but prior to rowhammer it is also possible no-one ever thought to check for it

It was known as "pattern sensitivity" in the industry for decades, basically ever since the beginning, and considered a blocking defect. Here's a random article from 1989 (don't know why first page is missing, but look at the references): http://web.eecs.umich.edu/~mazum/PAPERS-MAZUM/patternsensiti...

Then some bastards like these came along...

https://research.ece.cmu.edu/safari/thesis/skhan_jobtalk_sli...

...and essentially said "who cares, let someone else be responsible for the imperfections while we can sell more crap", leading to the current mess we're in.

The flash memory industry took a similar dark turn decades ago.

MadnessASAP · a month ago
Given that I wasn't surprised by the headline, I have to imagine that Nvidia engineers were also well aware.

Nothing is perfect, everything has its failure conditions. The question is where do you choose to place the bar? Do you want your component to work at 60, 80, or 100C? Do you want it to work in high radiation environments? Do you want it to withstand pathological access patterns?

So in other words, there isn't a sufficient market for GPUs that cost double the $/GB of RAM but are resilient to rowhammer attacks to justify manufacturing them.

wnoise · a month ago
The idea of pathological RAM access patterns is as ridiculous as the idea of pathological division of floating point numbers. ( https://en.wikipedia.org/wiki/Pentium_FDIV_bug ). The spec of RAM is to be able to store anything in any order, reliably. They failed the spec.
bobmcnamara · a month ago
> The question is where do you choose to place the bar?

In the datasheet.

thijsr · a month ago
Rowhammer is a problem inherent to the way we design DRAM. It is a known problem to memory manufacturers that is very hard, if not impossible, to fix. In fact, Rowhammer only becomes worse as memory density increases.
sroussey · a month ago
It’s a matter of percentages… not all manufacturers fell to the rowhammer attack.

The positive part of the original rowhammer report was that it gave us a new tool to validate memory (it caused failures much faster than other validation methods).

iFire · a month ago
Does the ECC mode on my Nvidia RTX 4090 stop this?
fc417fc802 · a month ago
Yes, but it reduces performance, and you don't need to care about this because (presumably) you aren't a cloud provider running multi-tenant workloads.

Worst-case scenario, someone pulls this off using WebGL and a website is able to corrupt your VRAM. They can't actually steal anything in that scenario (AFAIK), making it nothing more than a minor inconvenience.

perching_aix · a month ago
Couldn't it possibly lead to arbitrary code execution on the GPU, with that opening the floodgates towards the rest of the system via DMA, or maybe even enabling the dropping of some payload for the kernel mode GPU driver?
keysdev · a month ago
What about Apple M series?
sylware · a month ago
In the general case, that's why hand-optimized assembly can be an issue compared to slower compiler-generated machine code (not true all the time, of course): if the machine code is "hammering" memory, that is more likely to happen with the optimized assembly than with the "actually tested" compiler-generated machine code.
saagarjha · a month ago
No, in fact you can often Rowhammer inside an interpreter if you construct it correctly.
sylware · a month ago
Point missed: this is not what I said.