Could websites concerned with privacy deploy a package that triggers interrupts randomly? Could a browser extension do it for every site?
Our countermeasure, which triggers interrupts randomly, is implemented as a browser extension; the source code is available here: https://github.com/jackcook/bigger-fish
I'm not sure I would recommend it for daily use, though; I think our tests showed it slowed page load times by about 10%.
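For a flavor of the mechanism, here's a toy content-script sketch of my own; it is not the extension's actual code (see the repo above for that), and the structure and constants are made up:

```typescript
// Toy sketch only -- not the actual extension's code (see the repo above).
// Idea: inject short bursts of work at random times so an attacker's
// interrupt-timing trace is dominated by noise the page controls.

function burst(durationMs: number): void {
  // Busy-wait for `durationMs`: occupies the thread and perturbs the
  // counting loops that these timing attacks rely on.
  const end = performance.now() + durationMs;
  while (performance.now() < end) {
    // spin
  }
}

function scheduleNoise(): void {
  const delayMs = Math.random() * 20; // random gap, up to 20 ms
  setTimeout(() => {
    burst(Math.random()); // random burst, up to 1 ms
    scheduleNoise(); // re-arm with a fresh random delay
  }, delayMs);
}

scheduleNoise();
```

Running something like this on every page trades responsiveness for noise, which is presumably where the ~10% page load overhead comes from.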
EDIT: From looking at the paper, it seems that even though the core state space model / selection mechanism is linear (except for the discretization?), they incorporate a nonlinearity in the full “Mamba block”, which is stacked with residual connections and layer norm just like in a transformer. They describe this as combining a linear attention and an MLP into a single step, rather than alternating attention and MLP layers as a transformer does.
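Schematically, one block looks something like this as I read it (my notation, not the paper's; σ is the SiLU nonlinearity, ⊙ is elementwise multiplication, and I'm omitting the short causal convolution in the SSM branch):

```latex
% One Mamba block, schematically (my notation): a gated SSM branch
% takes the place of a transformer's attention + MLP pair.
\[
y \;=\; x \;+\; W_{\text{out}}\Big(\mathrm{SSM}\big(\sigma(W_1\,\mathrm{LN}(x))\big)\,\odot\,\sigma(W_2\,\mathrm{LN}(x))\Big)
\]
```

The σ-gated branch is where the MLP-like nonlinearity lives, while the SSM branch plays the attention-like role.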
One question I have on selectivity: footnote 4 says "the continuous A is constant, while our discretization parameter ∆ is input-dependent." What is the effect of varying the discretization step ∆ instead of the (main, as I understand it) state matrix A? My gut says it simplifies training and provides stability, but my sense is that A carries most of the behavior of the model, so it should get more wiggle room during training.
“We remark that while the A parameter could also be selective, it ultimately affects the model only through its interaction with ∆ via Ā = exp(∆A) (the discretization (4)). Thus selectivity in ∆ is enough to ensure selectivity in (Ā, B̄), and is the main source of improvement. We hypothesize that making A selective in addition to (or instead of) ∆ would have similar performance, and leave it out for simplicity.”
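For reference, here is the zero-order-hold discretization the footnote points at (eq. (4) in the paper; my transcription): with A fixed but the step size ∆_t a function of the input,

```latex
% ZOH discretization: with \Delta_t a function of the input x_t,
% \bar{A}_t and \bar{B}_t become input-dependent even though A is not.
\[
\bar{A}_t = \exp(\Delta_t A), \qquad
\bar{B}_t = (\Delta_t A)^{-1}\big(\exp(\Delta_t A) - I\big)\,\Delta_t B,
\qquad
h_t = \bar{A}_t\, h_{t-1} + \bar{B}_t\, x_t
\]
```

So even with a constant A, a large ∆_t resets the state and lets the current token in, while ∆_t → 0 drives Ā_t → I so the token is effectively ignored; that's the sense in which selectivity in ∆ alone already makes (Ā_t, B̄_t) selective.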
I think the paper's contributions really don't have anything to do with ML; it's about the new side channel via interrupts, which is a cool find. ML just gets more people to read it, which I guess is OK. I mean, you could probably swap in classical "statistics" and use it in much the same way.
I remember an advisor once telling me: once you figure out what a paper is really about, rewrite it, and remove the stuff you used to think it was about. The title of this paper should be about the new side channel, not about the ML story, imho.
But this is just a nitpick. Great work!
But the finding that the earlier attack was misinterpreted (its ML model was picking up interrupts, not the cache) is particularly notable, because it calls a lot of existing computer architecture research into question. In the past, attacks like this were very difficult to pull off without an in-depth understanding of the side channel being exploited. But ML models (in this case, an LSTM) generally go a bit beyond “statistics” because they unlock much greater accuracy, making it much easier to develop powerful attacks that exploit side channels that aren't really understood. And a lot of ML-assisted attacks are created in this fashion today: the Shusterman et al. paper alone has almost 200 citations, a huge number for a computer architecture paper.
The point of publishing this kind of research is to better understand our systems so we can build stronger defenses; the cost of getting this wrong and misleading the community is pretty high. And that would still be true even if we had ultimately found that the cache was responsible for the prior attack. But of course, it helps that we discovered a new side channel along the way; that really drove our point home. I probably could have emphasized this more in my blog post.