> Our work highlights that current defensive programming techniques are more akin to anecdotal workarounds to compilers’ behavior rather than a systematic solution. Our results point out that high-level constant-time code is prone to dangerous transformations and optimizations from compilers and calls for cooperation between developers of security-critical libraries and general-purpose compilers.
All optimizing compilers perform optimizations based on the notion of an abstract machine, which determines what semantics of the original program must be conserved. And the abstract machine for a given language is usually provided by the language that is being implemented (except for e.g. languages that just give up and say "give me C's abstract machine").
You definitely don't want compiler backends trying to "guess" when some defensive programming technique is being used, so that they can compile it conservatively. That's a dead-end pathway to madness. You need languages to give you the power to explicitly say "this operation is constant time, don't mess with it". And in practice they do give you this power, via inline assembly. Of course inline assembly is suboptimal for a lot of reasons, and we can certainly try to do better, but this is a language design problem, not an optimizing compiler problem.
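In C with GCC or Clang, that escape hatch does not have to be a full assembly routine: an empty extended-asm statement acts as a "value barrier" that hides a value from the optimizer. A minimal sketch (function names are mine; this relies on the GCC/Clang extended-asm extension):

```c
#include <stdint.h>

/* Empty asm that claims to read and write v: the optimizer must
 * assume v may have changed, so it cannot constant-fold it or
 * reintroduce branches based on its value. */
static inline uint32_t value_barrier(uint32_t v)
{
    __asm__ volatile("" : "+r"(v));
    return v;
}

/* Branch-free select: returns a when mask is all-ones, b when zero.
 * Barriering mask and ~mask separately keeps the compiler from
 * seeing they are complementary and if-converting this to a branch. */
static inline uint32_t ct_select(uint32_t mask, uint32_t a, uint32_t b)
{
    return (value_barrier(mask) & a) | (value_barrier(~mask) & b);
}
```

BoringSSL uses essentially this trick (its `value_barrier_w`) to keep the compiler from recognizing mask idioms and turning them back into branches.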
Not just the abstract machines. CPUs don't guarantee constant-time execution for most instructions. And if you target wasm or some other intermediate compilation target then the first compiler can't do much about the 2nd compiler applying additional optimizations. And then there are post-link optimizations where any hypothetical "please don't optimize this" annotations would be gone.
In a theoretical sense only, that is true on x86 when a flag (that is always set) is not set. The vast majority of useful instructions are constant-time (in the value of the data).
C has enough security questions hanging around it; for things like constant time operation, which has AFAIK never been promised anywhere in the C standards (and probably can't be!), I think the only reasonable solution is a new language specifically designed for the purpose.
Not for writing whole programs in, but for writing fairly small amounts of code that performs arithmetic and compiles to libraries with C linkage. No VM, not even any memory allocation (make the caller do that in advance with fixed-size areas).
Basically the "portable assembler" that people keep using C for despite its explicit promises in the standard to not be that.
At least for C and C++, why don’t GCC/clang implement constant time primitives as intrinsics? It would be useful in the meantime to developers working on cryptographic code, would provide better guarantees, and could emit useful errors when compiling against platforms where such guarantees cannot be provided. It would also act as prior art for standardization.
I’ve always thought it was strange that assembly wasn’t used for some of these primitives, since you don’t control the codegen, but letting the compiler provide intrinsics would solve the problem without requiring every project to write their own assembly routines.
Actual assembler is just such a language, as is being discussed here.
I think a good solution involves more than operations. Consider:
int a = read_a_value();
a = constant_time_add(a, 1);
do_something_with(a);
In many languages, the compiler is free to insert entirely new operations between these statements so long as the output isn’t affected. Or the compiler could reuse a register that contained a to do something else and do something awful like zeroing that register in non-constant time.
I think that new types are a better solution. A type could be specified such that the compiler must not use its value in a non-constant time operation, nor may the compiler leak it into an initialized value.
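Absent real language support, the closest C approximation is a wrapper type whose only exposed operations are ones that are plausibly data-independent on mainstream targets. It is convention, not a compiler guarantee, and all names below are hypothetical:

```c
#include <stdint.h>

/* A "secret" integer: code outside this module should never branch
 * on it or index memory with it, only use the operations below. */
typedef struct { uint32_t v; } secret_u32;

static inline secret_u32 secret_from(uint32_t x) { return (secret_u32){ x }; }

static inline secret_u32 secret_xor(secret_u32 a, secret_u32 b)
{
    return (secret_u32){ a.v ^ b.v };
}

/* Equality as a mask (all-ones if equal, zero otherwise), computed
 * without any data-dependent branch. */
static inline uint32_t secret_eq_mask(secret_u32 a, secret_u32 b)
{
    uint32_t d = a.v ^ b.v;                      /* 0 iff equal */
    return (uint32_t)(((uint64_t)d - 1) >> 32);  /* borrow -> mask */
}
```

The struct wrapper at least makes accidental `if (s.v == …)` comparisons stand out in review; a real secret type would make them a compile error.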
The problem is the programming language culture. In some circles, if guessing would be going too far, the compiler writers will rather opt for safety, while other language communities will stop at nothing for ultimate performance and the nitromethane-powered optimising compiler.
In this specific case, in practice, maybe. At the end of the day, though, it is like trying to write a metred poem in a foreign language by writing English and passing it through Google Translate. You're trying to precisely control properties of the output with a tool that was not designed for it, doesn't realize what you're doing, and has at every step many possible alternatives. Any success you have is tied to a specific version of the tool; there are no guarantees.
In general, no. Optimizations from a compiler can't do anything regular code gen isn't allowed to do. It's just they're more likely to do some things. So in practice disabling optimization (especially using pragmas or attributes for certain critical functions) can help, but it can't be considered truly reliable.
« We found binaries with issues in all optimization levels as depicted in Figure 3. We observe notable drops for LLVM's and GCC's -O0. We believe that fewer enabled optimizations also lead to fewer compiler-induced issues. We discuss in Section VII that optimizations are one of the reasons compilers introduce side channels; thus fewer optimizations could mean defensive programming techniques are less likely to get interfered with. However, we note that this cannot be considered a solution to compiler-induced issues as we still observe some issues with -O0. We discuss these further in Section VI-D. »
Libsodium does this with compiler-specific pragmas and attributes (like __attribute__((optimize("O0")))). It's also the cryptographic library mentioned in this study with the smallest number of discovered issues, but:
> "However, we want to note that libsodium is the smallest library that we test with the most limited support for cryptographic primitives. For example, libsodium does not support the elliptic curve secp256; instead, they do all of their elliptic curve operations on curve25519. Therefore, it may be unfair to compare to larger libraries with support for more cryptographic primitives."
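For reference, the attribute approach mentioned above looks like the sketch below. `optimize("O0")` is GCC-only and Clang wants `optnone` instead, so portable code has to dispatch on the compiler (the macro name is mine):

```c
#include <stddef.h>

/* Ask the compiler for unoptimized codegen for one function only. */
#if defined(__clang__)
# define NO_OPT __attribute__((optnone))
#elif defined(__GNUC__)
# define NO_OPT __attribute__((optimize("O0")))
#else
# define NO_OPT /* no known equivalent: hope for the best */
#endif

/* Constant-time byte comparison: accumulate all differences instead
 * of returning early, so the loop's duration does not depend on
 * where the first mismatch occurs. Returns 1 if equal, 0 otherwise. */
static NO_OPT int ct_compare(const unsigned char *a,
                             const unsigned char *b, size_t n)
{
    unsigned char d = 0;
    for (size_t i = 0; i < n; i++)
        d |= a[i] ^ b[i];
    return d == 0;
}
```

Note the fallthrough case: on an unknown compiler the macro silently does nothing, which is exactly the "unreliable workaround" problem the paper describes.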
Unreliable. But in combination with tests of constant-time-ness then it's ok in the short-term. In the long-term it'd be nice if the language and tooling could be told "this is to be constant-time", and even better to have a language specially crafted for these sorts of things, something between assembly and C.
memset isn’t part of any abstract machine, yet compilers feel free to optimise that out, so we needed to add new functions that compilers wouldn’t touch.
The C language specification describes what memset does. If a program calls memset, compilers must make it behave as if an actual call to the memset function (as described in the standard) was made, for all observable effects that matter to the abstract machine.
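Dead-store elimination is the standard example: if the buffer is never read again, zeroing it has no effect observable by the abstract machine, so the store can legally disappear. Dedicated wipe functions (`explicit_bzero`, C11's optional `memset_s`) exist for exactly this reason; where they are unavailable, a common fallback is calling `memset` through a `volatile` function pointer. A sketch:

```c
#include <string.h>

/* memset reached through a volatile pointer: the compiler must
 * assume the pointer's target may change at run time, so it cannot
 * prove the call is a removable dead store and must keep it. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

static void secure_wipe(void *p, size_t n)
{
    memset_ptr(p, 0, n);
}
```

This is a widely used workaround (OpenSSL's `OPENSSL_cleanse` historically took a similar approach), but like all such tricks it defeats analysis rather than expressing intent.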
The subject of my PhD thesis was a formal approach to securing cryptographic implementations against side-channel attacks [1]. In the lab I worked at, most people wrote assembly code for precisely the reasons discussed here [2], and a part of my work consisted in applying a form of formal verification directly on assembly code rather than on a more abstract version of the cryptographic algorithms. You cannot trust compilers, which are designed with efficiency in mind, when your main objective is security. Keeping the same semantics while going as fast as possible often breaks security measures.
And that's not just for constant-time implementations; sometimes it's a lot simpler than that. Some security schemes against power-consumption side-channel attacks, such as dual-rail with precharge logic, require that registers are zeroed before being written to. A compiler seeing code such as:
x = 0 /* first write, zeroing */
x = sensitive_data /* second write, actual value */
will of course consider that the first write is never read and optimize it away, thus breaking the security scheme… Then there's also the need to make sure that the register you're using is always the same (because leakage profiles may differ a lot between the different registers of a single device). All the more reasons to write critical cryptographic code directly in assembly.
[1] https://pablo.rauzy.name/research.html#phd
[2] A ~fun anecdote: I'm a computer scientist, but most people at that lab were in electronics, and for them assembly code was considered high level because it was already software!
Hello, isn’t part of what you were trying to achieve already taken care of if you use ‘volatile’ or a memory barrier? When writing Linux drivers these patterns are common. In some more exotic cases, even instruction barriers can be used to limit what out-of-order or speculative execution is going to do.
Not the OP, and not at all an expert in this area, but I was curious what the answer was, and from a bit of reading it seems like a potential reason that might not work is because volatile also prevents caching the value in a register. So if you want to keep the computation in a register, but you want to explicitly clear the value to zero before writing to it, it seems like C semantics are insufficient (at least without inline assembly).
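For the zero-then-write example above, `volatile` does at least force both stores to be emitted, since accesses to a volatile object count as observable behavior in C; what it cannot promise is which register holds the value or how the stores are scheduled. A sketch:

```c
/* volatile makes every store to x observable behavior, so the
 * "precharge" write of 0 cannot be removed as a dead store even
 * though it is immediately overwritten. */
static volatile int x;

static void precharged_write(int sensitive_data)
{
    x = 0;               /* first write: zeroing (kept because volatile) */
    x = sensitive_data;  /* second write: actual value */
}
```

So `volatile` preserves the store sequence but says nothing about registers, power profile, or instruction selection, which is the commenters' point.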
Possibly (I don't know how volatile interacts with register allocation), but the thing is you don't just want this specific example to work; there are a lot of things you want to be able to really ensure, i.e., have rigorous proof of [1]. So maybe if the semantics are adequate and the compiler itself is formally proved like CompCert [2] you can rely on volatile, but that's a lot of assumptions.
[1] See for example the work we did in this paper: Formally Proved Security of Assembly Code Against Power Analysis: A Case Study on Balanced Logic (https://eprint.iacr.org/2013/554)
[2] https://compcert.org/
Intel has DOIT: https://www.intel.com/content/www/us/en/developer/articles/t...
The idea is that the processor will not take shortcuts that take advantage of the values it is processing. For example, a 64-bit division cannot shortcut if the operands are both small, etc.
I get that assembly gets you closer, but does it actually give you the guarantees that you need? Do you have detailed documentation of what instructions and patterns you can use to avoid timing and other side channels that applies to many CPUs that implement an instruction set?
In practice you have to go by the timings given in the datasheet, which may be tedious to find. Consumer grade CPUs aren't usually protected against other side channels.
That depends a lot on the hardware. For some embedded systems, such as 8-bit smartcards, you can get pretty detailed datasheets from the manufacturer, and the architecture is a lot simpler than modern CPUs.
I saw a rant on here about this not so long ago: it feels ridiculous that nearly all of our security code is compiled using compilers that cannot provide the guarantees these algorithms need.
Seems like a hard problem to solve, because even if you mark some block of code as DO_NOT_OPTIMIZE it still is likely to pull in code from elsewhere that probably has been optimized. Seems like you might have to actually just write the relevant code in inlined machine code.
It’s not only that, but also the semantics of the language. Is, for example, integer comparison guaranteed to be constant time? What about integer comparison with constants that require multiple load instructions (like on MIPS where all instructions are word sized so loading words requires two instructions). Is it “wrong” to do two comparisons against the high half and the low half independently, each branching on failure, if the language provides no guarantees and the outcome is the same?
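The safe pattern is to avoid per-word branches entirely: fold the per-word differences together first, then reduce to a mask once. A sketch of a branch-free 64-bit equality built from 32-bit halves:

```c
#include <stdint.h>

/* Compare a 64-bit value split across two 32-bit words without any
 * data-dependent branch: OR the per-word differences, then turn
 * "is zero" into a full-width mask with one borrow trick. */
static uint32_t ct_eq64(uint32_t a_hi, uint32_t a_lo,
                        uint32_t b_hi, uint32_t b_lo)
{
    uint32_t d = (a_hi ^ b_hi) | (a_lo ^ b_lo);  /* 0 iff both halves equal */
    return (uint32_t)(((uint64_t)d - 1) >> 32);  /* all-ones iff equal */
}
```

Nothing in the C standard stops a compiler from reintroducing an early-exit here; the sketch merely shows the shape the author wants guaranteed.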
Ultimately security focused languages that provide operational guarantees with C linkage are probably necessary. I don’t think it’s enough to not optimize C.
Jasmin is an amazing tool to write code that's proven to be free of side channels, memory safe, and protected against common speculative execution attacks: https://github.com/jasmin-lang/jasmin
I wrote AEGIS implementations in it: https://github.com/jedisct1/aegis-jasmin and it was a really great experience.
If you offer a programmer a deal where you make their code x% faster on average, never slower, and the actual speedup depends on a huge number of factors making it really hard to reason about, they'll take it 99.99% of the time.
It seems to me that the entire stack is built on this assumption, making life incredibly difficult the other 0.01% of the time.
Nobody asks such a question, and not all would answer it the same.
I'd rather take a 5% perf regression in my C++ code if it meant that compile times halved and error quality got better.
Maybe this is overly naive, but I always wondered whether timing side channels could be mitigated by scheduling authentication responses before doing the crypto calculations.
1. authentication request arrives
2. schedule response at a later time, generously leaving time for doing the next step
3. do part of the work that is sensitive to timing attacks
4. respond at scheduled time
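The schedule-then-sleep idea above can be sketched on POSIX (assumed; `clock_nanosleep` is Linux/POSIX, not portable to all platforms) using an absolute deadline, so the sleep automatically compensates for however long the sensitive work took. The budget must generously exceed the worst case, or the scheme leaks exactly when it matters:

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_nanosleep */
#include <time.h>

/* Step 2: fix an absolute response deadline, generously beyond the
 * worst-case duration of the sensitive work. */
static struct timespec deadline_after(long budget_ns)
{
    struct timespec d;
    clock_gettime(CLOCK_MONOTONIC, &d);
    d.tv_nsec += budget_ns;
    d.tv_sec  += d.tv_nsec / 1000000000L;
    d.tv_nsec %= 1000000000L;
    return d;
}

/* Step 4: sleep until the absolute deadline, so the observable
 * response time does not depend on how long step 3 actually took. */
static void sleep_until(const struct timespec *deadline)
{
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
}
```

As the reply below notes, this only masks timing over a network; it does nothing against power analysis or a local attacker.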
That applies over a network. It doesn't apply to someone that has physical access to the machine. Even if the machine delays a response, the power profile of the computation will tell the exact timing of the computation.
Since there is lots of cryptography used to keep users out of their own hardware (or stolen hardware) these days, this is important.
Good for the authors for putting the idea out there I guess, but to me it seems pretty obvious that people who never look at their assembly would have no idea whether it matches their expectations.
And it's not hard. Godbolt.org is free for everyone.
Checking if the code branches is super easy, you don't need a super smart runtime analysis tool to figure out if there's a branch. Ditto for everything else you want to check.
I think the only real option is for the compilers to add a couple of limitations:
1) Allow any block of code to be defined as constant-time. The compiler is forbidden any optimizations that could break this. Constant-time routines can only call other routines that are likewise declared as constant-time.
2) Allow any operation to be defined as important. The compiler can't remove it as redundant.