The arguments in this blogpost are fundamentally flawed. The fact that they opened a bug based on them but got shut down should have raised all the red flags.
When compiling and running a C program, the only thing that matters is "what the C abstract machine does". Programs that exhibit UB in the abstract machine are allowed to do "anything".
Trying to scope that down using arguments of the form "but what the hardware does is X" is fundamentally flawed, because anything means anything: what the hardware does doesn't change that, and therefore it doesn't matter.
This blogpost "What The Hardware Does is not What Your Program Does" (https://www.ralfj.de/blog/2019/07/14/uninit.html) explains this in more detail and with more examples.
The blog post is also kind of unhinged because in the incredibly rare cases where you would want to write code like this you can literally just use the asm keyword.
I think it's also worth considering WHY compilers (and the C standard) make these kinds of assumptions. For starters, not all hardware platforms allow unaligned accesses at all. Even on x86 where it's supported, you want to avoid doing unaligned reads at all costs because they're up to 2x slower than aligned accesses. God forbid you try to use unaligned atomics, because while technically supported by x86 they're 200x slower than using the LOCK prefix with an aligned read.[^1] The fact that you need to go through escape hatches to get the compiler to generate code to do unaligned loads and stores is a good thing, because it helps prevent people from writing code with mysterious slowdowns.
Writing a function that takes two pointers of the same type already has to pessimize loads and stores on the assumption that the pointers could alias. That is to say, if your function takes int *p, int *q, then doing a store to *p requires reloading *q, because p and q could point to the same thing. Thankfully, in some situations the compiler can figure out that in a certain context p and q have different addresses and therefore can't alias, which helps it generate faster code (by avoiding redundant loads). If p and q were allowed to alias even when they have different addresses, this would all go out the window and you'd basically need to assume that any two pointer types could alias in any situation. This would be TERRIBLE for performance.

[^1]: https://rigtorp.se/split-locks/
> For starters, not all hardware platforms allow unaligned accesses at all.
Yeah, and always, everywhere, a mistake. It was a mistake back in the 1970s and it's an increasingly big mistake as time goes on. Just like big endian and 'network order'.
While the sentiment about why compilers make alignment assumptions is correct, a lot of the details here are, I think, not quite right.
> For starters, not all hardware platforms allow unaligned accesses at all
If you're dealing with very simple CPUs like the Arm Cortex-M0, sure. But even the Cortex-M3/M4 allow unaligned access.
> Even on x86 where it's supported, you want to avoid doing unaligned reads at all costs because they're up to 2x slower than aligned accesses
I believe that information hasn't been true for a long time (since 1995). Unless you're talking about unaligned accesses that also cross a cache line boundary being slower [1]. But I imagine that aligned accesses crossing a cache line boundary are also similarly slower because the slowness is the cache line boundary.
> God forbid you try to use unaligned atomics, because while technically supported by x86 they're 200x slower than using the LOCK prefix with an aligned read
What you're referring to is atomic unaligned access that's also across cache line boundaries. I don't know what it is within a cache line, but I imagine it's not as bad as you make it out to be. Unaligned atomics across cache line boundaries also don't work on ARM and have much spottier support than unaligned access in general.
TL;DR: People cargo-cult advice about unaligned access, but that's mostly because it's a simpler rule of thumb, and there's typically very little benefit to packing things as tightly as possible, which is where unaligned accesses generally come up.

[1] https://news.ycombinator.com/item?id=10529947
> The present blog post brings bad, and as far as I know, previously undocumented news. Even if you really are targeting an instruction set without any memory access instruction that requires alignment, GCC still applies some sophisticated optimizations that assume aligned pointers.
I could have told you this was true ~20 years ago, and the main reason I'm so conservative in how far back gcc has been doing this is that it's only around that time I started programming--I strongly suspect this dates back to the 90's.
It dates to the first standardization of C in 1989. The "C as portable assembly" view ended when ANSI C got standardized, and K&R's 2nd edition was published.
indeed, I still have ~20-year-old code that picks up and rectifies unaligned memory so gcc does the right thing. To claim a compiler bugs out on unaligned memory sounds very weird; I assumed that was common knowledge.
27 years ago I was helping someone rearrange structs because word-sized fields were being word aligned, and you would waste a good deal of memory if you arranged by concept instead of for byte packing. I believe that was under Microsoft’s compiler.
What you’re saying and what the blog post is implying are different things. This is an admission that GCC optimizes on this behavior in practice. Your claim is that GCC could optimize on this, which is a much less interesting claim.
That's what the author meant when he said "The shift of the C language from “portable assembly” to “high-level programming language without the safety of high-level programming languages”"
Back in the 1980s, C was expected to do what hardware does. There was no "the C abstract machine".
The abstract machine idea was introduced much later.
> The arguments in this blogpost are fundamentally flawed.
The "fundamentally flawed" comment is a revisionist idea.
This turns out to be contentious. There are two histories of the C language, and which one you get told is true depends on who you ask:
1/ a way to emit specific assembly with a compiler dealing with register allocation and instruction selection
2/ an abstract machine specification that permits optimisations and also happens to lower well defined code to some architectures
My working theory is that the language standardisation effort invented the latter. So when people say C was always like this, they mean since ansi c89, and there was no language before that. And when people say C used to be typed/convenient assembly language, they're referring to the language that was called C that existed in reality prior to that standards document.
The WG14 mailing list was insistent (in correspondence with me) that C was always like this, and some of them were presumably around at the time. A partial counterargument is the semi-infamous message from Dennis Ritchie copied in various places, e.g. https://www.lysator.liu.se/c/dmr-on-noalias.html
Here's an out-of-context quote from that email, to encourage people to read said context and ideally reply here with more information on this historical assessment:
"The fundamental problem is that it is not possible to write real programs using the X3J11 definition of C. The committee has created an unreal language that no one can or will actually use."
> Back in the 1980s, C was expected to do what hardware does. There was no "the C abstract machine".
There was also a huge variety of compilers that were buggy and incomplete each in their own ways, often with mutually-incompatible extensions, not to mention prone to generating pretty awful code.
If you really are targeting the x86_64 instruction set, you should be writing x86_64 instructions. Then you get exactly what the hardware does and don’t get any of those pesky compiler assumptions.
Of course you don’t get any of those pleasant optimizations either. But those optimizations are only possible because of the assumptions.
I think it is a good blog post, because it highlights an issue that I was not aware of and that I think many programmers are not. I do think I am a decent C programmer, and I spotted the strict aliasing issue immediately, but I didn't know that unaligned pointer access is UB. Because let's face it, the majority of programmers didn't read the standard, and those who did don't remember all facets.
I first learned many years ago that you should pick apart binary data by casting structs, using pointers into the middle of fields and so on. It was ubiquitous, for both speed and convenience. I don't know if it was legal even in the 90s, but it was general practice - MS Office file formats from that time were just dumped structs. Then at some point I learned about pointer alignment - but it was always framed as a performance concern, or a limitation of exotic platforms, never as a correctness issue. But it's not just important to learn what to do; you also need to learn why, which is why we need more articles highlighting these issues.
(And I have to admit, I am one of these misguided people who would love a flag to turn C into "portable assembler" again. Even if it is 10x slower, and even if I had to add annotations to every damn for loop to tell the compiler that I'm not overflowing. There are just cases where understanding what you are actually doing to the hardware trumps performance.)
I think you (and most of the other commenters in this thread) misunderstand the perspective of the author. This is a tool meant to do static analysis of a C codebase. Their job is not to actually follow the standard, but to identify what “common C” actually looks like. This is not the same as standard C.
There are a lot of things compilers do not optimize on even though they are technically illegal. As a result, people write code that relies on these kinds of manipulations. No, this is not your standard complaint about undefined behavior being the work of the devil, this is code that in certain places pushes the boundaries of what the compiler silently guarantees. The author’s job is to identify this, not what the standard says, because a tool that rejects any code that’s not entirely standards compliant is generally useless for any nontrivial codebase.
> When compiling and running a C program, the only thing that matters is "what the C abstract machine does". Programs that exhibit UB in the abstract machine are allowed to do "anything".
This view is alienating systems programmers. You're right that that's what the standard says, but nobody actually wants that except compiler writers trying to juice unrealistic benchmarks. In practice programmers want to alias things, they want to access unaligned memory, they want to cast objects right out of memory without constructing them, etc. And they have real reasons to do so! More narrowly defining how far off the rails the compiler is allowed to go, rather than "anything", is a desirable objective for changing the standard.
"These people simply don't understand what C programmers want": https://groups.google.com/forum/#!msg/boring-crypto/48qa1kWi...

"please don't do this, you're not producing value": http://blog.metaobject.com/2014/04/cc-osmartass.html

"Everyone is fired": http://web.archive.org/web/20160309163927/http://robertoconc...

"No sane compiler writer would ever assume it allowed the compiler to 'do anything' with your code": http://web.archive.org/web/20180525172644/http://article.gma...
Great, except no implementation of the C abstract machine actually exists. So you can't test against it. All you have are compilers that use it to justify miscompiling your code.
We need a C interpreter that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc. If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.
My personal opinion is that "undefined behavior" was a spec-writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C to non-two's-complement machines. This was interpreted as license to invent new misbehaviors for integer overflow instead of "do whatever the target architecture does."
> For example, signed integer overflow being UB was intended to allow compiling C to non-twos-compliment machines.
This is indeed a design mistake, but in another sense. Ordinary arithmetic ops like + or - should throw an exception on overflow (with both signed and unsigned operands), because most of the time you need ordinary math, not math modulo 2^32. For those rare cases where wraparound is desired, there should be a function like add_and_wrap() or a special operator.
UBSan covers each of those except provenance checking, and ASan mostly catches provenance problems even though that's not directly the goal. There are some dumb forms of UB not caught by any of the sanitizers, but most of them are.
Making your program UBSan-clean is the bare minimum you should do if you're writing C or C++ in 2023, not an absurd goal. I know it'll never happen, but I'm increasingly of the opinion that UBSan should be enabled by default.
> Great, except no implementation of the C abstract machine actually exists. So you can't test against it. All you have are compilers that use it to justify miscompiling your code.
All C compilers implement the C abstract machine. It is not used to justify miscompiling code, it is used to specify behavior of compiled code.
> We need a C interpreter
Interpreter or not is not relevant, there must be some misconception. Any behavior you can implement with an interpreter can be implemented with compiled code. E.g., add a test and branch after each integer operation if you want to crash on overflow.
> that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc.
As others have mentioned there are static and dynamic checkers (sanitizers) that test for such things nowadays. In compiled, not interpreted code, mind you.
> If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.
It's not that bad.
> My personal opinion is that "undefined behavior" was a spec writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C to non-twos-compliment machines. This was interpreted to allow inventing new misbehaviors for integer overflow instead of "do whatever the target architecture does."
The spec uses implementation-defined behavior for that. Although you can argue that they went the wrong way on some choices -- signed integer overflow "depends on the machine at hand" in the first K&R, which you could reasonably have called implementation-specific, enumerating the behaviors of supported machines.
C had a long history with hardware manufacturers, compiler writers, and software developers, though, so the standard can never universally please everybody. The purpose of standardization was never to make something that was easiest for software development, ignoring the other considerations. So a decision is not an example of design-by-committee gone wrong just because it happened to be worse for software writers (e.g., choosing to make overflow undefined instead of implementation-dependent). You would have to know why such a decision was made.
The general problem with this argument is that “do what the hardware does” is actually not easy to reason about. The end results of this typically are impossible to grok.
And one of the anythings permitted would be to behave in a documented manner characteristic of the target environment. The program is after all almost certainly being built to run on an actual machine; if you know what that actual machine does, it would sometimes be useful to be able to take advantage of that. We might not be able to demand this on the basis that the standard requires it, but as a quality of implementation issue I think it a reasonable request.
This is such an obvious thing to do that I'm surprised the C standard doesn't include wording along those lines to accommodate it. But I suppose even if it did, people would just ignore it.
The problem is that what the machine does isn't necessarily consistent. If you're using old-as-the-green-hills integer instructions then yes, the CPU supports unaligned access. If you want to benefit from the speedup afforded by the latest vector instructions, now suddenly it doesn't.
Also, to be fair, GCC does appear to back off the optimisations when dealing with, for example, a struct with the packed attribute.
C has always had a concept of implementation defined behavior, and unaligned memory accesses used to be defined to work correctly on x86.
Intel added instructions that can’t handle unaligned access, so they broke that contract. I’d argue that it is an instruction set architecture bug.
Alternatively, Intel could argue that compilers shouldn’t emit vector instructions unless they can statically prove the pointer is aligned. That’s not feasible in general for languages like C/C++, so that’s a pretty weak defense of having the processor pay the overhead of supporting unaligned access on some, but not all, paths.
> C has always had a concept of implementation defined behavior, and unaligned memory accesses used to be defined to work correctly on x86.
There are a bunch of misconceptions here:
- unaligned loads were never implementation defined, they are undefined;
- even if they were implementation defined, this would give the compiler the choice of how to define them, not the instruction set;
- unaligned memory accesses on x86 for non-vector registers still work fine, so old instructions were not impacted and there's no bug. It's just that the expectations were not fulfilled for the new extension of those instructions.
Loads of architectures can't do misaligned memory access. Even x86 has problems when variables span cache lines. The compiler usually deals with this for the programmer, e.g. by rounding the address down then doing multiple operations and splicing the result together.
Unaligned memory accesses are undefined behavior in C. If you're writing C, you should be abiding by C rules. "Used to work correctly" is more guesswork and ignorance than "abiding by C rules". In C, playing fast&loose with definitions hurts, BAD.
Frankly, I'd be ashamed to write this blog post since the only thing it accomplishes is exposing its writers as not understanding the very thing they're signaling expertise on.
An example of a recent compile target that breaks on unaligned pointer accesses was asm.js. There, a 32-bit read turns into a read from a JavaScript Int32Array like this:
HEAP32[ptr >> 2]
The k-th index in the array contains 4 bytes of data, so the pointer to an address must be divided by 4, which is what the >> 2 does. And >> 2 will "break" unaligned pointers because it discards the low bits.
In practice we did run into codebases that broke because of this, but it was fairly rare. We built some tools (SAFE_HEAP) that helped find such issues. In the end it may have added some work to a small amount of ports, but very few I think.
asm.js has been superseded by WebAssembly, which allows unaligned accesses, so this is no longer a problem there.
They are confused, and seem not to realize that ABIs exist, and often specify alignment requirements. They seem to believe there are just ISA and architecture specs.
When you compile for Linux x86_64 ABI, gcc assumes that the stack is 16 byte aligned because it’s required by the ABI.
Regardless of whether the ISA needs it.
If they want the compiler to make no assumptions about aligned accesses, they would need to define an ABI in GCC that operates that way and compile with it. Such ABIs were historically supported (though it's been years since I looked).
In a project I'm working on[0], there's an array object type used throughout, which can sometimes point to arbitrary data elsewhere. In a funky edge-case[1], such an array can be built with an unaligned data pointer.
Thus, if gcc/clang started seriously utilizing aligned pointer accesses everywhere, nearly every single load & store in the entire project would have to be replaced with something significantly more verbose. Maybe in a more fancy language you could have ptr<int> vs unaligned_ptr<int> or similar, but in C you kinda just have compiler flags, and maybe __attribute__-s if you can spare some verbosity.
C UB is often genuinely useful, but imo having an opt-out option for certain things is, regardless, a very reasonable request.
[1]: Any regularly allocated array has appropriate alignment. But there are some functions that take a slice of the array "virtually" (i.e. pointing to it instead of copying), and another one that bitwise-reinterprets an array to one with a different element type (again, operating virtually). This leads to a problem when e.g. taking i8 elements [3;7) and reinterpreting as an i32 array. A workaround would be to make the reinterpret copy memory if necessary (and this would have to be done if targeting something without unaligned load/store), but that'd ruin it being a nice O(1).
> This leads to a problem when e.g. taking i8 elements [3;7) and reinterpreting as an i32 array.
Even ignoring alignment issues, this is already UB because it violates the strict aliasing rule. You technically need to memcpy and hope that the compiler optimizes the memcpy out. In C++20 you can use std::bit_cast in some circumstances. https://en.cppreference.com/w/cpp/numeric/bit_cast. In C11 you can use a union, but that still requires a "copy" into the union.
I'm of course already using -fno-strict-aliasing (primarily because without it it's impossible to implement a custom memory allocator, but it also helps here).
As others have pointed out, GCC is completely allowed to do this because unaligned access is UB.
So the problem is not that GCC assumes your code has no UB.
The issue is that the C (and C++) specifications persist in this obnoxious and odious desire to label definable behaviour as UB, with no justification.
All of the arguments about needing UB to support different hardware fail immediately to the simple fact that the specification already has specific terms that would cover this: Implementation-Defined Behavior and Unspecified Behavior. Using either of these instead of UB would support just as much hardware, without inflicting clearly anti-programmer optimizations on developers where the compiler is allowed to assume objectively false things about the hardware.
Undefined behaviour should be used solely for behavior that cannot be defined - for example using out of bounds, unallocated, or released memory cannot be defined because the C VM does not specify allocation, variable allocation, etc. Calling a function with a mismatched type signature is not definable as C does not specify the ABI. etc.
The big thing seems to be less about GCC, and more a question of, "what should a compiler be?"
He'd be better off looking at smaller, less-known compilers, like the Portable C Compiler or the Intel C Compiler. If you want hyper-optimized, better-than-assembly quality, you pretty much have to give up predictability. The best optimizations that are predictable can't be written using modern compiler theory. They instead involve a lot of work, care, and attention that can't be generalized to other architectures. It can require a love for an architecture, even if it's a crap one.
It's a tradeoff. Not every compiler needs to be optimized, and not every compiler needs to embody the spirit of a language.
> The C standards, having to accommodate both target architectures where misaligned accesses worked and target architectures where these violently interrupted the program, applied their universal solution: they classified misaligned access as an undefined behavior.
No. When the C standard wants to accommodate different target architectures, it uses implementation-defined behavior. Undefined behavior is just a polite way of saying that the code is buggy.
The C standard just requires natural alignment, even on architectures that allow unaligned accesses.
We need a C interpreter that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc. If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.
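A toy sketch of what such a checked machine might track, with pointers represented as (allocation, offset) pairs rather than raw addresses — all names here are hypothetical, not from any real tool:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of "fat" pointers carrying provenance, as a checking
   C interpreter might implement them. Hypothetical sketch only. */
typedef struct {
    uint8_t *base;   /* storage backing this allocation */
    size_t   size;   /* its length in bytes */
} allocation;

typedef struct {
    allocation *prov; /* provenance: which allocation this points into */
    size_t      off;  /* byte offset within that allocation */
} fatptr;

/* Load one byte; the checked machine rejects any access that leaves
   the pointed-to allocation, instead of reading whatever happens to
   be at the raw address. */
int fat_load(fatptr p, uint8_t *out) {
    if (p.prov == NULL || p.off >= p.prov->size)
        return 0; /* would be a trap/panic in the interpreter */
    *out = p.prov->base[p.off];
    return 1;
}
```

Nothing here requires an interpreter, of course — the same checks can be compiled in, which is essentially what sanitizers do.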
My personal opinion is that "undefined behavior" was a spec-writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C for non-two's-complement machines. This was interpreted as license to invent new misbehaviors for integer overflow instead of "do whatever the target architecture does."
This is indeed a design mistake, but in another sense. Ordinary arithmetic ops like + or - should throw an exception on overflow (with both signed and unsigned operands), because most of the time you need ordinary math, not math modulo 2^32. For those rare cases where wraparound is desired, there should be a function like add_and_wrap() or a special operator.
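A sketch of what such an add_and_wrap() could look like in today's C: unsigned arithmetic is defined to wrap, so convert, add, and convert back. (The function name is made up; the final conversion is implementation-defined before C23, though it is two's complement on every mainstream implementation.)

```c
#include <stdint.h>

/* Hypothetical add_and_wrap(): signed addition with explicit
   mod-2^32 wraparound. The addition happens in uint32_t, where
   overflow is well-defined; converting the result back to int32_t
   is implementation-defined pre-C23 but two's complement in
   practice everywhere. */
int32_t add_and_wrap(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```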
Making your program UBSan-clean is the bare minimum you should do if you're writing C or C++ in 2023, not an absurd goal. I know it'll never happen, but I'm increasingly of the opinion that UBSan should be enabled by default.
All C compilers implement the C abstract machine. It is not used to justify miscompiling code, it is used to specify behavior of compiled code.
> We need a C interpreter
Interpreter or not is not relevant, there must be some misconception. Any behavior you can implement with an interpreter can be implemented with compiled code. E.g., add a test and branch after each integer operation if you want to crash on overflow.
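That test-and-branch can be sketched with the overflow-checking builtins GCC and Clang already provide (this is roughly what -fsanitize=signed-integer-overflow emits):

```c
#include <stdio.h>
#include <stdlib.h>

/* One possible expansion of "a + b" under a crash-on-overflow mode:
   compute the sum, branch to a trap if it overflowed.
   __builtin_add_overflow is a GCC/Clang builtin. */
int checked_add(int a, int b) {
    int r;
    if (__builtin_add_overflow(a, b, &r)) {
        fprintf(stderr, "signed integer overflow in checked_add\n");
        abort();
    }
    return r;
}
```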
> that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc.
As others have mentioned there are static and dynamic checkers (sanitizers) that test for such things nowadays. In compiled, not interpreted code, mind you.
> If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.
It's not that bad.
> My personal opinion is that "undefined behavior" was a spec writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C to non-twos-compliment machines. This was interpreted to allow inventing new misbehaviors for integer overflow instead of "do whatever the target architecture does."
The spec uses implementation defined behavior for that. Although you can argue that they went the wrong way on some choices -- signed integer overflow "depends on the machine at hand" in the first K&R, which you could say would be reasonable to call it implementation specific and enumerate the behaviors of supported machines.
C had a long history with hardware manufacturers, compiler writers, and software developers, though, so the standard can never universally please everybody. The purpose of standardization was never to make something that was easiest for software development while ignoring the other considerations. So a decision is not an example of design-by-committee gone wrong just because it happened to be worse for software writers (e.g., choosing to make overflow undefined instead of implementation-defined). You would have to know why such a decision was made.
This is such an obvious thing to do that I'm surprised the C standard doesn't include wording along those lines to accommodate it. But I suppose even if it did, people would just ignore it.
Also, to be fair, GCC does appear to back off the optimisations when dealing with, for example, a struct with the packed attribute.
C has always had a concept of implementation defined behavior, and unaligned memory accesses used to be defined to work correctly on x86.
Intel added instructions that can’t handle unaligned access, so they broke that contract. I’d argue that it is an instruction set architecture bug.
Alternatively, Intel could argue that compilers shouldn’t emit vector instructions unless they can statically prove the pointer is aligned. That’s not feasible in general for languages like C/C++, so that’s a pretty weak defense of having the processor pay the overhead of supporting unaligned access on some, but not all, paths.
There are a bunch of misconceptions here:
- unaligned loads were never implementation defined, they are undefined;
- even if they were implementation defined, this would give the compiler the choice of how to define them, not the instruction set;
- unaligned memory accesses on x86 for non-vector registers still work fine, so old instructions were not impacted and there's no bug. It's just that the expectations were not fulfilled for the new extension of those instructions.
Frankly, I'd be ashamed to write this blog post since the only thing it accomplishes is exposing its writers as not understanding the very thing they're signaling expertise on.
Surely only after standardization tho?
"-Wcast-align=strict" will work in this but not all cases - that's why we have UBSAN:
HEAP32[ptr >> 2]
Each index in that array covers 4 bytes of data, so a byte address must be divided by 4 to index it, which is what the >> 2 does. And >> 2 will "break" unaligned pointers because it discards the low bits.
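The low-bit truncation is easy to see with a quick sketch (plain C arithmetic, just to illustrate the asm.js addressing scheme):

```c
#include <stdint.h>

/* HEAP32[ptr >> 2] turns a byte address into a 4-byte word index.
   Any address with nonzero low bits silently maps to the previous
   aligned word, so an unaligned pointer reads the wrong bytes. */
uint32_t word_index(uint32_t byte_addr) {
    return byte_addr >> 2;
}
```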
In practice we did run into codebases that broke because of this, but it was fairly rare. We built some tools (SAFE_HEAP) that helped find such issues. In the end it may have added some work to a small amount of ports, but very few I think.
asm.js has been superseded by WebAssembly, which allows unaligned accesses, so this is no longer a problem there.
And I think we made the right call (other than the vestigial alignment bits in load/store immediates, which AFAIK, no engine is making use of).
When you compile for Linux x86_64 ABI, gcc assumes that the stack is 16 byte aligned because it’s required by the ABI.
Regardless of whether the ISA needs it.
If they want the compiler to make no assumptions about aligned accesses, they would need to define an ABI in GCC that operates that way and compile with it. Such ABIs were historically supported (though it's been years since I looked).
Thus, if gcc/clang started seriously utilizing aligned pointer accesses everywhere, nearly every single load & store in the entire project would have to be replaced with something significantly more verbose. Maybe in a more fancy language you could have ptr<int> vs unaligned_ptr<int> or similar, but in C you kinda just have compiler flags, and maybe __attribute__-s if you can spare some verbosity.
C UB is often genuinely useful, but imo having an opt-out option for certain things is, regardless, a very reasonable request.
[0]: https://github.com/dzaima/CBQN
[1]: Any regularly allocated array has appropriate alignment. But there are some functions that take a slice of the array "virtually" (i.e. pointing to it instead of copying), and another one that bitwise-reinterprets an array to one with a different element type (again, operating virtually). This leads to a problem when e.g. taking i8 elements [3;7) and reinterpreting as an i32 array. A workaround would be to make the reinterpret copy memory if necessary (and this would have to be done if targeting something without unaligned load/store), but that'd ruin it being a nice O(1).
Even ignoring alignment issues, this is already UB because it violates the strict aliasing rule. You technically need to memcpy and hope that the compiler optimizes the memcpy out. In C++20 you can use std::bit_cast in some circumstances. https://en.cppreference.com/w/cpp/numeric/bit_cast. In C11 you can use a union, but that still requires a "copy" into the union.
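A minimal sketch of the memcpy approach (standard C; compilers typically fold the copy into a single, possibly unaligned, load):

```c
#include <stdint.h>
#include <string.h>

/* Standards-clean bitwise reinterpretation: copy the bytes into a
   fresh uint32_t instead of dereferencing a cast pointer. This
   sidesteps both strict aliasing and alignment. */
uint32_t load_u32(const void *src) {
    uint32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}
```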
Which you can use to wrap the unaligned type as a packed struct, i.e.
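A minimal sketch of such a wrapper (GCC/Clang __attribute__ syntax; the names here are made up):

```c
#include <stdint.h>

/* Packed wrapper: the struct's alignment drops to 1, so the compiler
   must emit loads that are safe for any address (GCC/Clang extension). */
struct unaligned_u32 {
    uint32_t v;
} __attribute__((packed));

uint32_t load_unaligned_u32(const void *p) {
    return ((const struct unaligned_u32 *)p)->v;
}
```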
Such a wrapper struct has an alignment of 1. So the problem is not that GCC assumes your code has no UB.
The issue is that the C (and C++) specifications persist in this obnoxious and odious desire to label definable behaviour as UB, with no justification.
All of the arguments about needing UB to support different hardware fail immediately to the simple fact that the specification already has specific terms that would cover this: Implementation Defined Behavior, and Unspecified Behavior. Using either of these instead of UB would support just as much hardware, without inflicting clearly anti-programmer optimizations on developers, where the compiler is allowed to assume objectively false things about the hardware.
Undefined behaviour should be used solely for behavior that cannot be defined - for example using out of bounds, unallocated, or released memory cannot be defined because the C VM does not specify allocation, variable allocation, etc. Calling a function with a mismatched type signature is not definable as C does not specify the ABI. etc.
He'd be better off looking at smaller, less-known compilers, like the Portable C Compiler or the Intel C Compiler. If you want hyper-optimized, better-than-assembly quality, you pretty much have to give up predictability. The best optimizations that are predictable can't be written using modern compiler theory. They instead involve a lot of work, care, and attention that can't be generalized to other architectures. It can require a love for an architecture, even if it's a crap one.
It's a tradeoff. Not every compiler needs to be optimized, and not every compiler needs to embody the spirit of a language.
No. If the C standard wants to accommodate different target architectures, it uses implementation-defined behavior. Undefined behavior is just a polite way to say that the code is buggy.
The C standard just requires natural alignment, even on architectures that allow unaligned accesses.