The arguments in this blogpost are fundamentally flawed. The fact that they opened a bug based on them but got shut down should have raised all the red flags.
When compiling and running a C program, the only thing that matters is "what the C abstract machine does". Programs that exhibit UB in the abstract machine are allowed to do "anything".
Trying to scope that down using arguments of the form "but what the hardware does is X" is fundamentally flawed, because anything means anything: what the hardware does doesn't change that, and therefore it doesn't matter.
This blogpost "What The Hardware Does is not What Your Program Does" (https://www.ralfj.de/blog/2019/07/14/uninit.html) explains this in more detail and with more examples.
The blog post is also kind of unhinged because in the incredibly rare cases where you would want to write code like this you can literally just use the asm keyword.
I think it's also worth considering WHY compilers (and the C standard) make these kinds of assumptions. For starters, not all hardware platforms allow unaligned accesses at all. Even on x86 where it's supported, you want to avoid doing unaligned reads at all costs because they're up to 2x slower than aligned accesses. God forbid you try to use unaligned atomics, because while technically supported by x86 they're 200x slower than using the LOCK prefix with an aligned read.[^1] The fact that you need to go through escape hatches to get the compiler to generate code to do unaligned loads and stores is a good thing, because it helps prevent people from writing code with mysterious slowdowns.
Writing a function that takes two pointers of the same type already has to pessimize loads and stores on the assumption that the pointers could alias. That is to say, if your function takes int *p, int *q, then doing a store to *p requires reloading *q, because p and q could point to the same thing. Thankfully, in some situations the compiler can figure out that in a certain context p and q have different addresses and therefore can't alias, which helps it generate faster code (by avoiding redundant loads). If p and q were allowed to alias even when they have different addresses, this would all go out the window and you'd basically need to assume that any two pointer types could alias in any situation. This would be TERRIBLE for performance.

[^1]: https://rigtorp.se/split-locks/
> For starters, not all hardware platforms allow unaligned accesses at all.
Yeah, and always, everywhere, a mistake. It was a mistake back in the 1970s and it's an increasingly big mistake as time goes on. Just like big endian and 'network order'.
While the sentiment about why compilers make alignment assumptions is correct, a lot of the details here are, I think, not quite right.
> For starters, not all hardware platforms allow unaligned accesses at all
If you're dealing with very simple CPUs like the Arm Cortex-M0, sure. But even the Cortex-M3/M4 allow unaligned access.
> Even on x86 where it's supported, you want to avoid doing unaligned reads at all costs because they're up to 2x slower than aligned accesses
I believe that information hasn't been true for a long time (since 1995). Unless you're talking about unaligned accesses that also cross a cache line boundary being slower [1]. But I imagine that aligned accesses crossing a cache line boundary are also similarly slower because the slowness is the cache line boundary.
> God forbid you try to use unaligned atomics, because while technically supported by x86 they're 200x slower than using the LOCK prefix with an aligned read
What you're referring to is atomic unaligned access that's also across cache line boundaries. I don't know what it is within a cache line, but I imagine it's not as bad as you make it out to be. Unaligned atomics across cache line boundaries also don't work on ARM and have much spottier support than unaligned access in general.
TL;DR: People cargo-cult advice about unaligned access, but that's mostly because it's a simpler rule of thumb, and there's typically very little benefit to packing things as tightly as possible, which is where unaligned accesses generally come up.

[1] https://news.ycombinator.com/item?id=10529947
> The present blog post brings bad, and as far as I know, previously undocumented news. Even if you really are targeting an instruction set without any memory access instruction that requires alignment, GCC still applies some sophisticated optimizations that assume aligned pointers.
I could have told you this was true ~20 years ago, and the main reason I'm so conservative in how far back gcc has been doing this is that it's only around that time I started programming--I strongly suspect this dates back to the 90's.
It dates to the first standardization of C in 1989. The "C as portable assembly" view ended when ANSI C got standardized, and K&R's 2nd edition was published.
indeed, I still have ~20-year-old code that picks up and rectifies unaligned memory so gcc does the right thing. To claim a compiler bugs out on unaligned memory sounds very weird; I assumed that was common knowledge.
27 years ago I was helping someone rearrange structs because word-sized fields were being word aligned, and you would waste a good deal of memory if you arranged by concept instead of for byte packing. I believe that was under Microsoft’s compiler.
What you’re saying and what the blog post is implying are different things. This is an admission that GCC optimizes on this behavior in practice. Your claim is that GCC could optimize on this, which is a much less interesting claim.
That's what the author meant when he said "The shift of the C language from “portable assembly” to “high-level programming language without the safety of high-level programming languages”"
Back in the 1980s, C was expected to do what hardware does. There was no "the C abstract machine".
The abstract machine idea was introduced much later.
> The arguments in this blogpost are fundamentally flawed.
The "fundamentally flawed" comment is a revisionist idea.
This turns out to be contentious. There are two histories of the C language, and which one you get told is true depends on who you ask:
1/ a way to emit specific assembly with a compiler dealing with register allocation and instruction selection
2/ an abstract machine specification that permits optimisations and also happens to lower well defined code to some architectures
My working theory is that the language standardisation effort invented the latter. So when people say C was always like this, they mean since ansi c89, and there was no language before that. And when people say C used to be typed/convenient assembly language, they're referring to the language that was called C that existed in reality prior to that standards document.
The WG14 mailing list was insistent (in correspondence with me) that C was always like this, and some of them were presumably around at the time. A partial counterargument is the semi-infamous message from Dennis Ritchie copied in various places, e.g. https://www.lysator.liu.se/c/dmr-on-noalias.html
Here's an out-of-context quote from that email, to encourage people to read said context and ideally reply here with more information on this historical assessment:
"The fundamental problem is that it is not possible to write real programs using the X3J11 definition of C. The committee has created an unreal language that no one can or will actually use."
> Back in the 1980s, C was expected to do what hardware does. There was no "the C abstract machine".
There was also a huge variety of compilers that were buggy and incomplete each in their own ways, often with mutually-incompatible extensions, not to mention prone to generating pretty awful code.
If you really are targeting the x86_64 instruction set, you should be writing x86_64 instructions. Then you get exactly what the hardware does and don’t get any of those pesky compiler assumptions.
Of course you don’t get any of those pleasant optimizations either. But those optimizations are only possible because of the assumptions.
I think it is a good blog post, because it highlights an issue that I was not aware of and that I think many programmers are not. I do think I am a decent C programmer, and I spotted the strict aliasing issue immediately, but I didn't know that unaligned pointer access is UB. Because let's face it, the majority of programmers didn't read the standard, and those who did don't remember all facets.
I first learned many years ago that you should pick apart binary data by casting structs, using pointers into the middle of fields and so on. It was ubiquitous, for both speed and convenience. I don't know if it was legal even in the 90s, but it was general practice - MS Office file formats from that time were just dumped structs. Then at some point I learned about pointer alignment - but it was always framed as a performance concern, or a limitation of exotic platforms, never as a correctness issue. But it's not just important to learn what to do; you also need to learn why, which is why we need more articles highlighting these issues.
(And I have to admit, I am one of these misguided people who would love a flag to turn C into "portable assembler" again. Even if it is 10x slower, and even if I had to add annotations to every damn for loop to tell the compiler that I'm not overflowing. There are just cases where understanding what you are actually doing to the hardware trumps performance.)
I think you (and most of the other commenters in this thread) misunderstand the perspective of the author. This is a tool meant to do static analysis of a C codebase. Their job is not to actually follow the standard, but to identify what “common C” actually looks like. This is not the same as standard C.
There are a lot of things compilers do not optimize on even though they are technically illegal. As a result, people write code that relies on these kinds of manipulations. No, this is not your standard complaint about undefined behavior being the work of the devil, this is code that in certain places pushes the boundaries of what the compiler silently guarantees. The author’s job is to identify this, not what the standard says, because a tool that rejects any code that’s not entirely standards compliant is generally useless for any nontrivial codebase.
> When compiling and running a C program, the only thing that matters is "what the C abstract machine does". Programs that exhibit UB in the abstract machine are allowed to do "anything".
This view is alienating systems programmers. You're right that that's what the standard says, but nobody actually wants that except compiler writers trying to juice unrealistic benchmarks. In practice programmers want to alias things, they want to access unaligned memory, they want to cast objects right out of memory without constructing them, etc. And they have real reasons to do so! More narrowly defining how far off the rails the compiler is allowed to go, rather than "anything", is a desirable objective for changing the standard.
"These people simply don't understand what C programmers want": https://groups.google.com/forum/#!msg/boring-crypto/48qa1kWi...

"please don't do this, you're not producing value": http://blog.metaobject.com/2014/04/cc-osmartass.html

"Everyone is fired": http://web.archive.org/web/20160309163927/http://robertoconc...

"No sane compiler writer would ever assume it allowed the compiler to 'do anything' with your code": http://web.archive.org/web/20180525172644/http://article.gma...
Great, except no implementation of the C abstract machine actually exists. So you can't test against it. All you have are compilers that use it to justify miscompiling your code.
We need a C interpreter that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc. If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.
My personal opinion is that "undefined behavior" was a spec-writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C to non-two's-complement machines. This was interpreted as license to invent new misbehaviors for integer overflow instead of "do whatever the target architecture does."
> For example, signed integer overflow being UB was intended to allow compiling C to non-twos-compliment machines.
This is indeed a design mistake, but in another sense. Ordinary arithmetic ops like + or - should throw an exception on overflow (with both signed and unsigned operands), because most of the time you need ordinary math, not math modulo 2^32. For those rare cases where wraparound is desired, there should be a function like add_and_wrap() or a special operator.
UBSan covers each of those except provenance checking, and ASan mostly catches provenance problems even though that's not directly the goal. There are some dumb forms of UB not caught by any of the sanitizers, but most of them are.
Making your program UBSan-clean is the bare minimum you should do if you're writing C or C++ in 2023, not an absurd goal. I know it'll never happen, but I'm increasingly of the opinion that UBSan should be enabled by default.
> Great, except no implementation of the C abstract machine actually exists. So you can't test against it. All you have are compilers that use it to justify miscompiling your code.
All C compilers implement the C abstract machine. It is not used to justify miscompiling code, it is used to specify behavior of compiled code.
> We need a C interpreter
Interpreter or not is not relevant, there must be some misconception. Any behavior you can implement with an interpreter can be implemented with compiled code. E.g., add a test and branch after each integer operation if you want to crash on overflow.
> that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc.
As others have mentioned there are static and dynamic checkers (sanitizers) that test for such things nowadays. In compiled, not interpreted code, mind you.
> If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.
It's not that bad.
> My personal opinion is that "undefined behavior" was a spec writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C to non-twos-compliment machines. This was interpreted to allow inventing new misbehaviors for integer overflow instead of "do whatever the target architecture does."
The spec uses implementation-defined behavior for that. Although you can argue that they went the wrong way on some choices -- signed integer overflow "depends on the machine at hand" in the first K&R, which you could reasonably have called implementation-specific, enumerating the behaviors of supported machines.
C had a long history with hardware manufacturers, compiler writers, and software developers, though, so the standard can never universally please everybody. The purpose of standardization was never to make something that was easiest for software development, ignoring the other considerations. So a decision is not an example of design-by-committee gone wrong just because it happened to be worse for software writers (e.g., choosing to make overflow undefined instead of implementation-dependent). You would have to know why such a decision was made.
The general problem with this argument is that “do what the hardware does” is actually not easy to reason about. The end results of this typically are impossible to grok.
And one of the anythings permitted would be to behave in a documented manner characteristic of the target environment. The program is after all almost certainly being built to run on an actual machine; if you know what that actual machine does, it would sometimes be useful to be able to take advantage of that. We might not be able to demand this on the basis that the standard requires it, but as a quality of implementation issue I think it a reasonable request.
This is such an obvious thing to do that I'm surprised the C standard doesn't include wording along those lines to accommodate it. But I suppose even if it did, people would just ignore it.
The problem is that what the machine does isn't necessarily consistent. If you're using old-as-the-green-hills integer instructions then yes, the CPU supports unaligned access. If you want to benefit from the speedup afforded by the latest vector instructions, now suddenly it doesn't.
Also, to be fair, GCC does appear to back off the optimisations when dealing with, for example, a struct with the packed attribute.
C has always had a concept of implementation defined behavior, and unaligned memory accesses used to be defined to work correctly on x86.
Intel added instructions that can’t handle unaligned access, so they broke that contract. I’d argue that it is an instruction set architecture bug.
Alternatively, Intel could argue that compilers shouldn’t emit vector instructions unless they can statically prove the pointer is aligned. That’s not feasible in general for languages like C/C++, so that’s a pretty weak defense of having the processor pay the overhead of supporting unaligned access on some, but not all, paths.
> C has always had a concept of implementation defined behavior, and unaligned memory accesses used to be defined to work correctly on x86.
There are a bunch of misconceptions here:
- unaligned loads were never implementation defined, they are undefined;
- even if they were implementation defined, this would give the compiler the choice of how to define them, not the instruction set;
- unaligned memory accesses on x86 for non-vector registers still work fine, so old instructions were not impacted and there's no bug. It's just that the expectations were not fulfilled for the new extension of those instructions.
Loads of architectures can't do misaligned memory access. Even x86 has problems when variables span cache lines. The compiler usually deals with this for the programmer, e.g. by rounding the address down then doing multiple operations and splicing the result together.
Unaligned memory accesses are undefined behavior in C. If you're writing C, you should be abiding by C rules. "Used to work correctly" is more guesswork and ignorance than "abiding by C rules". In C, playing fast&loose with definitions hurts, BAD.
Frankly, I'd be ashamed to write this blog post since the only thing it accomplishes is exposing its writers as not understanding the very thing they're signaling expertise on.
An example of a recent compile target that breaks on unaligned pointer accesses was asm.js. There, a 32-bit read turns into a read from a JavaScript Int32Array like this:
HEAP32[ptr >> 2]
The k-th index in the array contains 4 bytes of data, so the pointer to an address must be divided by 4, which is what the >> 2 does. And >> 2 will "break" unaligned pointers because it discards the low bits.
In practice we did run into codebases that broke because of this, but it was fairly rare. We built some tools (SAFE_HEAP) that helped find such issues. In the end it may have added some work to a small amount of ports, but very few I think.
asm.js has been superseded by WebAssembly, which allows unaligned accesses, so this is no longer a problem there.
They are confused, and seem not to realize that ABIs exist, and often specify alignment requirements. They seem to believe there are just ISA and architecture specs.
When you compile for Linux x86_64 ABI, gcc assumes that the stack is 16 byte aligned because it’s required by the ABI.
Regardless of whether the ISA needs it.
If they want the compiler to make no assumptions about aligned accesses, they would need to define an ABI in GCC that operates that way and compile with it. Such ABIs were historically supported (though it's been years since I looked).
In a project I'm working on[0], there's an array object type used throughout, which can sometimes point to arbitrary data elsewhere. In a funky edge-case[1], such an array can be built with an unaligned data pointer.
Thus, if gcc/clang started seriously utilizing aligned pointer accesses everywhere, nearly every single load & store in the entire project would have to be replaced with something significantly more verbose. Maybe in a more fancy language you could have ptr<int> vs unaligned_ptr<int> or similar, but in C you kinda just have compiler flags, and maybe __attribute__-s if you can spare some verbosity.
C UB is often genuinely useful, but imo having an opt-out option for certain things is, regardless, a very reasonable request.
[1]: Any regularly allocated array has appropriate alignment. But there are some functions that take a slice of the array "virtually" (i.e. pointing to it instead of copying), and another one that bitwise-reinterprets an array to one with a different element type (again, operating virtually). This leads to a problem when e.g. taking i8 elements [3;7) and reinterpreting as an i32 array. A workaround would be to make the reinterpret copy memory if necessary (and this would have to be done if targeting something without unaligned load/store), but that'd ruin it being a nice O(1).
> This leads to a problem when e.g. taking i8 elements [3;7) and reinterpreting as an i32 array.
Even ignoring alignment issues, this is already UB because it violates the strict aliasing rule. You technically need to memcpy and hope that the compiler optimizes the memcpy out. In C++20 you can use std::bit_cast in some circumstances. https://en.cppreference.com/w/cpp/numeric/bit_cast. In C11 you can use a union, but that still requires a "copy" into the union.
I'm of course already using -fno-strict-aliasing (primarily because without it it's impossible to implement a custom memory allocator, but it also helps here).
As others have pointed out, GCC is completely allowed to do this because unaligned access is UB.
So the problem is not that GCC assumes your code has no UB.
The issue is that the C (and C++) specifications persist in this obnoxious and odious desire to label definable behaviour as UB, with no justification.
All of the arguments about needing UB to support different hardware fail immediately to the simple fact that the specification already has specific terms that would cover this: Implementation-Defined Behavior and Unspecified Behavior. Using either of these instead of UB would support just as much hardware, without inflicting clearly anti-programmer optimizations on developers where the compiler is allowed to assume objectively false things about the hardware.
Undefined behaviour should be used solely for behavior that cannot be defined - for example using out of bounds, unallocated, or released memory cannot be defined because the C VM does not specify allocation, variable allocation, etc. Calling a function with a mismatched type signature is not definable as C does not specify the ABI. etc.
The big thing seems to be less about GCC, and more a question of, "what should a compiler be?"
He'd be better off looking at smaller, less-known compilers, like the Portable C Compiler or the Intel C Compiler. If you want hyper-optimized, better-than-assembly quality, you pretty much have to give up predictability. The best optimizations that are predictable can't be written using modern compiler theory. They instead involve a lot of work, care, and attention that can't be generalized to other architectures. It can require a love for an architecture, even if it's a crap one.
It's a tradeoff. Not every compiler needs to be optimized, and not every compiler needs to embody the spirit of a language.
> The C standards, having to accommodate both target architectures where misaligned accesses worked and target architectures where these violently interrupted the program, applied their universal solution: they classified misaligned access as an undefined behavior.
No. When the C standard wants to accommodate different target architectures, it uses implementation-defined behavior. Undefined behavior is just a polite way of saying that the code is buggy.
The C standard just requires natural alignment, even on architectures that allow unaligned accesses.
We need a C interpreter that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc. If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.
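A toy sketch of what such a checked machine might track, with pointers represented as (allocation, offset) pairs rather than raw addresses — all names here are hypothetical, not from any real tool:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of "fat" pointers carrying provenance, as a checking
   C interpreter might implement them. Hypothetical sketch only. */
typedef struct {
    uint8_t *base;   /* storage backing this allocation */
    size_t   size;   /* its length in bytes */
} allocation;

typedef struct {
    allocation *prov; /* provenance: which allocation this points into */
    size_t      off;  /* byte offset within that allocation */
} fatptr;

/* Load one byte; the checked machine rejects any access that leaves
   the pointed-to allocation, instead of reading whatever happens to
   be at the raw address. */
int fat_load(fatptr p, uint8_t *out) {
    if (p.prov == NULL || p.off >= p.prov->size)
        return 0; /* would be a trap/panic in the interpreter */
    *out = p.prov->base[p.off];
    return 1;
}
```

Nothing here requires an interpreter, of course — the same checks can be compiled in, which is essentially what sanitizers do.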
My personal opinion is that "undefined behavior" was a spec-writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C for non-two's-complement machines. This was interpreted as license to invent new misbehaviors for integer overflow instead of "do whatever the target architecture does."
This is indeed a design mistake, but in another sense. Ordinary arithmetic ops like + or - should throw an exception on overflow (with both signed and unsigned operands), because most of the time you need ordinary math, not math modulo 2^32. For those rare cases where wraparound is desired, there should be a function like add_and_wrap() or a special operator.
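A sketch of what such an add_and_wrap() could look like in today's C: unsigned arithmetic is defined to wrap, so convert, add, and convert back. (The function name is made up; the final conversion is implementation-defined before C23, though it is two's complement on every mainstream implementation.)

```c
#include <stdint.h>

/* Hypothetical add_and_wrap(): signed addition with explicit
   mod-2^32 wraparound. The addition happens in uint32_t, where
   overflow is well-defined; converting the result back to int32_t
   is implementation-defined pre-C23 but two's complement in
   practice everywhere. */
int32_t add_and_wrap(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```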
Making your program UBSan-clean is the bare minimum you should do if you're writing C or C++ in 2023, not an absurd goal. I know it'll never happen, but I'm increasingly of the opinion that UBSan should be enabled by default.
All C compilers implement the C abstract machine. It is not used to justify miscompiling code, it is used to specify behavior of compiled code.
> We need a C interpreter
Interpreter or not is not relevant, there must be some misconception. Any behavior you can implement with an interpreter can be implemented with compiled code. E.g., add a test and branch after each integer operation if you want to crash on overflow.
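That test-and-branch can be sketched with the overflow-checking builtins GCC and Clang already provide (this is roughly what -fsanitize=signed-integer-overflow emits):

```c
#include <stdio.h>
#include <stdlib.h>

/* One possible expansion of "a + b" under a crash-on-overflow mode:
   compute the sum, branch to a trap if it overflowed.
   __builtin_add_overflow is a GCC/Clang builtin. */
int checked_add(int a, int b) {
    int r;
    if (__builtin_add_overflow(a, b, &r)) {
        fprintf(stderr, "signed integer overflow in checked_add\n");
        abort();
    }
    return r;
}
```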
> that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc.
As others have mentioned there are static and dynamic checkers (sanitizers) that test for such things nowadays. In compiled, not interpreted code, mind you.
> If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.
It's not that bad.
> My personal opinion is that "undefined behavior" was a spec writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C to non-twos-compliment machines. This was interpreted to allow inventing new misbehaviors for integer overflow instead of "do whatever the target architecture does."
The spec uses implementation defined behavior for that. Although you can argue that they went the wrong way on some choices -- signed integer overflow "depends on the machine at hand" in the first K&R, which you could say would be reasonable to call it implementation specific and enumerate the behaviors of supported machines.
C had a long history with hardware manufacturers, compiler writers, and software developers, though, so the standard can never universally please everybody. The purpose of standardization was never to make something that was easiest for software development while ignoring the other considerations. So a decision is not an example of design-by-committee gone wrong just because it happened to be worse for software writers (e.g., choosing to make overflow undefined instead of implementation-defined). You would have to know why such a decision was made.
This is such an obvious thing to do that I'm surprised the C standard doesn't include wording along those lines to accommodate it. But I suppose even if it did, people would just ignore it.
Also, to be fair, GCC does appear to back off the optimisations when dealing with, for example, a struct with the packed attribute.
C has always had a concept of implementation defined behavior, and unaligned memory accesses used to be defined to work correctly on x86.
Intel added instructions that can’t handle unaligned access, so they broke that contract. I’d argue that it is an instruction set architecture bug.
Alternatively, Intel could argue that compilers shouldn’t emit vector instructions unless they can statically prove the pointer is aligned. That’s not feasible in general for languages like C/C++, so that’s a pretty weak defense of having the processor pay the overhead of supporting unaligned access on some, but not all, paths.
There are a bunch of misconceptions here:
- unaligned loads were never implementation defined, they are undefined;
- even if they were implementation defined, this would give the compiler the choice of how to define them, not the instruction set;
- unaligned memory accesses on x86 for non-vector registers still work fine, so old instructions were not impacted and there's no bug. It's just that the expectations were not fulfilled for the new extension of those instructions.
Frankly, I'd be ashamed to write this blog post since the only thing it accomplishes is exposing its writers as not understanding the very thing they're signaling expertise on.
Surely only after standardization tho?
"-Wcast-align=strict" will work in this but not all cases - that's why we have UBSAN:
HEAP32[ptr >> 2]
Each index in that array covers 4 bytes of data, so a byte address must be divided by 4 to index it, which is what the >> 2 does. And >> 2 will "break" unaligned pointers because it discards the low bits.
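The low-bit truncation is easy to see with a quick sketch (plain C arithmetic, just to illustrate the asm.js addressing scheme):

```c
#include <stdint.h>

/* HEAP32[ptr >> 2] turns a byte address into a 4-byte word index.
   Any address with nonzero low bits silently maps to the previous
   aligned word, so an unaligned pointer reads the wrong bytes. */
uint32_t word_index(uint32_t byte_addr) {
    return byte_addr >> 2;
}
```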
In practice we did run into codebases that broke because of this, but it was fairly rare. We built some tools (SAFE_HEAP) that helped find such issues. In the end it may have added some work to a small amount of ports, but very few I think.
asm.js has been superseded by WebAssembly, which allows unaligned accesses, so this is no longer a problem there.
And I think we made the right call (other than the vestigial alignment bits in load/store immediates, which AFAIK, no engine is making use of).
When you compile for Linux x86_64 ABI, gcc assumes that the stack is 16 byte aligned because it’s required by the ABI.
Regardless of whether the ISA needs it.
If they want the compiler to make no assumptions about aligned accesses, they would need to define an ABI in GCC that operates that way and compile with it. Such ABIs were historically supported (though it's been years since I looked).
Thus, if gcc/clang started seriously utilizing aligned pointer accesses everywhere, nearly every single load & store in the entire project would have to be replaced with something significantly more verbose. Maybe in a more fancy language you could have ptr<int> vs unaligned_ptr<int> or similar, but in C you kinda just have compiler flags, and maybe __attribute__-s if you can spare some verbosity.
C UB is often genuinely useful, but imo having an opt-out option for certain things is, regardless, a very reasonable request.
[0]: https://github.com/dzaima/CBQN
[1]: Any regularly allocated array has appropriate alignment. But there are some functions that take a slice of the array "virtually" (i.e. pointing to it instead of copying), and another one that bitwise-reinterprets an array to one with a different element type (again, operating virtually). This leads to a problem when e.g. taking i8 elements [3;7) and reinterpreting as an i32 array. A workaround would be to make the reinterpret copy memory if necessary (and this would have to be done if targeting something without unaligned load/store), but that'd ruin it being a nice O(1).
Even ignoring alignment issues, this is already UB because it violates the strict aliasing rule. You technically need to memcpy and hope that the compiler optimizes the memcpy out. In C++20 you can use std::bit_cast in some circumstances. https://en.cppreference.com/w/cpp/numeric/bit_cast. In C11 you can use a union, but that still requires a "copy" into the union.
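A minimal sketch of the memcpy approach (standard C; compilers typically fold the copy into a single, possibly unaligned, load):

```c
#include <stdint.h>
#include <string.h>

/* Standards-clean bitwise reinterpretation: copy the bytes into a
   fresh uint32_t instead of dereferencing a cast pointer. This
   sidesteps both strict aliasing and alignment. */
uint32_t load_u32(const void *src) {
    uint32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}
```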
Which you can use to wrap the unaligned type as a packed struct, i.e.
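A minimal sketch of such a wrapper (GCC/Clang __attribute__ syntax; the names here are made up):

```c
#include <stdint.h>

/* Packed wrapper: the struct's alignment drops to 1, so the compiler
   must emit loads that are safe for any address (GCC/Clang extension). */
struct unaligned_u32 {
    uint32_t v;
} __attribute__((packed));

uint32_t load_unaligned_u32(const void *p) {
    return ((const struct unaligned_u32 *)p)->v;
}
```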
Such a wrapper struct has an alignment of 1. So the problem is not that GCC assumes your code has no UB.
The issue is that the C (and C++) specifications persist in this obnoxious and odious desire to label definable behaviour as UB, with no justification.
All of the arguments about needing UB to support different hardware fail immediately to the simple fact that the specification already has specific terms that would cover this: Implementation Defined Behavior, and Unspecified Behavior. Using either of these instead of UB would support just as much hardware, without inflicting clearly anti-programmer optimizations on developers, where the compiler is allowed to assume objectively false things about the hardware.
Undefined behaviour should be used solely for behavior that cannot be defined - for example using out of bounds, unallocated, or released memory cannot be defined because the C VM does not specify allocation, variable allocation, etc. Calling a function with a mismatched type signature is not definable as C does not specify the ABI. etc.
He'd be better off looking at smaller, less-known compilers, like the Portable C Compiler or the Intel C Compiler. If you want hyper-optimized, better-than-assembly quality, you pretty much have to give up predictability. The best optimizations that are predictable can't be written using modern compiler theory. They instead involve a lot of work, care, and attention that can't be generalized to other architectures. It can require a love for an architecture, even if it's a crap one.
It's a tradeoff. Not every compiler needs to be optimized, and not every compiler needs to embody the spirit of a language.
No. If the C standard wants to accommodate different target architectures, it uses implementation-defined behavior. Undefined behavior is just a polite way to say that the code is buggy.
The C standard just requires natural alignment, even on architectures that allow unaligned accesses.