Undefined Behavior in C and C++ (2024)

Undefined behavior only means that ISO C doesn't give requirements, not that nobody gives requirements. Many useful extensions are instances where undefined behavior is documented by an implementation.

Including a header that is not in the program, and not in ISO C, is undefined behavior. So is calling a function that is not in ISO C and not in the program. (If the function is not anywhere, the program won't link. But if it is somewhere, then ISO C has nothing to say about its behavior.)

Correct, portable POSIX C programs have undefined behavior in ISO C; only if we interpret them via IEEE 1003 are they defined by that document.

If you invent a new platform with a C compiler, you can have it such that #include <windows.h> reformats all the attached storage devices. ISO C allows this because it doesn't specify what happens if #include <windows.h> successfully resolves to a file and includes its contents. Those contents could be anything, including some compile-time instruction to do harm.

Even if a compiler's documentationd doesn't grant that a certain instance of undefined behavior is a documented extension, the existence of a de facto extension can be inferred empirically through numerous experiments: compiling test code and reverse engineering the object code.

Moreover, the source code for a compiler may be available; the behavior of something can be inferred from studying the code. The code could change in the next version. But so could the documentation; documentation can take away a documented extension the same way as a compiler code change can take away a de facto extension.

Speaking of object code: if you follow a programming paradigm of verifying the object code, then undefined behavior becomes moot, to an extent. You don't trust the compiler anyway. If the machine code has the behavior which implements the requirements that your project expects of the source code, then the necessary thing has been somehow obtained.

throw-qqqqq · 4 months ago

> Undefined behavior only means that ISO C doesn't give requirements, not that nobody gives requirements. Many useful extensions are instances where undefined behavior is documented by an implementation.

True, most compilers have sane defaults in many cases for things that are technically undefined (like take sizeof(void) or do pointer arithmetic on something other than a char). But not all of these cases can be saved by sane defaults.

Undefined behavior means the compiler can replace the code with whatever. So if you e.g. compile optimizing for size, the compiler will rip out the offending code, as replacing it with nothing yields the greatest size optimization.

See also John Regehr's collection of UB-Canaries: https://github.com/regehr/ub-canaries

Snippets of software exhibiting undefined behavior, executing e.g. both the true and the false branch of an if-statement or none etc. UB should not be taken lightly IMO...

eru · 4 months ago

> [...] undefined behavior, executing e.g. both the true and the false branch of an if-statement or none etc.

Or replacing all you mp3s with a Rick Roll. Technically legal.

(Some old version of GHC had a hilarious bug where it would delete any source code with a compiler error in it. Something like this would technically legal for most compiler errors a C compiler could spot.)

pjmlp · 4 months ago

Unfortunely it also means that when the programmer fails to understand what undefined behaviour is exposed on their code, the compiler is free to take advantage of that to do the ultimate performance optimizations as means to beat compiler benchmarks.

The code change might come in something as innocent as a bug fix to the compiler.

account42 · 4 months ago

Ah yes, the good old "compiler writers only care about benchmarks and are out to hurt everyone else" nonsense.

I for one am glad that compilers can assume that things that can't happen according to the language do in fact not happen and don't bloat my programs with code to handle them.

quietbritishjim · 4 months ago

> Including a header that is not in the program, and not in ISO C, is undefined behavior.

What is this supposed to mean? I can't think of any interpretation that makes sense.

I think ISO C defines the executable program to be something like the compiled translation units linked together. But header files do not have to have any particular correspondence to translation units. For example, a header might declare functions whose definitions are spread across multiple translation units, or define things that don't need any definitions in particular translation units (e.g. enum or struct definitions). It could even play macro tricks which means it declares or defines different things each time you include it.

Maybe you mean it's undefined behaviour to include a header file that declares functions that are not defined in any translation unit. I'm not sure even that is true, so long as you don't use those functions. It's definitely not true in C++, where it's only a problem (not sure if it's undefined exactly) if you ODR-rule use a function that has been declared but not defined anywhere. (Examples of ODR-rule use are calling or taking the address of the function, but not, for example, using sizeof on an expression that includes it.)

kazinator · 4 months ago

> I can't think of any interpretation that makes sense

Start with a concrete example. A header that is not in our program, or described in ISO C. How about:

  #include <winkle.h>

Defined behavior or not? How can an implementation respond to this #include while remaining conforming? What are the limits on that response?

> But header files do not have to have any particular correspondence to translation units.

A header inclusion is just a mechanism that brings preprocessor tokens into a translation unit. So, what does the standard tell us about the tokens coming from #include <winkle.h> into whatever translation unit we put it into?

Say we have a single file program and we made that the first line. Without that include, it's a standard-conforming Hello World.

gpderetta · 4 months ago

You are basically trying to explain the difference between a conforming program and a strictly conforming one.

We switched to Rust. Generally, are there specific domains or applications where C/C++ remain preferable? Many exist—but are there tasks Rust fundamentally cannot handle or is a weak choice?

pjmlp · 4 months ago

Yes, all the industries where C and C++ are the industry standards like Khronos APIs, POSIX, CUDA, DirectX, Metal, console devkits, LLVM and GCC implementation,....

Not only you are faced with creating your own wrappers, if no one else has done it already.

The tooling, for IDEs and graphical debuggers, assumes either C or C++, so it won't be there for Rust.

Ideally the day will come where those ecosystems might also embrace Rust, but that is still decades away maybe.

uecker · 4 months ago

Advantages of C are short compilation time, portability, long-term stability, widely available expertise and training materials, less complexity.

IMHO you can today deal with UB just fine in C if you want to by following best practices, and the reasons given when those are not followed would also rule out use of most other safer languages.

simonask · 4 months ago

This is a pet peeve, so forgive me: C is not portable in practice. Almost every C program and library that does anything interesting has to be manually ported to every platform.

C is portable in the least interesting way, namely that compilers exist for all architectures. But that's where it stops.

lifthrasiir · 4 months ago

> short compilation time

> IMHO you can today deal with UB just fine in C if you want to by following best practices

In the other words, short compilation time has been traded off with wetware brainwashing... well, adjustment time, which makes the supposed advantage much less desirable. It is still an advantage, I reckon though.

bluetomcat · 4 months ago

Rust encourages a rather different "high-level" programming style that doesn't suit the domains where C excels. Pattern matching, traits, annotations, generics and functional idioms make the language verbose and semantically-complex. When you follow their best practices, the code ends up more complex than it really needs to be.

C is a different kind of animal that encourages terseness and economy of expression. When you know what you are doing with C pointers, the compiler just doesn't get in the way.

eru · 4 months ago

Pattern matching should make the language less verbose, not more. (Similar for many of the other things you mentioned.)

> When you know what you are doing with C pointers, the compiler just doesn't get in the way.

Alas, it doesn't get in the way of you shooting your own foot off, too.

Rust allows unsafe and other shenanigans, if you want that.

za_creature · 4 months ago

> When you know what you are doing with C pointers, the compiler just doesn't get in the way.

Tell me you use -fno-strict-aliasing without telling me.

Fwiw, I agree with you and we're in good[citation needed] company: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg...

pizza234 · 4 months ago

Yes, based on a few attempts chronicled in articles from different sources, Rust is a weak choice for game development, because it's too time-consuming to refactor.

bakugo · 4 months ago

There's also the fact that a lot of patterns that are commonly used in game development are fundamentally at odds with the borrow checker.

Relevant: https://youtu.be/4t1K66dMhWk?si=dZL2DoVD94WMl4fI

Defletter · 4 months ago

Yup, this one (https://news.ycombinator.com/item?id=43824640) comes to mind. The first comment says "Another failed game project in Rust", hinting that this is very common.

ramon156 · 4 months ago

We've only had 6-7 years of hame dev in rust. Bevy is coming along nicely and will hopefully remove these pain points

mgaunard · 4 months ago

Rust forces you to code in the Rust way, while C or C++ let you do whatever you want.

nicoburns · 4 months ago

> C or C++ let you do whatever you want.

C and C++ force you to code in the C and C++ ways. It may that that's what you want, but they certainly dont let me code how I want to code!

mckravchyk · 4 months ago

If you wanted to develop a cross-platform native desktop / mobile app in one framework without bundling / using a web browser, only QT comes to mind, which is C++. I think there are some bindings though.

jandrewrogers · 4 months ago

An application domain where C++ is notably better is when the ownership and lifetimes of objects are not knowable at compile-time, only being resolvable at runtime. High-performance database kernels are a canonical example of code where this tends to be common.

Beyond that, recent C++ versions have much more expressive metaprogramming capability. The ability to do extensive codegen and code verification within C++ at compile-time reduces lines of code and increases safety in a significant way.

imadr · 4 months ago

I haven't used Rust extensively so I can't make any criticism besides that I find compilation times to be slower than C

ost-ing · 4 months ago

I find with C/++ I have to compile to find warnings and errors, while with Rust I get more information automatically due to the modern type and linking systems. As a result I compile Rust significantly less times which is a massive speed increase.

Rusts tooling is hands down better than C/++ which aids to a more streamlined and efficient development experience

kazinator · 4 months ago

The popular C compilers are seriously slow, too. Orders of magnitude compared to C compilers of yesteryear.

ykonstant · 4 months ago

I also hear that Async Rust is very bad. I have no idea; if anyone knows, how does async in Rust compare to async in C++?

teunispeters · 4 months ago

embedded hardware, any processor Rust doesn't support (there are many), and any place where code size is critical. Rust has a BIG base size for an application, uselessly so at this time. I'd also love to see if it offered anything that could be any use in those spaces - especially where no memory allocation takes place at all. C (and to a lesser extent C++) are both very good in those spaces.

steveklabnik · 4 months ago

You can absolutely make small rust programs, you just have to actually configure things the right way. Additionally, the Rust language doesn’t have allocation at all, it’s purely a library concern. If you don’t want heap allocations, then don’t include them. It works well.

The smallest binary rustc has produced is like ~145 bytes.

m-schuetz · 4 months ago

Prototyping in any domain. It's nice to do some quick&dirty way to rapidly evaluate ideas and solutions.

eru · 4 months ago

I don't think C nor C++ were ever great languages for prototyping? (And definitely not better than Rust.)

eru · 4 months ago

> Generally, are there specific domains or applications where C/C++ remain preferable?

Well, anything were your people have more experience in the other language or the libraries are a lot better.

mrheosuper · 4 months ago

Rust can do inline ASM, so finding a task Rust "fundamentally cannot handle" is almost impossible.

eru · 4 months ago

That's almost as vacuous as saying that Rust can implement universal Turing machines are that Rust can do FFI?