Oh man, don't get me started. This was a point in a talk I gave years ago called "Please Please Help the Compiler" (what I thought was a clever cut at the conventional wisdom of the time, "Don't Try to Help the Compiler").
I work on the MSVC backend. I argued pretty strenuously at the time that noexcept was costly and being marketed incorrectly. Perhaps the costs are worth it, but nonetheless there is a cost.
The reason is simple: there is a guarantee here that noexcept functions don't throw - if one does, std::terminate has to be called. That has to be implemented. There is some cost to that - conceptually, every noexcept function (or worse, every call to a noexcept function) is surrounded by a giant try/catch(...) block.
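To make that concrete, here's roughly what the guarantee means, as if you had written the handler yourself (a conceptual sketch, not what any compiler literally emits):

    #include <exception>
    #include <stdexcept>

    void may_throw() { throw std::runtime_error("boom"); }

    // What you write:
    void f() noexcept { may_throw(); }

    // What the guarantee conceptually requires of the implementation:
    void f_lowered() {
        try { may_throw(); }
        catch (...) { std::terminate(); }  // nothing may unwind past here
    }

    int main() { f(); }  // terminates instead of unwinding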
Yes, there are optimizations here. But it's still not free.
Less obvious: how does inlining work? What happens if you inline a noexcept function into a function that allows exceptions? Do we now have "regions" of noexcept-ness inside that function? (Answer: yes.) How do you implement that? Again, this is implementable, but it is even harder than the whole-function case, and a naive/early implementation might prohibit inlining across degrees of noexcept-ness to be correct/as-if. And guess what: this is what early versions of MSVC did, and this was our biggest problem - a problem which grew release after release as noexcept permeated the standard library.
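A sketch of the inlining problem (illustrative only; the function names are made up):

    #include <stdexcept>

    void g() { throw std::runtime_error("boom"); }  // allowed to throw

    void h() noexcept { g(); }  // an escaping exception must terminate

    void caller() {  // caller itself allows exceptions
        h();  // if h() is inlined here, the inlined call to g() must STILL
              // terminate rather than unwind into caller's handlers: a
              // "noexcept region" inside an exception-allowing function.
              // Refusing to inline h() at all is the easy-but-slow way to
              // stay correct.
        try { g(); } catch (...) { /* must never see h()'s exception */ }
    }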
Anyway. My point is, we need more backend compiler engineers on WG21 and not just front end, library, and language lawyer guys.
I argued then that if instead noexcept violations were undefined, we could ignore all this, and instead just treat it as the pure optimization it was being marketed as (ie, help prove a region can't throw, so we can elide entire try/catch blocks etc). The reaction to my suggestion was not positive.

https://github.com/TriangleCppDevelopersGroup/TerryMahaffeyC...

*edit except the stuff about fastlink
*edit 2 also I have since added a heuristic bonus for the "inline" keyword because I could no longer stand the irony of "inline" not having anything to do with inlining
*edit 3 ok, also statements like "consider doing X if you have no security exposure" haven't held up well

I would be very interested in an updated blog post on this if you felt so inclined!
> Anyway. My point is, we need more backend compiler engineers on WG21 and not just front end, library, and language lawyer guys.
Even better: the current way of working is broken. WG21 should only discuss papers that come with a preview implementation, just like in other language ecosystems.
We have had too many features approved with "on paper only" designs that were proven to be bad ideas when they finally got implemented - some of which were removed or changed in later ISO revisions - which already proves the point that this isn't working.
> I argued then that if instead noexcept violations were undefined, we could ignore all this, and instead just treat it as the pure optimization it was being marketed as (ie, help prove a region can't throw, so we can elide entire try/catch blocks etc).
Do you know if the reasoning for originally switching noexcept violations from UB to calling std::terminate was documented anywhere? The corresponding meeting minutes [0] describe the vote to change the behavior but not the reason(s). There's this bit, though:
> [Adamczyk] added that there was strong consensus that this approach did not add call overhead in quality exception handling implementations, and did not restrict optimization unnecessarily.

Did that view not pan out since that meeting?

[0]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n30...
I think WG21 has been violently against adding additional UB to the language because of some Hacker News articles a decade ago about people being alarmed at null pointer checks being elided, or at things happening that didn't match their expectations around signed int overflow, or whatever. Generally it seems a view has spread that compiler implementers view undefined behavior as a license to party, that we're generally having too much fun, and are not to be trusted.
In reality, undefined behavior is useful in the sense that (as in this case) it allows us not to have to write code to consider and handle certain situations - code which may make all situations slower - or allows certain optimizations to exist which work 99% of the time.
Regarding “not pan out”: I think the overhead of noexcept for the single function call case is fine, and inlining is and has always been the issue.

Well, clearly there is a cost.
It's kinda funny that C++, even in recent editions, generally reaches for the UB gun to enable optimizations, but somehow noexcept ended up meaning "well actually, try/catch std::terminate". I bet most C++-damaged people would expect throwing in a noexcept function to simply be UB and potentially blow their heap off or something, instead of being neatly defined behavior with invisible overhead.
Probably the right thing for noexcept would be to enforce a "noexcept may only call noexcept methods" rule, but that ship has sailed. I also understand that it would necessarily create the red/green method problem, but that's sort of unavoidable.
Unless you're C++-damaged enough to assume it's one of those bullshit gaslighting "it might actually not do anything lol" premature optimization keywords, like `constexpr`.
> I argued then that if instead noexcept violations were undefined, we could ignore all this, and instead just treat it as the pure optimization it was being marketed as (ie, help prove a region can't throw, so we can elide entire try/catch blocks etc). The reaction to my suggestion was not positive.
So instead of helping programmers actually write noexcept functions, you wanted to make this an even bigger footgun than it already is? How often are there try/catch blocks that are actually elidable in real-world code? How much performance would actually be gained by doing that, versus the cost of all the security issues this feature would introduce?
If the compiler actually checked that noexcept code can't throw exceptions (i.e. noexcept functions were only allowed to call other noexcept functions), and the only way to get exceptions in noexcept functions was calls to C code which then calls other C++ code that throws, then I would actually agree with you that this would have been OK as UB (since anyway there are no guarantees that even perfectly written C code that gets an exception wouldn't leave your system in a bad state). But with a feature that already relies on programmer care, and can break at every upgrade of a third party library, making this UB seems far too dangerous for far too little gain.
-fno-exceptions only prevents you from calling throw. If you don't want the overhead, you likely want -fno-asynchronous-unwind-tables plus that clang flag that specifies that extern "C" functions don't throw.
> there is a guarantee here that noexcept functions don't throw - if one does, std::terminate has to be called. That has to be implemented
Could you elaborate on how this causes more overhead than without noexcept? The fact that something has to be done when throwing an exception is true in both cases, right? Naively it'd seem like without noexcept, you raise the exception; and with noexcept, you call std::terminate instead. Presumably the compiler is already moving your exception throwing instructions off the happy hot path.
Very very basic test with Clang: https://godbolt.org/z/6aqWWz4Pe
Looks like both variations have similar code structure, with 1 extra instruction for noexcept.
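(For reference, the shape of the test is something like the following - my reconstruction, not necessarily the exact linked snippet:)

    void may_throw();  // opaque: the compiler must assume it can throw

    int plain(int x) {
        may_throw();
        return x + 1;
    }

    int marked(int x) noexcept {  // same body; an escape must terminate
        may_throw();
        return x + 1;
    }

On x86-64 the happy paths come out nearly identical, because the terminate-on-unwind obligation lives in unwind metadata rather than in inline instructions.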
Pick a different architecture - anything 32-bit. Exception handling on 64-bit Windows works differently: the overhead is in the PE headers instead of in the asm directly (and is in general lower). You don't have the setup and teardown in your example.
Throwing an exception has the same overhead in both cases. In the case of a noexcept function, the function has to (or used to have to, depending on the architecture) set up an exception handling frame and remove it when leaving.
> Naively it'd seem like without noexcept, you raise the exception; and with noexcept, you call std::terminate instead
Except you may call a normal function from a noexcept function, and this function may still raise an exception.
If you're on one of the platforms with sane exception handling, it's a matter of emitting different assembly code for the landing pad so that when unwinding it calls std::terminate instead of running destructors for the local scope. Zero additional overhead. If you're on old 32-bit Microsoft Windows using MSVC 6 or something, well, you might have problems. One of the lesser ones being increased overhead for noexcept.
I’m curious: where does the overhead of try/catch come from in a “zero-overhead” implementation?
Is it just that it forces the stack to be “sufficiently unwindable” in a way that might make it hard to apply optimisations that significantly alter the structure of the CFG? I could see inlining and TCO being tricky perhaps?
Or does Windows use a different implementation? Not sure if it uses the Itanium ABI or something else.
Everyone keeps glossing over the inlining issues, which I think are much larger.
“Zero overhead” refers to the actual function's codegen; there are still tables and stuff that have to be updated.
Our implementation of noexcept for the single function case is, I think, fine now. There is a single extra bit in the exception function info which is checked by the unwinder. Other than requiring exception info in cases where we otherwise wouldn't need it, there's no real cost.
The inlining case has always been both more complicated and more of a problem. If your language feature inhibits inlining in any situation, you have a real problem.
Doesn't every function already need exception unwinding metadata? If the function is marked noexcept, then can't you write the logical equivalent of "Unwinding instructions: Don't." and the exception dispatcher can call std::terminate when it sees that?
Nah, that was mostly about extern "C" functions, which technically can't throw (so the noexcept runtime stuff would be optimized out) but in practice there is a ton of code marked extern "C" which throws.
The most common place where noexcept improves performance is on move constructors and move assignments, when moving is cheaper than copying. If your type is not nothrow-moveable, std::vector will copy it instead of moving it when resizing, as the move constructor throwing would leave the vector in an invalid state (while the copy constructor throwing leaves the vector unchanged).
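You can watch this happen with a toy type (a minimal sketch; `Widget` is made up):

    #include <iostream>
    #include <vector>

    struct Widget {
        Widget() = default;
        Widget(const Widget&) { std::cout << "copy\n"; }
        Widget(Widget&&) noexcept { std::cout << "move\n"; }
        // ^ drop the noexcept and the reallocation below prints "copy"
        // instead: vector can't risk a move throwing halfway through.
    };

    int main() {
        std::vector<Widget> v(4);
        v.reserve(v.capacity() + 1);  // force reallocation: "move" x4
    }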
Platforms with setjmp-longjmp based exceptions benefit greatly from noexcept as there’s setup code required before calling functions which may throw. Those platforms are now mostly gone, though. Modern “zero cost” exceptions don’t execute a single instruction related to exception handling if no exceptions are thrown (hence the name), so there just isn’t much room for noexcept to be useful to the optimizer.
Outside of those two scenarios there isn’t any reason to expect noexcept to improve performance.
There is another standard library related scenario: hash tables. The std unordered containers will store the hash of each key unless your hash function is noexcept. Analogous to how vector needs noexcept move for fast reserve and resize, unordered containers need noexcept hash to avoid extra memory usage. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_as...
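So, per those docs, the hash functor needs to be noexcept for libstdc++ to skip the per-node cached hash (a sketch; this is libstdc++ behavior as described above, not something the standard mandates, and whether caching pays off is workload-dependent):

    #include <cstddef>
    #include <unordered_set>

    struct Id {
        int v;
        bool operator==(const Id& o) const { return v == o.v; }
    };

    struct IdHash {
        // noexcept here lets the container recompute hashes on rehash
        // instead of storing a size_t per node; remove it and every
        // node grows by a cached hash code.
        std::size_t operator()(const Id& id) const noexcept {
            return static_cast<std::size_t>(id.v);
        }
    };

    std::unordered_set<Id, IdHash> ids;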
For many key types and access patterns, storing the hash is faster anyway. I assume people who care about performance are already not using std::unordered_map though.
This is the correct analysis. The article's author could have saved themselves (and the reader) a good amount of blind data diving by learning more about exception processing beforehand.
That's quite interesting, and a huge amount of work has been done here - respect for that.
Here's what jumped out at me: the `noexcept` qualifier is not free in some cases - particularly when a qualified function could actually throw but is marked `noexcept`. In that case, a compiler still must set something up to fulfil the main `noexcept` promise - call `std::terminate()` if an exception is thrown. That means that putting `noexcept` on each and every function blindly, without any regard to whether the function could really throw or not (for example, `std::vector::push_back()` could throw on reallocation failure, so if a `noexcept`-qualified function calls it, a compiler must take that into account), doesn't actually test/benchmark/prove anything, since, as the author correctly said, you won't ever do this in a real production project.
It would be really interesting to take a look at the full code of the cases that showed very bad performance; however, here we're approaching the second issue: if this is the core benchmark code - https://github.com/define-private-public/PSRayTracing/blob/a... - then unfortunately it's totally invalid, since it measures time with `std::chrono::system_clock`, which isn't monotonic. Given how long the code takes to run, it's almost certain that the clock has been adjusted several times...
> in that case, a compiler still must set something up to fulfil the main `noexcept` promise - call `std::terminate()`
This is actually something that has been more of a problem in clang than gcc, due to LLVM IR limitations... but that is being fixed (or maybe already is?). There was a presentation about it at the 2023 LLVM Developers' Meeting which was recently published on their YouTube channel: https://www.youtube.com/watch?v=DMUeTaIe1CU
The short version (as I understand it) is that you don't really need to produce any code to call std::terminate; all you need is to tell the linker to leave a hole in the table which maps %rip to the required unwind actions. If the unwinder doesn't know what to do, it will call std::terminate per the standard.
IR didn't have a way of expressing this "hole", though, so instead clang was forced to emit an explicit "handler" to do the std::terminate call.
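The table-with-holes idea, as a toy model (nothing like a real unwinder's data structures, just the dispatch logic):

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <exception>
    #include <vector>

    struct Range { std::uintptr_t lo, hi; int action; };

    void dispatch(const std::vector<Range>& table, std::uintptr_t pc) {
        auto it = std::find_if(table.begin(), table.end(),
            [&](const Range& r) { return pc >= r.lo && pc < r.hi; });
        if (it == table.end())
            std::terminate();  // the "hole": no handler code emitted anywhere
        else
            std::printf("run unwind action %d\n", it->action);
    }

    int main() {
        std::vector<Range> table{{0x1000, 0x2000, 7}};
        dispatch(table, 0x1800);  // in range: runs action 7
        dispatch(table, 0x3000);  // hole -> std::terminate
    }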
In MSVC we've also pretty heavily optimized the whole-function case, such that we no longer have a literal try/catch block around it (I think there is a single bit in our per-function unwind info that the unwinder checks, killing the program if it encounters it while unwinding). One extra branch but no increase in the unwind metadata size.
The inlining case was always the hard problem to solve though
> then unfortunately it's totally invalid, since it measures time with `std::chrono::system_clock`, which isn't monotonic. Given how long the code takes to run, it's almost certain that the clock has been adjusted several times
Monotonic clocks are mostly useful for short measurement periods. For long-term timing, wall-time clocks (with their adjustments) are more accurate because they will drift less.
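i.e., roughly (a sketch of the tradeoff; the `run_*` functions are placeholders):

    #include <chrono>

    // steady_clock is monotonic (never adjusted): best for short intervals.
    // system_clock tracks wall time; NTP may slew/step it, but over hours
    // that correction is exactly what keeps the total honest.
    template <class Clock, class F>
    auto time_with(F&& f) {
        auto t0 = Clock::now();
        f();
        return std::chrono::duration_cast<std::chrono::milliseconds>(
            Clock::now() - t0);
    }

    // time_with<std::chrono::steady_clock>(run_one_function); // micro
    // time_with<std::chrono::system_clock>(run_whole_suite);  // 10+ hours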
Ah, that's a great correction, thank you!
Yes, indeed: due to drift, in order to discern second-plus (?) differences on different machines (or the same machine but different OSes?), one definitely needs to use wall-clock time; otherwise it's comparing apples to oranges. There are a lot of interesting questions related to that, but they're out of the scope of this thread. If I'm not mistaken, the author also timed some individual small functions, which, if correct, still poses a problem to me; but for measuring huge long-running tasks like a full suite running 10+ hours, they were probably right to choose a wall-clock timer.
However, before researching the results any further (for example, the -10% difference in the `noexcept` case is extremely interesting to debug down to the root cause), I'd still like to understand exactly how the code was run and measured. I didn't find a plausible-looking benchmark runner in their code base.
> I didn't know std::uniform_int_distribution doesn't actually produce the same results on different compilers
I think this is genuinely my biggest complaint about the C++ standard library. There are countless scenarios where you want deterministic random numbers (for testing if nothing else), so std's distributions are unusable. Fortunately you can just plug in Boost's implementation.
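Something like this (a sketch, assuming Boost.Random is available; the point is only that the engine is specified but the std distribution isn't):

    #include <random>
    #include <boost/random/uniform_int_distribution.hpp>

    int main() {
        std::mt19937 gen(42);  // engine output IS fully specified

        // Implementation-defined mapping to [1,6]: libstdc++, libc++ and
        // MSVC may all disagree, so not reproducible across stdlibs.
        std::uniform_int_distribution<int> std_d(1, 6);

        // Boost documents its algorithm: a fixed seed gives the same
        // sequence on every platform.
        boost::random::uniform_int_distribution<int> boost_d(1, 6);

        int unstable = std_d(gen);
        int stable = boost_d(gen);
        (void)unstable; (void)stable;
    }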
It's actually really important that uniform_int_distribution is implementation-defined. The 'right' way to do it on one architecture is probably not the right way to do it on a different architecture.
For instance, Apple's new CPUs have very fast division. A convenient and useful way to implement uniform_int_distribution relies on using modulo. So the implementation that runs on Apple's new CPUs ought to use the modulo instructions of the CPU.
On other architectures, the ISA might not even have a modulo instruction. In that case, it's very important that you don't try to emulate modulo in software; it's much better to rely on other, more complicated constructs to give a uniform distribution.
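For the curious, the modulo-based approach being referred to looks something like this (a sketch of the classic rejection method, not any particular library's code):

    #include <cstdint>

    // Assumes rng() returns uniformly distributed uint32_t values, n >= 1.
    std::uint32_t bounded(std::uint32_t (*rng)(), std::uint32_t n) {
        std::uint32_t threshold = (0u - n) % n;  // (2^32 - n) % n
        for (;;) {
            std::uint32_t r = rng();
            if (r >= threshold)  // reject the biased low range
                return r % n;    // cheap only where division is cheap
        }
    }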
C++ is also expected to run on GPUs. NVIDIA's CUDA and AMD's HIP are both implementations of C++. (These implementations are non-compliant given the nature of GPUs, but both they and the C++ standards committee have a shared goal of narrowing that gap.) In general, std::uniform_int_distribution uses loops to eliminate bias; the 'happy path' has relatively easily predicted branches, but there are instances where the branch is not easily predicted and it will, as often as not, have to loop in order to complete. Doing this on a GPU might be multiple orders of magnitude slower than another method that's better suited for a GPU.
Overzealously dictating an implementation is why C++ ended up with a relatively bad hash table and very bad regex in the standard. It's a mistake that shouldn't be made again.
But reproducibility is as important as performance for the vast majority of use cases, if these implementation-defined bits start to affect the observable outcomes. (That's why we define the required time complexity for many container-related functions but do not actually specify the exact algorithm; differences in Big-O time complexity are just large enough to be "observed".)
A common solution is to provide two versions of such features, one for the less reproducible but maximally performant version and another for common middle grounds that can be reproduced reasonably efficiently across many common platforms. In fact I believe `std::chrono` was designed in that way to sidestep many uncertainties in platform clock implementations.
> Overzealously dictating an implementation is why C++ ended up with a relatively bad hash table and very bad regex in the standard.
What parts of the standard dictate a particular regex implementation? IIRC the performance issues are usually blamed on ABI compatibility constraints rather than the standard making a fast(er) implementation impossible.

What do you think of Abseil hash tables randomizing themselves (piggybacking on ASLR) on each start of your program?
However, I personally disagree with them since I think it's really important to have _some_ basic reproducibility for things like reproducing the results of a randomized test. In that case, I'm going to avoid changing as much as possible anyways.
> There are countless scenarios where you want deterministic random numbers (for testing if nothing else), so std's distributions are unusable. Fortunately you can just plug in Boost's implementation.
I don't understand what your complaint is. If you're already plugging in alternative implementations, what stops you from just stubbing these random number generators with any implementation at all?
I don't feel like this article illuminates anything about how noexcept works. The asm diff at the end suggests _there is no difference_ in the emitted code. I plugged it into godbolt myself and see absolutely no difference. https://godbolt.org/z/jdro5jdnG
It seems the selected example function may not be exercising noexcept. I suppose the assumption is that operator[] is something that can throw, but... perhaps the machinery lives outside the function (so we should really examine function calls), or is never emitted without a try/catch, or operator[] (though not marked noexcept...) doesn't throw because OOB access is undefined behavior, or...?
You can't just look at the codegen of the function itself; you also have to consider the metadata, and the overhead of processing any metadata.
Specifically, here is where (as I said in other comments) it goes from a complicated/quality-of-implementation issue to "shit, this is complicated": when you consider inlining. If noexcept inhibits inlining in any conceivable circumstance, then it's having a dramatic (slightly indirect) impact on performance.
> I don't feel like this article illuminates anything about how noexcept works. The asm diff at the end suggests _there is no difference_ in the emitted code.
You are absolutely correct. The OP is basically testing the hypothesis "Wrapping a function in `noexcept` will magically make it faster," which is (1) nonsense to anyone who knows how C++ works, and also (2) trivially easy to falsify, because all you have to do is look at the compiled code. Same codegen? Then it's not going to be faster (or slower). You needn't spend all those CPU cycles to find out what you already know by looking.
There has been a fair bit of literature written on the performance of exceptions and noexcept, but OP isn't contributing anything with this particular post.

Here are two of my own blog posts on the subject. The first one is just an explanation of the "vector pessimization" which was also mentioned (obliquely) in OP's post — but with an actual benchmark where you can see why it matters.

https://quuxplusone.github.io/blog/2022/08/26/vector-pessimi...

https://godbolt.org/z/e4jEcdfT9
The second one is much more interesting, because it shows where `noexcept` can actually have an effect on codegen in the core language. TLDR, it can matter on functions that the compiler can't inline, such as when crossing ABI boundaries or when (as in this case) it's an indirect call through a function pointer.
https://quuxplusone.github.io/blog/2022/07/30/type-erased-in...

https://godbolt.org/z/1asa7Tjq9
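The gist of the second post, compressed (my paraphrase of the shape, not the post's exact code):

    struct Guard { ~Guard(); };  // non-trivial dtor must run on unwind

    int call_plain(void (*f)()) {
        Guard g;
        f();  // unknown callee: caller keeps an unwind path for ~Guard
        return 0;
    }

    // noexcept is part of the function type since C++17:
    int call_marked(void (*f)() noexcept) {
        Guard g;
        f();  // unwinding can't happen; the EH scaffolding can go away
        return 0;
    }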
I would like to have seen a comparison that actually includes -fno-exceptions, rather than just noexcept. My assumption is that to get a consistent gain from noexcept, you would need every function called to be explicitly noexcept, because much of the cost of exceptions is the code size and state required to support unwinding. So if that's where the performance cost of exception handling comes from, then as long as _anything_ can cause an exception (or, more accurately, unless every opaque call is explicitly indicated not to throw), that overhead remains.
That said, I'm still confused by the perf results of the article, especially the Perlin noise vs MSVC one. It's a sufficiently weird outlier that it makes me wonder if something in the compiler has a noexcept path that adds checks that aren't usually on (i.e. imagine the code has a "debug" mode that did bounds checks or something, but the function resolution you hit in the noexcept path always does the bounds check - I'm really not sure exactly how you'd get that to happen, but "non-default path was not benchmarked" is not exactly an uncommon occurrence).
Even a speedup of around 1% (if it is consistent and in a carefully controlled experiment) is significant for many workloads, if the workload is big enough.
The OP puts this within the fuzz, which it may be for that particular workload. But across a giant distributed system like YouTube or Google Search, it is a real gain.
Programs can be quite sensitive to how code is laid out, because of cache line alignment, cache conflicts, etc.
So random changes can have a surprising impact.
There was a paper a couple of years ago explaining this and how to measure compiler optimizations more reliably. Sadly, I do not recall the title/author.