I was surprised to see `fmt/format.h` on that list, but I do have to admit that the objections seem reasonable. Perhaps because he(?) mentioned wanting to use -O0. Template code is almost useless without optimization. If -O0 is needed then I am surprised that all of the STL doesn't get pitched.
Ok, I was also surprised to see co-routines on the nice list, but I don't have direct experience there. I normally see complaints about them. I would like them to be good because some code is easier to express that way.
> I was surprised to see `fmt/format.h` on that list, but I do have to admit that the objections seem reasonable
The author talks about the code bloat, because of "an API that encourages custom formatter specification to live in a template". But at the end he mentions the standard solution to this problem:
> A preferable interface (I use, but also others AFAIK) is to check the type in a template (no choice there), and dispatch the formatting routine to somewhere that lives in a single translation unit.
So what prevents you from doing this with <format>? As I understand, the implementations of parse() and format() of std::formatter don't depend on the template parameters and can delegate to non-template functions residing in one CPP file. You can also provide additional wformat_parse_context/wformat_context overloads if you need wchar_t support.
I guess there’s some legitimate complaint about compile time, but the code bloat issue is simply crazy if you use a linker written since ~1995. And the “fix” is simply to move the common code into a .cc file which the author even mentions.
The alternatives are worse: un-type-checked printf or the horrible stream interpreter system (std::cout << “foo”) which was a cute but bad idea in 1985.
The author talked about its effect on compile times and I have to say, I agree with him. It's also why I dislike header-only libraries; they bloat compile times unnecessarily.
Don't get me wrong, the fmt library is very nice, but you can't deny its effect on compile times.
Wondering the same. Seems like you could provide your own implementation or use a third-party implementation. I'd be curious to see a write-up on the bloat and what exactly it looks like.
{fmt} doesn't encourage "custom formatter specification to live in a template". On the contrary, if you look at the docs in https://fmt.dev/latest/api.html#formatting-user-defined-type..., none of the examples is parameterized. One even demonstrates how to define your formatting code in a source file. And if your formatters are so big that they meaningfully impact build speed you are doing something wrong. fmt/core.h is heavily optimized for build speed so you can just use it as a type-safe replacement for *printf. That said, implementations of std::format (especially Microsoft's) may not be as optimized for build speed yet. This will likely improve now that the ABI can be stabilized.
Unfortunately, if you're using the standard library, you get this just by switching to the C++20 mode. For example, the committee decided to put tons of std::ranges-related stuff right in <algorithm>.
It isn't just a nicer API; it's also type-safe and much faster at runtime.
Since I rarely compile all my code at once (usually just a single file followed by a re-link), compile time doesn't matter much. And that's even though, while editing or writing code, I don't have all the slowdown bloat of an IDE, so compile time is more noticeable.
it's partly because of the engine code. there's even bigger stuff, especially if it's a company with any legacy codebase that's 10-20 years old or whatever (e.g. EA / Frostbite.) one i worked on took hours to compile the first time on a machine with 128gb of ram and a threadripper. the onboarding doc suggests getting some coffee at that point haha
a big part of working on them as a generalist ends up being the ability to know how to even navigate something like that (especially since they're often haphazardly documented)
(part of it is that most of the games "fork" the engine rather than using it as a standalone thing)
it's probably not everyone on the team building that whole thing each time, but yea. hundreds of solutions and millions of LOC isn't unusual
*i just did a quick check with unreal's source, it's ~20 million LoC (assuming I didn't mess up the filtering somehow)
The main time killer in day-to-day work on such big projects is usually the linker step, which is terribly slow with the MSVC linker and doesn't benefit from incremental or distributed compilation (not sure though how much the MSVC linker has improved in the last 5 years or so).
It's a lot more common than you'd think. I'm not in gamedev but somewhere similarly weird (multiple supported userspaces and OSes for an embedded device line); our "full build" is probably approaching 50M+ lines, and only quite recently do people do incrementals from build-server snapshots. No bazel or distributed ccache or anything.
Especially for games or OS development, you might have shifting toolchains and SDKs. Different teams may move out of sync because different teams want different things at a given time.
We've been around 1mloc for the Drakensang single-player and MMO games, and that was 15 to 5 years ago, with a relatively small team (up to 20 programmers), and budget-wise far away from what's considered an AAA production.
While I wouldn't want to work on a 10mloc C++ code base either, it sounds totally realistic to me.
most of the builds are incremental, but even incremental builds still take a while when it's that massive (linking, modifying a header / template code, etc)
True, but linker times still suck and don't parallelize well.
Also, sometimes you need to iterate on some very core .h file and touching any of those brings the whole house of cards down and triggers a full or nearly full rebuild.
I'm a junior in uni, and I hate it when I have to say "Yeah, we learned this technique in the C class, but it's UB in C++, so please rewrite that" when reviewing friends' code that does type-punning with unions.
So I'm also very happy with the 'std::bit_cast' in general.
BTW how about std::is_constant_evaluated()?
I assumed it would help folks who do heavy physics simulations, but looks like not listed in the article.
TBF, I have yet to see a C++ compiler where the union type-punning trick doesn't work; there would be a lot of broken code if real-world compilers changed the current behaviour, no matter what the standard says.
Of course now that std::bit_cast exists it's the safe thing to do (but then there's still C code that's compiled in C++ mode which was even recommended by Microsoft because the Visual Studio team couldn't be bothered to keep their C compiler in shape until a little while ago).
> I have yet to see a C++ compiler where the union type punning trick doesn't work
The problem isn’t that compilers won’t implement the feature (that would take more work); the problem is that it’s processor-specific.
The spec doesn't mandate many specific bit-ordering layouts; a few things are mandated (two's complement representation, which was only just added; &obj == &base; I think nullptr has to be 0; etc.), rather than trying to make everything a PDP-11.
Compared to C99, C++20 designated initialization allows no reordering of designators, no chaining of designated initializers, and no array indexing. All those limitations taken together make the C++ designated initialization feature pretty much useless except for the most trivial structs - while in C99, designated initialization really shines with complex, nested structs.
The funny thing is that none of those limitations would be required. Clang had supported full C99 designated init in C++ mode just fine for many years before C++20 appeared.
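A small illustration of what C++20 does and doesn't accept (Point and Rect are made-up types):

```cpp
// C++20 designated initializers: allowed, but only in declaration order,
// with no designator chaining and no array designators.
struct Point { int x, y; };
struct Rect  { Point tl, br; };

// Nesting works by opening a new braced list per member:
constexpr Rect r{ .tl = { .x = 0, .y = 0 }, .br = { .x = 10, .y = 20 } };

// These C99 forms are rejected in C++20:
//   Rect a{ .br = { .x = 1 }, .tl = { .x = 2 } };  // out of declaration order
//   Rect b{ .tl.x = 1 };                           // chained designator
//   int  c[3] = { [1] = 7 };                       // array designator

static_assert(r.br.x == 10 && r.tl.y == 0);
```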
Oh man, I've desperately wanted both of these features in basically every language I've used.
I think you can do a similar sort of array initialization in C#, but definitely not chained initializers. Those are both so useful, but I can see why they aren't included in C++20.
C++ isn't C and has different structure semantics. Members are initialized in the order defined, which means you can write:

    struct foo {
        int a = 0;
        int b = a+1;
    };
If the compiler just did the initialization in the order of declaration, regardless of the order in the initialization list, this would not do what you expect:
    struct obj {
        int a;
        int b;
    };

    int ival = 0;
    auto o = obj {.b = ++ival, .a = ival};
o.a would not equal o.b.
I would like to have the initialization syntax of C because then one could reorder elements (say for packing reasons) and the designated initialization would “just work”…except it wouldn’t.
C++ designated initialization does buy you two things: 1- documentation, but more importantly 2- if you do reorder a struct or class data members the compiler will warn you that your initialization lists are now invalid rather than silently failing. I don’t know how to even find them all in a large code base any other way!
I think an exception might be made for a plain "C-like" struct that doesn't initialize members or contain anything except basic types. In the specific example[0] the code is actually surrounded by extern "C" { ... } so I suppose that the compiler "knows" this is a plain C struct? (Does extern "C" change parsing rules? I will need to look at what GCC does)
Destructors will execute in the reverse of declaration order, so if initialization order doesn't match declaration order, and some members depend on each other somehow, things will break. At the very least, it could be surprising. Not a problem in C, where destructors don't exist.
I think, as usual, this was the compromise that the committee was able to agree on above all objections. There is still the possibility that the rules are relaxed if there is agreement, but somebody has to do the work to push it through standardization.
I also thought that the behaviour as standardized was useless, but recently I started writing more minimalist code eschewing constructors where aggregate initialisation would suffice, and I haven't really missed the ability to reorder initializers or skip them.
Initialization in C++ is already a mess. Making one of the core behaviours (members are initialized in declaration order) work subtly different for this case would make it even more difficult for the programmer to build a correct mental model.
From what I can tell, the snippet you posted would compile fine in C++20 mode.
> Personally, I find code that leverages ranges harder to read, not easier, because lambdas inlined in functions introduce new scopes that have a strong non-linearizing effect on the code. This isn’t a criticism of ranges per se, but certainly is a stylistic preference.
Does anyone know what “non-linearizing” means here?
I assume “code outside the lambda runs first, then code inside the lambda maybe runs later, maybe runs multiple times, maybe doesn’t run at all”.
It can especially create problems when the lambda captures a variable by reference which gets mutated and/or deallocated before the lambda runs, and the developer didn’t plan for mutation or deallocation.
Or (a problem with lambdas, but not “non-linearizing”), if the lambda captures a variable by value (copies the value) and mutates it, and the developer expected the mutation to persist outside the lambda.
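Both pitfalls in one tiny sketch (my own example, not from the article):

```cpp
// Capture by reference: the lambda sees later mutations of x (and would
// dangle if x were destroyed before the call). Capture by value: the
// lambda owns a private copy, so mutations don't escape.
inline int capture_demo() {
    int x = 1;
    auto by_ref = [&x] { return x; };
    auto by_val = [x]() mutable { return ++x; };  // mutates the copy only
    x = 2;
    int r = by_ref();   // sees the mutation: returns 2
    by_val();           // increments its private copy, not x
    return r * 10 + x;  // x is still 2, so this returns 22
}
```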
This was my first encounter with the three-way comparison operator (<=>). Can someone give a practical use case? There must be one for it to be included in the spec, but I'm not seeing it.
But the short answer is that all the other comparison operators are automatically generated from that one if it is defined. So it makes the code simpler. And for many types, <=> isn't much more complicated than the others.
> Signed overflow/underflow remain UB (and it’s understandable that changing this behavior would have dramatic consequences)
I think that the dramatic consequences are only understandable if you succumb to mimetic contagion.
The consequences are real but not dramatic and possibly not even measurable in many workloads.
It just means that you’ll have an extra sign extension (one of the cheapest ops the CPU has) in a subset of your loops, namely the ones that had a 32 bit signed induction variable and the compiler could reason about that variable but only if it also could assume no wrapping. That’s a lot of caveats.
Most loops will be unaffected by making signed integer overflow defined. Anything that’s not in a loop will almost certainly be unaffected by this change. If you use size_t as your indices then you’ll definitely be unaffected.
So yeah. “Dramatic consequences”. I wish folks stopped exaggerating. There’s nothing dramatic here. It’s a fraction of a percent of perf maybe.
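For context, a quick sketch (my example) of the asymmetry as it stands today: unsigned overflow is defined, signed overflow is the UB the optimizer leans on.

```cpp
#include <cstdint>
#include <limits>

// Unsigned arithmetic wraps modulo 2^32 by definition:
constexpr std::uint32_t wrapped = std::numeric_limits<std::uint32_t>::max() + 1u;
static_assert(wrapped == 0);

// The signed equivalent is UB, which a constant expression must reject:
//   constexpr int boom = std::numeric_limits<int>::max() + 1;  // ill-formed
```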
> a 32 bit signed induction variable and the compiler could reason about that variable but only if it also could assume no wrapping.
(Amateur C programmer silly question.) Do I understand it correctly: if we increment the variable (i+10) and use it in an if condition, with UB the compiler could skip that code altogether and assume it will never be reached?
The compiler has to assume that I+10 won’t overflow by virtue of I never being big enough. So, it’ll emit all of the code and UB won’t come into play.
It’s more like this. If you say A[I] where I is 32 bit signed and you’re on a 64 bit target, then this lowers to:
- sign extend I to get a 64-bit value
- multiply it by the size of A’s element type
- add that to A
- then do the access
The last three steps will be just one instruction in the common case on arm and x86. The first step will require a separate instruction on x86.
The compiler can kill the sign extend if it’s sure that the integer value cannot be negative. That’s hard to prove. But you can almost prove it if you see code like:
for (int i = 0; something; ++i)
It looks like i starts out as zero and only grows! So it has to be positive! So if you say A[i] then no sign extend needed!
But wait, what if ++i overflows?
With signed int UB, the compiler can just assume it won’t overflow. And then it can prove that i is nonnegative. And then it can kill the sign extend on those CPUs where it’s not free, like x86.
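Putting that example into compilable form (a hedged sketch; sum, A, and n are made-up names):

```cpp
// A 32-bit signed index into an array on a 64-bit target. Because ++i
// overflowing is UB, the compiler may assume i never wraps negative,
// prove i >= 0, and drop the per-iteration sign extension of i.
inline long sum(const long* A, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i)
        s += A[i];   // i widened to 64 bits for the address computation
    return s;
}
```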
I’m a compiler writer. I know how valuable this optimization is. Namely, it’s the tiniest of benefits on some program/CPU combos. Modern languages like Java or Swift just give this well defined semantics and call it a day because this isn’t a good hill to die on. Fucking up the language isn’t worth 0.3% on some stupid benchmark, period.
Is it just me, or is the worst part of coroutines the lack of tooling around them? Whenever I get a crash in a coroutine, the "stacktrace" is totally useless and doesn't actually show where the crash happened, just some boilerplate code around executing some continuation which doesn't refer to real code that you wrote.
more or less agree, although this issue isn't even really unique to C++. in practice it's still worth it imo, since debugging callback heavy stuff isn't exactly fun either
You'd still have lines referring to a callback you've written. I found that with coroutines, not even a single stack frame refers to my code, other than the one starting the loop.
In my corner of the C++ world though, I am so, so excited for <format> in 6 years or however long it will take us to move to C++20.
https://www.reddit.com/r/cpp/comments/o94gvz/what_happened_w...
Is that really 'an average' for a modern AAA game?
Damn. That's an order of magnitude bigger than I'd imagined.
Just C++: 11,375,669 lines
Total (of everything): 31,379,114 lines
That’s fairly representative of just the tooling side of things for a AAA engine. That’s not counting the logic of the game itself.
GCC maintainers: Hold my Jolt
I hate this about C++! In C you can initialize struct fields in any order, and this allowed us to write nbdkit plugins in a very natural way:
where the order is not related to the order the fields appear in the struct (that has to be maintained for ABI reasons), and not all fields need to appear (the others are initialized with 0/NULL). For C++ we have to do this mess:
https://bugzilla.redhat.com/show_bug.cgi?id=1418328#c3
Anyway, my question is: why is this, C++ people?
We find it pretty useful even with the limitations. Certainly not “pretty much useless.”
[0] https://gitlab.com/nbdkit/nbdkit/-/blob/cd761c9bf770b23f678f...
This would still be a lot better for 99% of real world use cases than requiring the programmer to manually place the items in declaration order.
The problem is, someone thought this was a good idea, but the act of supporting it ruled out a lot of more-useful future improvements.