Oh, wow, I'm sorry! It was new to me, and it didn't flag it for me as a duplicate.
It is so neat to have this smorgasbord without needing to install emulators or containers with toolchains, or worse, acquire all of the relevant hardware as in the old old days.
I post those links because many users like to read older threads. Still looking for brief unambiguous wording to make that clear... "if curious see also" is still leading to some misunderstandings.
As a compiler writer, I find this useful when I have an "oh crap" moment and need to find a simple C program to generate specific IR, or I need to find the IR of a simple C program. Despite literally having a built tip-of-trunk compiler that I'm working on. It's that convenient.
Godbolt links are always great for resolving "well actually" arguments about how smart compilers are (which, as you can guess, are quite common on Hacker News). No more "the compiler will do this"; if you're going to claim that, you'd better have a Godbolt link to back it up!
Something that surprised me: compilers happily optimize out calls through a pointer to member function if they know definitively what the pointer points to, but not if the pointer is to a virtual function. I recently figured that out thanks to a tangentially related debate about whether the first case would be optimized at all (I thought it would), but the second case surprised me: it's not like there's a virtual dispatch to worry about once you've already taken a reference, since you're already specifying exactly which implementation you want. Just one of many nuggets learned from Godbolt: apparently marking a function virtual can kill optimizations even in cases where it seems like it shouldn't.
Where's your point of confusion? There's still virtual dispatch going on with a pointer to virtual member function... they're typically implemented as fat pointers containing the vtable offset, and calling via the pointer still does virtual dispatch.
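That fat-pointer representation is visible in the size of the pointer itself. A quick sketch (the exact widths are ABI-specific; on the common Itanium C++ ABI a pointer to member function is two machine words):

```cpp
#include <cstddef>

struct Widget {
    virtual void draw() {}
    void resize() {}
};

// A plain function pointer is one machine word, but a pointer to
// member function must carry enough information to do virtual
// dispatch (typically a function pointer / vtable offset plus a
// this-adjustment), so it is usually wider.
constexpr std::size_t fn_ptr_size     = sizeof(void (*)());
constexpr std::size_t member_ptr_size = sizeof(void (Widget::*)());

static_assert(member_ptr_size >= fn_ptr_size,
              "member pointers are at least as wide as function pointers");
```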
An alternative would be for the compiler to generate a stub function that jumps to a fixed offset in the vtable (the stub can be shared across all classes that use the same vtable offset, unless your architecture has some very exotic calling convention), and use a skinny pointer to that stub as the virtual member function pointer, but that's not the usual implementation.
In any case, calling through a pointer to virtual member function has to perform virtual dispatch unless the compiler can statically constrain the runtime type of *this being used at the call site. Remember that C++ needs to allow separate compilation, so without whole-program optimization (which would break dlopen()), C++ can't perform devirtualization on non-final classes.
Making the class final will allow the compiler to statically constrain the runtime type and devirtualize the pointer to member function call.
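A minimal sketch of all three cases (hypothetical types, not from the thread): a pointer to a non-virtual member can be folded to a direct call, a pointer to a virtual member still dispatches through the vtable, and a final class pins down the dynamic type so the last call can be devirtualized:

```cpp
struct Base {
    virtual int vf() { return 1; }
    int nf() { return 2; }
};

struct Derived final : Base {
    int vf() override { return 3; }
};

// Pointer to non-virtual member: b's type is known exactly, so
// compilers typically fold this down to "return 2".
int call_nonvirtual() {
    Base b;
    int (Base::*pmf)() = &Base::nf;
    return (b.*pmf)();
}

// Pointer to virtual member: the pointer encodes "do virtual
// dispatch on whatever *this turns out to be", so without knowing
// the dynamic type the compiler must go through the vtable.
int call_virtual(Base& b) {
    int (Base::*pmf)() = &Base::vf;
    return (b.*pmf)();
}

// With a final class the dynamic type is statically constrained,
// so the call can be devirtualized straight to Derived::vf.
int call_final(Derived& d) {
    int (Base::*pmf)() = &Base::vf;
    return (d.*pmf)();
}
```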
Godbolt is great for comparing compiler versions (and compilers).
For example, you can see gcc's progression of efficiency for C atomics with https://godbolt.org/z/brsoEr. If you increment the gcc version number, you will see the (very slow) mfence disappear, and xchg show up.
Then there is Clang at O3: If an int falls in the forest, and there is no one around, was it ever incremented? No. The function turns into a bare ret.
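The kind of function in question is just an increment of a local atomic that nothing can observe; a reconstruction along these lines (the exact snippet behind the short link may differ):

```cpp
#include <atomic>

// The local atomic never escapes this function, so no other thread
// can ever observe it. Clang at -O3 deletes the increment and emits
// a bare ret, while GCC still emits a real atomic read-modify-write.
int dead_atomic_increment() {
    std::atomic<int> counter{0};
    counter.fetch_add(1, std::memory_order_seq_cst);
    return 0;
}
```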
Compilers, versions, and also flags and instruction sets. I think Compiler Explorer should be taught in one of the labs of any CS61C-type course where you're first exposed to assembly language.
I'm actually quite surprised GCC's output isn't also a bare ret. It is, of course, if you replace the atomic_int with a regular int; I don't know why the atomic version wouldn't hit the same optimizations. Yes, it's atomic, but it's still an unused local that doesn't escape the function.
Clang/LLVM also does heap elision (removing unnecessary mallocs), while GCC didn't last time I checked. I think the LLVM devs allow themselves a more pragmatic reading of the C++ standard.
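A sketch of heap elision (allowed since the N3664 wording made allocation calls non-observable): Clang folds the whole function to "return 42", while GCC has historically kept the operator new / operator delete pair.

```cpp
// The allocation cannot be observed by anyone: the pointer never
// escapes, and the value is read back immediately. Clang at -O2
// elides the heap allocation entirely; GCC (at least historically)
// still emits the operator new / operator delete calls.
int heap_elision() {
    int* p = new int(42);
    int v = *p;
    delete p;
    return v;
}
```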
Right, that's why atomic ints are not a replacement for volatile when doing inter-process communication.
If the memory access (load or store) itself is the desired behavior, you'd better make sure you add volatile, even to atomic<int> types. atomic<> simply provides certain guarantees about the atomicity of that access with respect to other accesses within the same process, not about accesses from a different process. If the compiler's analysis determines that the atomic<> store/load isn't necessary within the currently compiled program, then it may elide it completely.
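Concretely, the rule of thumb sketched above (hypothetical function names): if the access itself must happen, e.g. because the int lives in memory mapped into another process, qualify the atomic with volatile so the compiler cannot elide it:

```cpp
#include <atomic>

// Plain atomic: if the compiler can prove the store is unobservable
// within this program, it is free to delete it entirely.
void store_plain(std::atomic<int>& flag) {
    flag.store(1, std::memory_order_release);
}

// volatile atomic: the access itself counts as observable behavior,
// so the store must actually be emitted -- which is what you want
// for memory shared with another process.
void store_shared(volatile std::atomic<int>& flag) {
    flag.store(1, std::memory_order_release);
}
```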
I think GCC may be disabling certain optimizations on atomic<> accesses because many people mistakenly use atomic<> for interprocess communication, and eliding the accesses would break that code?
Godbolt is such an amazing tool, and amazing that it's free.
For a random example from a few days ago, I wanted to understand how Rust compiles various approaches to doing pairwise addition between an f64 vector and an f32 vector: https://godbolt.org/z/9envsT. Profiling can tell me which is fastest, but Godbolt is really helpful for understanding why.
(Fun fact I learned recently, after years of using it: Godbolt is named after its creator, Matt Godbolt [0]).
Indeed, with such a splendid last name, it must take considerable humility to avoid naming all of his projects and even variables with it. I doubt I would be able to resist.
Throwing a bunch of code at godbolt and seeing it spit out a single 'mov' is definitely on my list of Top 10 Most Satisfying Things: https://godbolt.org/z/bdn37q
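I can't see behind the short link, but the classic version of this trick is a loop the compiler folds into a closed form (an illustrative sketch, not necessarily the linked snippet):

```cpp
#include <cstdint>

// A naive O(n) summation loop. GCC and Clang at -O2 recognize it as
// Gauss's formula and compile it down to a handful of arithmetic
// instructions with no loop at all -- and for a compile-time-constant
// argument, the whole call folds to a single mov of the answer.
std::uint64_t sum_to(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i)
        total += i;
    return total;
}
```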
Will watch the video later, and I'm probably missing something, but at this moment I wonder what more it offers than just running "gcc -S", which spits out the assembly code.
By the way, what I think could make this tool more useful for the average user is if the assembly were decompiled into another language (like C).
2018 https://news.ycombinator.com/item?id=18671993
2016 https://news.ycombinator.com/item?id=13182726
2016 (a bit) https://news.ycombinator.com/item?id=12627295
2016 https://news.ycombinator.com/item?id=11671730
2015 https://news.ycombinator.com/item?id=9861294
2015 https://news.ycombinator.com/item?id=9085158
2014 https://news.ycombinator.com/item?id=7593109
"Optimizations in C++ Compilers (acm.org)" https://news.ycombinator.com/item?id=23822044 (102 points, 23 days ago, 20 comments)
The problem is that most devs don't bother with PGO.
[0] https://xania.org/MattGodbolt
No other sticker has quite the cachet of a Godbolt sticker.
The thing is that he originally served it from his own domain, so people use both names interchangeably.
And it still is, right? Or is there also a Compiler Explorer domain?
It's somewhere in the ideas list... https://github.com/compiler-explorer/compiler-explorer/issue...
Something similar is Andreas Fertig's awesome https://cppinsights.io/