Thanks for tracking this down as a codepage issue. The implementation of operator<< will indeed call ostream::widen() to expand the character into a locale-dependent equivalent.
Something else to consider is compile time versus runtime validation with formatting libraries, e.g. due to passing the wrong number or type of arguments. The Abseil str_format library does compile time validation for both when possible: https://abseil.io/docs/cpp/guides/format
{fmt} certainly does this too. It works quite nicely with the clangd language server flagging a line as an error until the format string and arguments match.
Why do execution times drop so drastically with an increasing number of iterations? Shouldn't the caches be filled after one iteration already? There is no JIT in C++, or is there?
I only had a quick look at the code, but it looks like it's timing memory allocation. For example, the sprintf part uses `std::string str(100, '\0')`. I'm not a C++ expert, but I believe this is essentially doing a malloc and memset of 100 bytes for every call to sprintf. So this is probably a poorly set up benchmark.
Your CPU is effectively a virtual machine with stuff like branch prediction, speculative execution w/rollback, pipelining, implicit parallelism, etc. etc.
Of course, it isn't able to do quite as much as a VM running in software (because fixed buffers for everything, etc.), but even so...
This question doesn't make sense for the context*. C++ is Ahead of Time, by design; there is nothing to "just in time" compile.
JIT (as a concept) only makes sense if you are, in some way, abstracting your code from native machine code (usually via some sort of VM, like Python or Java's), which the "system" languages (C, Rust, Zig, C++, etc) do not.
What I think you are trying to reference are "runtime optimizations"; in which case, the answer is probably no. Base and STD C++ are pretty conservative about what they put into the runtime. Extended runtimes like Windows' and glibc might do some conditional optimizations, however.
* Yes, some contrarian is going to point out a project like Cling or C++/CLI. This is why I'm being very clear about "context".
> C++ is Ahead of Time, by design; there is nothing to "just in time" compile.
Can I talk to you about our Lord and Savior the CPU trace cache[1]?
That is to say, I know next to nothing about how modern CPUs are actually designed and hardly more about JITs, but a modern CPU’s frontend with a microop cache sure looks JITy to me. The trace cache on NetBurst looks even more classically JITy, but by itself it was a miserable failure, so meh.
In any event, a printf invocation seems like it should be too large for the cache to come into play; on the other hand, all the predictors learning stuff over the iterations might make a meaningful impact?
Seems to me like that learning, if present, would make the benchmark less interesting, not more, as an actual prospective application of string formatting seems unlikely to go through formatting the same (kind of) thing and nothing else in a tight loop.
Could be dynamic frequency scaling. To minimize the impact of it when benchmarking one can pin the process to a single core and warm it up before running the benchmark.
I remember using variadic templates to print things in a single function call, like this:
int i; float f; string s;
print_special(i, f, s);
It would somehow imitate the behavior of python's print()
I never really understood how variadic templates worked; maybe one day I will. To be honest, I suspect they're really not very kind to compile times: it's a lot of type checks done under the hood.
It's a bit problematic that C++ cannot be compiled quickly without a fast CPU. I wonder how this is going to be addressed, because it seems that modules aren't a good solution to that yet.
For list comprehension, we have (C++23): `std::ranges::to<std::vector>(items | std::views::filter(shouldInclude) | std::views::transform(f))` it’s not quite `[f(x) for x in items if shouldInclude(x)]` but it’s the same idea.
To be honest, if that's the notation, I will not be very eager to jump on C++23. That said, I admire people whose minds stay open to C++ improvements and who make that effort.
Buffer overflows are never irrelevant. You might get away with it until the day it blows up or someone manages to exploit it. Or you could code it correctly the first time.
Disagree. Use whatever is the most simple and boring option for the problem's solution.
Also, the standard library has so much stuff that will give you pain at runtime that avoiding sprintf really is not relevant.
I don't know what "safe" means in general in the scope of C++. If there is memory corruption, my program will crash. Then I will compile it with the C++ debug runtime, which will pinpoint the exact location that caused the problem. Then I fix it.
Not using sprintf will not result in code that is free of memory errors. C++ in total is unsafe. You need to write code, and have a production system, that foremost takes this into account. You can't make C++ safe by following a dogma of not using functions with possible side effects. There is a very high chance your fall-back algorithms will leak memory themselves anyway.
The only way to write C++ that is as non-leaky as possible is to make the code as simple and easy to reason about as possible, and to have tooling to assist in program validation. This requirement is much more important than avoiding some parts of the standard library.
Use static checkers, use Microsoft's debug C runtime, use address sanitizers, etc.
If you know some parts of your standard library are broken then ofc avoid them. But what can be considered "broken" really depends on what one is trying to achieve, and which platforms one is targeting.
asprintf is often a better choice if you're heap-allocating anyway. For a static or stack buffer, obviously snprintf, unless you know the maximum possible length won't exceed the buffer size (which you often do...).
No, it won't. If you are on an old Windows with code page 437 [1] then sure, but on any sane UTF-8 system you're just going to get some binary data.
1. https://wikipedia.org/wiki/Code_page_437
They dropped locale support for `std::to_chars` so hopefully they can be turned off for `std::format` too
[1] https://chipsandcheese.com/2022/06/17/intels-netburst-failur...
Another example would be convenient list comprehensions, and convenient maps without juggling around with tuples, first(), second(), at()...
For example, if you have a std::map<std::string, int>, you can iterate over it directly with a range-for.
You can test for membership with contains(), although I still usually use find(), because if the key is in the map I probably want the value. You can use initializer lists with them too.
(1) notation wise not very different from the old printf which is looked down upon.
(2) f"{name}'s hobby is {hobby} " would read like a novel and there a lot less comma seperated arguments.
(3) std::format is quite a lot of characters to type for something so ubiquitous.