Thanks for tracking this down as a codepage issue. The implementation of operator<< will indeed call ostream::widen() to expand the character into a locale-dependent equivalent.
Something else to consider is compile time versus runtime validation with formatting libraries, e.g. due to passing the wrong number or type of arguments. The Abseil str_format library does compile time validation for both when possible: https://abseil.io/docs/cpp/guides/format
{fmt} certainly does this too. It works quite nicely with the clangd language server flagging a line as an error until the format string and arguments match.
Why do execution times drop so drastically with an increasing number of iterations? Shouldn't the caches be filled after one iteration already? There is no JIT in C++, or is there?
I only had a quick look at the code, but it looks like it's timing memory allocation. For example, the sprintf part uses `std::string str(100, '\0')`. I'm not a C++ expert, but I believe this is essentially doing a malloc and memset of 100 bytes for every call to sprintf. So this is probably a poorly set up benchmark.
Your CPU is effectively a virtual machine with stuff like branch prediction, speculative execution w/rollback, pipelining, implicit parallelism, etc. etc.
Of course, it isn't able to do quite as much as a VM running in software (because fixed buffers for everything, etc.), but even so...
This question doesn't make sense for the context*. C++ is Ahead of Time, by design; there is nothing to "just in time" compile.
JIT (as a concept) only makes sense if you are, in some way, abstracting your code from native machine code (usually via some sort of VM, like Python or Java's), which the "system" languages (C, Rust, Zig, C++, etc) do not.
What I think you are trying to reference are "runtime optimizations"; in which case, the answer is probably no. Base and STD C++ are pretty conservative about what they put into the runtime. Extended runtimes like Windows' and glibc might do some conditional optimizations, however.
* Yes, some contrarian is going to point out a project like Cling or C++/CLI. This is why I'm being very clear about "context".
> C++ is Ahead of Time, by design; there is nothing to "just in time" compile.
Can I talk to you about our Lord and Savior the CPU trace cache[1]?
That is to say, I know next to nothing about how modern CPUs are actually designed and hardly more about JITs, but a modern CPU’s frontend with a microop cache sure looks JITy to me. The trace cache on NetBurst looks even more classically JITy, but by itself it was a miserable failure, so meh.
In any event, a printf invocation seems like it should be too large for the cache to come into play; on the other hand, all the predictors learning stuff over the iterations might make a meaningful impact?
Seems to me like that learning, if present, would make the benchmark less interesting, not more, as an actual prospective application of string formatting seems unlikely to go through formatting the same (kind of) thing and nothing else in a tight loop.
Could be dynamic frequency scaling. To minimize the impact of it when benchmarking one can pin the process to a single core and warm it up before running the benchmark.
I remember using variadic templates to print things in a single function call, like this:
int i; float f; string s;
print_special(i, f, s);
It would somehow imitate the behavior of python's print()
I never really understood how variadic templates worked; maybe one day I will. To be honest, I suspect they're really not very kind to compile times: it's a lot of type checks done under the hood.
It's a bit problematic that C++ cannot be compiled quickly without a fast CPU. I wonder how this is going to be addressed, because it seems that modules aren't a good solution to that yet.
For list comprehension, we have (C++23): `std::ranges::to<std::vector>(items | std::views::filter(shouldInclude) | std::views::transform(f))` it’s not quite `[f(x) for x in items if shouldInclude(x)]` but it’s the same idea.
To be honest, if that's the notation, I will not be very eager to jump on C++23. That said, I admire people whose minds stay open to C++ improvements and who make that effort.
Buffer overflows are never irrelevant. You might get away with it until the day it blows up or someone manages to exploit it. Or you could code it correctly the first time.
Disagree. Use whatever is the most simple and boring option for the problem's solution.
Also, the standard library has so much stuff that will give you pain at runtime that avoiding sprintf really is not relevant.
I don't know what "safe" means in general in the scope of C++. If there is memory corruption, my program will crash. Then I will compile it with the C++ debug runtime, which will pinpoint the exact location that caused the problem. Then I fix it.
Not using sprintf will not result in code that is free of memory errors. C++ in total is unsafe. You need to write code, and have a production system, that foremost takes this into account. You can't make C++ safe by following a dogma of not using functions with possible side effects. There is a very high chance your fall-back algorithms will leak memory themselves anyway.
The only way to write C++ that is as non-leaky as possible is to make the code as simple and easy to reason about as possible, and to have tooling to assist in program validation. This requirement is much more important than avoiding some parts of the standard library.
Use static checkers, use Microsoft's debug C runtime, use address sanitizers, etc.
If you know some parts of your standard library are broken then ofc avoid them. But what can be considered "broken" really depends on what one is trying to achieve, and which platforms one is targeting.
asprintf is often a better choice if you're heap-allocating anyway. For a static or stack buffer, obviously snprintf, unless you know the maximum possible length won't exceed the buffer size (which you often do...).
No, it won't. If you are on an old Windows with code page 437 [1] then sure, but on any sane UTF-8 system you're just going to get some binary data.
1. https://wikipedia.org/wiki/Code_page_437
They dropped locale support for `std::to_chars` so hopefully they can be turned off for `std::format` too
[1] https://chipsandcheese.com/2022/06/17/intels-netburst-failur...
Another example would be convenient list comprehensions, and convenient maps without juggling around with tuples, first(), second(), at()...
For example, if you have a std::map<std::string, int>, you can iterate over it directly with a range-for.
You can test for membership with contains(), although I still usually use find(), because if the key is in the map I probably want the value. You can use initializer lists with them too.
(1) notation wise not very different from the old printf which is looked down upon.
(2) f"{name}'s hobby is {hobby} " would read like a novel and there a lot less comma seperated arguments.
(3) std::format is quite a lot of characters to type for something so ubiquitous.