> When performing string concatenation operations, it is more advantageous in terms of performance to use std::ostringstream rather than std::string. This approach is also used elsewhere, such as debug_utils and node_errors.
(From the Node.js GitHub issue.) Sounds like this guy is mixing up his Java knowledge with C++ knowledge.
C++ streams are frankly insane: Loads of implicit state, needing to set about a half dozen flags to do any nontrivial formatting, running the risk of accidentally "poisoning" all downstream operations if you forget to reset any of that state, the useless callbacks API [1], obfuscated function names (xsgetn, epptr, egptr), a ridiculously convoluted inheritance hierarchy that includes virtual/diamond inheritance [2], and use of virtual functions for simple buffer manipulation. These were all bad decisions even at the time.
C++ dev for 20+ years. I refused to use them, they encapsulated everything with C++ I hated. A ton of implicit actions and gotchas. It's a gun in hand, foot in target.
Streams are the best evidence that C++ was an experiment. It was a sandbox to try a bunch of different language ideas.
Overloading the shift operators for this purpose is prima facie insane, and anyone who has single-stepped through a C++ "hello world" program can figure out it isn't remotely efficient, but it was certainly creative.
> "I recently learned that some Node.js engineers prefer stream classes when building strings, for performance reasons." Pretty much tells you everything you need to know about node js, I guess.
Google Closure Library includes a StringBuffer class. [1]
I recall it having explanatory notes, but I don't see them in the code now. JavaScript engines can optimize a string concatenation into an in-place edit if there is only one reference to the first string. The StringBuffer class keeps the reference count at one, guaranteeing this optimization is available even if the StringBuffer itself is shared.
For what it's worth, even in Java the compiler is often smart enough to replace naive string concatenation with equivalent StringBuilder usage (although I don't know if it is smart enough to do that in a for loop like this)
A bigger problem is that iostream is still the only C++ way to read and write files. Yeah, you can still use `std::fopen` and so on, but the modern C++ strives to minimize type-ignorant C functions right? The introduction of `std::format` made the formatting aspect of iostream obsolete, but iostream still has no standard alternative for other aspects.
std::print is coming with C++23. In the meantime, there's std::format_to. You still have to dump the output into an std::ostream, but at least you don't have to use the disgusting ostream interface directly.
But iostream is like what you get from someone who's crazy enough to use every single language feature and who thinks there's some ulterior motive for creating these crazy inheritance levels, etc.
What's the alternative to string streams for building strings pieces by pieces in C++? Plain old string concatenation? Asking for a friend... I should run benchmarks I guess.
Maybe not. The problem is that C++ (before C++20) has no normal print and format function. You were supposed to do everything with streams. To switch to two decimal places you would first output some magic value that sets an internal flag in the stream. Then you need to remember to restore it again.
Of course you could just use good old C printf to get some work done. But if you did that the "real" C++ programmers would sneer at you.
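To make the "magic value that sets an internal flag" point concrete, here is a minimal sketch. The function names (`write_two_decimals`, `write_two_decimals_leaky`, `demo`) are made up for illustration; only the standard manipulators and `flags()`/`precision()` calls are real.

```cpp
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Writes `value` with two decimals, then restores the stream's previous
// formatting state so later output is not affected.
void write_two_decimals(std::ostream& out, double value) {
    const std::ios_base::fmtflags saved_flags = out.flags();
    const std::streamsize saved_precision = out.precision();

    out << std::fixed << std::setprecision(2) << value;

    out.flags(saved_flags);
    out.precision(saved_precision);
}

// Same output, but "forgets" to restore: std::fixed and the precision stick
// to the stream and change everything printed afterwards.
void write_two_decimals_leaky(std::ostream& out, double value) {
    out << std::fixed << std::setprecision(2) << value;
}

// Prints 3.14159 with one of the helpers above, then prints 0.5 with
// whatever state the stream was left in.
std::string demo(bool leaky) {
    std::ostringstream out;
    if (leaky) {
        write_two_decimals_leaky(out, 3.14159);
    } else {
        write_two_decimals(out, 3.14159);
    }
    out << ' ' << 0.5;
    return out.str();
}
```

`demo(false)` gives `3.14 0.5`, while `demo(true)` gives `3.14 0.50` because the fixed/precision state leaked into the second write.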
<iostreams> are a good example of a pure product of the 1990s/2000s, when the hype around Object Oriented was at its peak.
Almost everything related to c++ iostreams has this code smell of OOP pushed too far:
- Usage of runtime virtual dispatch with virtual calls when it was not necessary. Causing a negative unavoidable impact on performance.
- Heavy usage of function overloading with the "<<" operator. Leading to pages long compilation errors when an overload fails.
- Hidden states everywhere with the usage of state formatters and globals in the background.
- Unnecessary complexity with std::locale which is almost entirely useless for proper internationalisation.
- Bloat. Any statically compiled binary will inherit around ~100k of binary fat bloat when using iostream
- Useless encapsulation with error reports done as abstracted bit flags. Which is absolutely horrendous when dealing with file I/O: It hides away the underlying error with no proper way to access it.
- Deep class hierarchy making the entire thing look like spaghetti.
- Useless abstraction with stringstream that hides the underlying buffer away, making it close to unusable on embedded safety critical systems where memory allocations are forbidden.
All of that has made <iostreams> age pretty badly, and for good reasons.
Fortunately there is an incoming way out of that with work of Victor Zverovich on std::format and libfmt [1].
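As an illustration of the error-reporting point in the list above, here is a small sketch of what a failed file open looks like from the caller's side. `describe_open` is a hypothetical helper, not anything from the standard library; it only uses the standard state queries.

```cpp
#include <fstream>
#include <string>

// Tries to open `path` for reading and reports what the stream tells us.
// On failure all we get is a set of state bits; the stream interface has no
// portable way to ask *why* the open failed (missing file, permissions, ...).
std::string describe_open(const std::string& path) {
    std::ifstream file(path);
    if (file.is_open()) {
        return "opened";
    }
    std::string state;
    if (file.fail()) state += "failbit ";
    if (file.bad()) state += "badbit ";
    if (file.eof()) state += "eofbit ";
    return state;
}
```

On many platforms `errno` happens to still hold the underlying cause right after the failed open, but as far as I know the standard doesn't promise that, so treat it as an implementation detail.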
I hear you, however something like streambuf is kinda necessary for a type-erased interface for input/output of trivial objects. The C alternative is FILE*, which isn't much better and isn't as customizable either.
I agree that the formatting could have been done better, and that part is indeed handled much better in fmt, although personally I dislike format strings. It's much better than printf, granted.
Avoiding dependencies is a good thing, especially for C++, which doesn't have a widely used centralized repository and dependency manager like npm, cargo or cpan. For better or for worse.
And pulling Boost, let alone Qt just to avoid the occasional use of iostreams (or printf) is a bit much IMHO. I usually try to avoid Boost, as I feel it is more of a sort of beta/preview for the standard library. Don't get me wrong, it is production-worthy, but it can lead to awkward things when some boost feature ends up in the standard libraries and the project ends up with bits of both.
std::format is great because at last, we can use it without dependencies.
Yes. Several alternatives have been available for a while.
Victor's success has been getting the C++ committee to accept the idea that a new formatter was necessary, and bringing <format> into the standard library.
This was not a small task: the committee has its fair share of dinosaur gatekeepers and windmills [1]. For better and for worse.
We at least now have a way forward to evolve away from <iostream> if we want to, with the hope of maybe one day getting something that can entirely replace it.
[1]: Windmills: people who move a lot of air around, but not much more than air.
The same thing struck me as well. This is one of the best optimization professionals on the planet, showing up with a huge improvement, and receiving some misplaced arrogance.
The lesson here is to always, always watch your own review tone, and not make this mistake.
The other lesson is that when a PR shows up with this kind of technical information attached to it, spend the 60 seconds it takes to Google for "lemire".
If I'm being super pedantic, I would argue that while `string::push_back` should take amortized constant time, `string::append` has no such guarantee [1]. So it is technically possible for `my_string += "a";` (the same as `string::append`) to reallocate every time. Very pedantic indeed, but I have seen a C++ implementation where `std::vector<T>` is an alias to `std::deque<T>`, so...
One thing I don't like about lemire's phrasing is that he only looks at the current, often only the most widely available, implementations and doesn't make this point explicit in most cases.
EDIT: Thankfully he does acknowledge that in a later post [2].
I am not at all surprised. Kids these days have no idea what CPUs can do. ;)
I periodically have interview candidates work through problems involving binary search, then switch to bounded and ask them how to make it go faster over N elements, where N is < 1e3. The answer is "just linear search, because CPUs really like to do that".
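A rough sketch of what that answer can look like, assuming a sorted `std::vector<int>`. The names `linear_lower_bound` and `binary_lower_bound` are made up here; the second one just wraps `std::lower_bound` for comparison.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Index of the first element >= key in a sorted vector, found by simply
// counting how many elements are smaller than key. There is no
// data-dependent early exit, which compilers can often vectorize.
std::size_t linear_lower_bound(const std::vector<int>& sorted, int key) {
    std::size_t count = 0;
    for (int x : sorted) {
        count += (x < key) ? 1 : 0;
    }
    return count;
}

// Reference version using the standard binary search.
std::size_t binary_lower_bound(const std::vector<int>& sorted, int key) {
    return static_cast<std::size_t>(
        std::lower_bound(sorted.begin(), sorted.end(), key) - sorted.begin());
}
```

Both return the same index on sorted input. Which one is actually faster for a given N depends on the CPU and compiler, so measure before relying on it.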
This feels like a conversation where it would have been useful for the participants to be very explicit about the points they were trying to convey: the reviewer could have said "Isn't this a quadratic algorithm, because each call to `+=` reallocates `escaped_file_path`?" (or whatever their specific concern was; I may have misunderstood), and the author's initial response could have been "No, because the capacity of the string is doubled when necessary."
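One way to settle that kind of question for a particular standard library is to count how often the capacity actually changes. A minimal sketch (the `AppendStats` struct and function name are invented for this example; the growth factor itself is an implementation detail, not something the standard spells out):

```cpp
#include <cstddef>
#include <string>

struct AppendStats {
    std::size_t final_size;        // characters in the string at the end
    std::size_t capacity_changes;  // how many times capacity() changed
};

// Appends `count` characters one at a time with += and records how often
// the string's capacity changed, i.e. roughly how often it reallocated.
AppendStats append_one_char_at_a_time(std::size_t count) {
    std::string s;
    std::size_t capacity_changes = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        s += 'a';
        if (s.capacity() != last_capacity) {
            ++capacity_changes;
            last_capacity = s.capacity();
        }
    }
    return {s.size(), capacity_changes};
}
```

With geometric growth, as the common implementations do it, 100000 appends should show a few dozen capacity changes at most rather than 100000, which is what makes the loop linear overall instead of quadratic.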
My impression is that C++ streams are on their way out -- unlikely to be deprecated (too much existing code) but also unlikely to receive any more attention. They are old enough to likely not have any actual implementation bugs at this point, but in retrospect the design bugs from the 1980s are pretty serious.
The rapid incorporation of the excellent `format` package for printing points to a future falling back at least to ANSI buffered IO and possibly raw POSIX IO.
I like C much more than C++, but even with that must say that https://github.com/fmtlib/fmt is pretty nice (which is the base for std::format). Together with pystring (https://github.com/imageworks/pystring) it makes string processing in C++ somewhat bearable (pystring is slow though because it still uses the std::string type which excessively allocates, but at least it's convenient compared to 'raw' C++ string functionality).
It always surprises me how slow streams are. fscanf should be relatively slow because it has to parse the format string at runtime. So the new C++ format should be (and I believe is) much faster
fscanf() is also pretty slow because of thread safety, so each call involves a mutex (which goes unused 99% of the time). I wonder whether the new C++ libraries have faster non-threadsafe options?
In the little competitive programming I did, input parsing time was negligible compared to the allowed runtime for solving the problem. Inputs were designed so that if you had the right algorithm, you could do it easily even with terrible optimization. Fast code could be an advantage in the algorithm (but not in parsing), as it could help you "cheat" and, for example, get away with an N³ solution to a problem designed for N². Personally, I used iostreams, just because I found it a bit easier to type.
But then, different competitions have different rules, and maybe there are some where fscanf really is an advantage.
What is the effect of turning off synchronization with legacy functions from C? When C++ is used for I/O and no C is used this should be a habit. I’ve the impression that most C++ books don’t mention it (e.g. Primer) or only late.
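For reference, the habit in question looks roughly like this. The helper names are made up; `std::ios::sync_with_stdio` and `std::cin.tie` are the real calls.

```cpp
#include <iostream>
#include <istream>
#include <sstream>
#include <string>

// Call once at program start, before any I/O, and only if the program does
// not mix C stdio (printf/scanf) with iostreams afterwards.
void use_unsynced_iostreams() {
    std::ios::sync_with_stdio(false);
    std::cin.tie(nullptr);  // don't flush std::cout before every std::cin read
}

// Reads whitespace-separated integers until the stream runs out and sums them.
long long sum_ints(std::istream& in) {
    long long total = 0;
    long long value = 0;
    while (in >> value) {
        total += value;
    }
    return total;
}

// Convenience wrapper so the reader can be tried without real input.
long long sum_ints_in(const std::string& text) {
    std::istringstream in(text);
    return sum_ints(in);
}
```

In a real program you would call `use_unsynced_iostreams()` first and then `sum_ints(std::cin)`. How much it helps depends on the standard library; with synchronization on, the standard streams may be effectively unbuffered so they can interleave correctly with C stdio.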
It is similar to String and StringBuilder from Java. You need to know it, remember it and use it by habit. And again, books often mention it only late (e.g. Head First).
By the way, I like the plain things from <iostream>, especially the shift << and >> operators and the ease of concatenating and handling strings. But as others mentioned, the implementation (e.g. inheritance) looks complicated.
[1] https://en.cppreference.com/w/cpp/io/ios_base/register_callb...
[2] https://i.stack.imgur.com/dXhXP.png
That is exactly it. C++ string streams have had atrocious performance since forever. Good abstraction, not very useful in practice.
In Java, if I remember correctly, strings are immutable, so the StringBuilder or whatever ridiculous name it had was the faster way to build a string.
[1] https://github.com/google/closure-library/blob/master/closur...
https://github.com/nodejs/node/pull/50253
Note that the person mixing java knowledge and C++ isn't Daniel Lemire.
C++ the base language has issues, for sure
Maybe we need C+=2 to make things less crazy
If you really need speed, then estimate how large a string you need in advance and preallocate it (either with new char[], or string::reserve I guess).
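A minimal sketch of the `reserve` variant, using a hypothetical `join` helper: compute the final length first, reserve once, then append.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Joins `parts` with `sep`. The exact final size is computed up front and
// reserved once, so the appends below should not need to grow the buffer.
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
    std::size_t total = 0;
    for (const std::string& part : parts) {
        total += part.size();
    }
    if (!parts.empty()) {
        total += sep.size() * (parts.size() - 1);
    }

    std::string result;
    result.reserve(total);
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) {
            result += sep;
        }
        result += parts[i];
    }
    return result;
}
```

For example, `join({"a", "bc", "def"}, ", ")` yields `a, bc, def`. When the size can't be known exactly, reserving a reasonable estimate still saves most of the intermediate reallocations.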
[1]: https://github.com/fmtlib/fmt
Those are great, but iostreams hasn't been necessary in a very long time thanks to other libraries like Qt and Boost.
- crafting a high context yet succinct description
- addressing PR feedback well
- giving respect to a pedantic commenter who understands the inner workings far less than Daniel, while not conceding to a destructive change.
I will share this PR widely as a role model for open source contributions.
[1] https://github.com/nodejs/node/pull/50288
[1] https://timsong-cpp.github.io/cppwp/n4861/strings#string.app...
[2] https://lemire.me/blog/2023/10/23/appending-to-an-stdstring-...
Which is based on
https://fmt.dev/latest/index.html
There's also a proposal for a type safe scanf: scnlib, sort of format in reverse: https://scnlib.dev/en/master/
The array_source device can be used with any buffer (char*, size_t)
Source https://en.cppreference.com/w/cpp/io/ios_base/sync_with_stdi...