Mysterious Moving Pointers

I remember the C++ ca. 1994 year when I started my career. It was C with Objects back then. And it was great! C++ was better C, it was easy for any C dev to convince him/her to jump to it.

I recently had to work with C++ code and... it is not a happy story anymore:

- Lots of magic like described here

- F**ing templates - for those who like them, did you ever see a C++ core file? Or tried to understand a single symbol?

- Standarization that feels like pulling more Boost into the language, which means more templates. Which makes core files incomprehensible.

It used the be that average dev who knew C could read and work with simple C++. This is no longer true. C++ is no longer a better C.

P.S. Example from recent core file, one line in the stack trace: boost::asio::asio_handler_invoke<boost::asio::detail::binder2<core::AsyncSignalCatcher<2, 15, 17>::waitForSignal<XServer::m_accept_loop(tsr::Data&)::<lambda(spawn::yield_context)>::<lambda(int)> >(XServer::m_accept_loop(tsr::Data&)::<lambda(spawn::yield_context)>::<lambda(int)>&&)::<lambda(const boost::system::error_code&, int)>, boost::system::error_code, int> > (function=...)

tialaramex · 2 years ago

> I remember the C++ ca. 1994 year when I started my career. It was C with Objects back then

It is possible, and perhaps even likely that the C++ you wrote in 1994 was indeed this "better C" and maybe even the C++ you read, the language Stroustrup wrote about in 1985 although "C with Objects" is much older still. The ISO document (C++ 98 aka ISO 14882:1998) was four years into your future in 1994, but the committee to write that document had existed for quite some time. Stroustrup's 1991 Second Edition of his appropriately already big book "The C++ Programming Language" explains that the committee have accepted Templates, although I believe at that point they didn't realise they'd inadvertently thus added an entirely new meta-language which is programmed differently than the rest of C++

Boost comes much later, it's only about as old as the actual ISO document.

hyperman1 · 2 years ago

Templates as in C++ are a historical accident.

Erwin Unruh demonstrated to the C++ commitee how they had accidentally created a second, compile time programming language inside the primary language. He did this by calculating primes and printing them in error messages.

People got carried away a bit with the power made available by this new language. Unfortunately there are no basic ergonomics in template metaprogramming because it was not intended to be a programming language.

forrestthewoods · 2 years ago

boost is just awful. Don’t use boost.

templates are totally fine for basic containers. Much better than C macro shenanigans.

The world is still looking for a “better C”. Zig, Odin, Jai and probably more are all trying. Of the three I think Jai is the closest. But it’ll be awhile.

inetknght · 2 years ago

> - F*ing templates - for those who like them, did you ever see a C++ core file? Or tried to understand a single symbol?

I have 20+ years of experience writing C++.

Yes, I've looked at "core" C++ headers and source. The most annoying part to me is style (mixed tabs and spaces, curly braces at wrong places, parenthesis or not, the usual style complaints). But other than that they're very readable to a seasoned C++ engineer.

I've also tried to understand symbols. You're right, they're difficult. But there's also tooling available to do it automatically. Even if you don't want to use the tools, there is a method to the madness and it's documented...

Let me ask ChatGPT:

> What tool lets me translate an exported symbol name to a C++ name?

    C++filt

It's categorized as a demangler. That's your search term to look for (I had to remember what it was).

Then I asked:

> Is there a function in the standard library which allows to mangle or demangle a given name or symbol?

It tells about `__cxa_demangle` for GCC. While I had forgotten about that, I'm pretty sure there is (or perhaps something similar) in the standard library.

It also suggests to use a library such as `abi::__cxa_demangle`. Hah, that's what I was looking for. It's an implementation-specific API (eg, compiler-specific) API used as an example. It was mentioned on `std::type_info::name()` page here:

https://en.cppreference.com/w/cpp/types/type_info/name

So, to continue replying to you: yes, it's annoying but it's solvable with tools that you can absolutely integrate into your IDE or command-line workflow.

> - Standarization that feels like pulling more Boost into the language, which means more templates.

The boost libraries are open source and their mailing lists are active. If you don't like a given library because it has too many templates then you could make one with fewer templates.

And, as standardization goes, it's also quite open source. The C++ committee is very open and receptive to improvements. The committee are volunteers (so their time is limited) and (usually) have their own improvements to the standard that they want. So you have to drive the changes you want (eg, actively seek feedback and engagement).

> P.S. Example from recent core file, one line in the stack trace:

I've seen much longer -- I've seen templates entirely fill a terminal buffer for a single line. That's extremely rare, definitely not fun, and debuggability is absolutely a valid reason to refactor the application design (or contribute library changes).

I find it useful to copy the template vomit into a temporary file and then run a formatter (eg clang-format), or search/replace `s/(<\({[)/\1\n/g` and manually indent. Then the compiled type is easier to read.

Some debuggers also understand type aliases. They'll replace the aliased type with the name you actually used, and then separately emit a message (eg, on another line) indicating the type alias definition (eg, so you can see it if you don't have a copy of the source)

#include <cstdio> #include <list> #include <type_traits> #include <vector> struct Node; struct Connection { Node *from, *to; }; struct Node { std::vector<Connection> connections; }; int main() { printf("%d\n", std::is_nothrow_move_constructible_v<std::list<Node>>); return 0; }

I know C++ the language but not the STL (the overwhelming abundance of UB and total lack of safety make it an anathema), so my question is why the STL allows/requires non-move here copying here dependent on whether an object has a no throw move constructor?

Note I’m not asking about move constructor vs memmove/cpy but rather the use of copy constructor vs move depending on exception behavior? Is it something like prefer no throw copy constructor over “throwing” move?

chaboud · 2 years ago

That’s a bit like saying you know C++ but not streams or templates, or C but not floating point operations. It’s probably worth learning STL.

Anyway, the reason to use move instead of copy is for performance. Move constructors are faster because they can leave the source object modified (e.g., take over control of a pointer to deep contents). This falls apart when the move constructor can throw, because the container might be part way through a resize when this happens, leaving the object before the exception modified and the code in an unrecoverable state.

Basically, unless we can be super duper 100% certain we’re going to make it through the operation without throwing an exception, we’re going to copy, leaving the objects in question in an unaltered state, and holding to the promises of the standard.

olliej · 2 years ago

I phrased that badly - it should have been “I don’t know every edge case in the STL, and so I don’t know why this would have different behavior”.

However thanks for explaining the issue. This one is obvious and I just completely failed to think about how you ensure the source object is in a safe state if an exception occurs part way through moving the source data. It seems to imply the old MSVC behaviour was incorrect in such a scenario, but I hadn’t considered that possibility so assumed it was correct and therefore didn’t think of why this behaviour is required.

My solution is of course to simply not allow exceptions because the c++ model of everything implicitly throwing is just as annoying as Java’s “let’s explicitly annotate everything” model albeit with different paths to sadness.

inetknght · 2 years ago

> leaving the object before the exception modified and the code in an unrecoverable state

It isn't likely to leave the code in an unrecoverable state even if recovery is calling std::terminate (or worse).

It is likely to leave the data in an unrecoverable state. Imagine that a vector of 4 items was resized -- the first two objects move successfully, but the third one throws an exception. Then in your move function, you catch that and decide to undo your changes before propagating the error. Then when you're undoing the changes the first object throws an exception when it's being moved back. Oof! At best you've got multiple active exceptions (legal if you're in the catch handler, but should be rare and definitely should be avoided) and at worst your data is indeed unrecoverable (thus one of many reasons why std::terminate is the default option when multiple exceptions are alive on the same stack).

tialaramex · 2 years ago

The overwhelming C++ priority beating both safety and performance, often to the consternation of the performance people, is backwards compatibility with dusty archaic code. If it was written by somebody whose funeral was last century, WG21 thinks it's important that it still compiles in your C++ 23 compiler whenever you get one of those. Not crucial. Not so that they actually defined the language sensibly to avoid compatibility problems, but important enough to trump mere performance or safety concerns.

Last century move didn't exist. The terrible C++ move (which is basically an actual "destructive move" plus a default create step) was invented for C++ 11 which was, as the name suggests, standardised only in 2011.

So back then everybody is using copy assignment semantics. Your compiler might be smart enough, especially for trivial cases, to spot the cheap way to deliver the required semantics, but it might not especially as things get tricky (e.g. a std::vector of std::list) and semantically it's definitely a copy, not a move.

As a result the "non-move" that you're astonished by is how all C++ code last century was written, the semantics you're just assuming as necessary didn't even exist in ISO C++ 98 and it is considered important that such code still works.

kentonv · 2 years ago

I think the other replies may have misunderstood your question. I think you are asking:

Why does std::vector<T> require T's move constructor to be noexcept (or else it falls back to copying instead)?

The reason goes something like this:

When std::vector<T> grows, it needs to move or copy all of its elements into a new, larger-capacity array. It would prefer to move them, since that's a lot more efficient than copying (for non-trivial types). But what happens if it moves N elements, and then the move constructor for element N+1 throws an exception? Elements 0-N have been moved away already, so the vector is no longer valid as-is. Should it try to move those elements back to the original array? But what if one of those moves fails?

The C++ standards body decided to sidestep this whole problem by saying that std::vector<T> will refuse to use T's move constructor unless it is declared noexcept, so the above problem can't happen.

In my opinion, this was a huge mistake. Intuitively, everyone expects that when an std::vector<T> grows, it's going to move the elements, not making a ton of copies. Often, these copies result in hidden performance problems. Arguably the author of this post is lucky than in their case, the copies resulted in outright failure, thus revealing the problem.

There seem to be two other possibilities:

* std::vector<T> could simply refuse to compile if the move constructor was not `noexcept`. I think this could have been done in a way that wouldn't have broken existing code, if it had been introduced before move constructors existed in the wild -- unfortunately, that ship has now sailed and this cannot be done now without breaking people.

* std::vector<T> could always use move constructors, even if they are not declared `noexcept`, and simply crash (std::terminate()) in the case that one actually throws. IMO this would be fine and is the best solution. Move constructors almost never actually throw in practice, regardless of whether they are declared as such, because move constructors are almost always just "copy pointer, null out the original". You don't put complex logic in your move constructor. And anyway, C++ already has plenty of precedent for turning poorly-timed exceptions into terminations; why not add another case? But I think it's unlikely the standards committee would change this now.

Joker_vD · 2 years ago

Honestly, trying to move the elements back and calling std::abort if that fails seems fine. It is indeed an exceptional happenstance, and how quickly you can recover from it is probably not as important as being able to recover correctly. And who catches exceptions around resize()/push_back() anyway?

inetknght · 2 years ago

Throw specifications do not change function call binding behavior.

Move constructor and move operator will bind to an R-value reference if the move constructor or move operators are available. Conversely, if those functions which declare to not throw anything do end up throwing something then the result is std::terminate.

The only things that determine whether to use a move or copy is whether the reference is an R-value and whether the source or destination is const.

You can declare a {const R-value move operator (not a constructor) for the left-hand} and/or {const R-value move operator or constructor for the right-hand side} of the argument. But you won't be able to modify anything not marked mutable. You shouldn't do that though: that's a sure way to summon nasal demons. That said, I see it fairly often from less experience engineers, particularly when copy-pasting a copy operator intending to modify it for a move operator.

wffurr · 2 years ago

What do you use instead of std::vector, map, unique_ptr, etc?

I have a hard time thinking of C++ and the STL as separate. Even our internal utilities and such tend to be STL-like although often with safer defaults.

tialaramex · 2 years ago

Lots of the C++ standard library, including the STL containers isn't provided in freestanding C++. Now, in reality freestanding C++ has been kind of a joke - the committee for years barely bothered to keep it working - especially compared to freestanding C (which is well defined and used all over the place) and say Rust no_std (likewise) - and so many embedded systems may have the entire standard library notionally available even though parts of it are definitely nonsense for them and they've got local rules saying not to use the parts that would definitely explode in their environment... but many C++ programmers who have worked under such rules just reflexively avoid the STL's containers and maybe its algorithms even in an environment where those would work.

olliej · 2 years ago

Essentially the same things but reimplemented safely - see WTF in webkit.

There are still issues (the iterator API used by for(:) is very hard to make safe without terrible perf issues, though I was looking at this recently and the compilers are doing much better than they used to).

Things like unique_ptr and shared_ptr do not meaningfully improve the security of c++ despite being presented as if they did (all serious c++ projects already had smart pointers before the stl finally built them in so presenting them as a security improvement is disingenuous), and because of the desire to have shared_ptr be noninvasive it’s strictly worse than most other shared ownership smart pointers I’ve used.

quuxplusone · 2 years ago

I have a blog post on the topic here: https://quuxplusone.github.io/blog/2022/08/26/vector-pessimi...

The TLDR is: Using `move_if_noexcept` instead of plain old `move` can help you provide the "strong exception guarantee." For what _that_ is, see cppreference: https://en.cppreference.com/w/cpp/language/exceptions#Except...

and/or the paper by Dave Abrahams that introduced the term, "Exception-Safety in Generic Components: Lessons Learned from Specifying Exception-Safety for the C++ Standard Library." https://www.boost.org/community/exception_safety.html

> Is it something like prefer no throw copy constructor over “throwing” move?

Almost. If move won't throw (or if copy isn't possible), we'll move. But given a choice between a throwing move and any kind of copy, we'll prefer copy, because copy is non-destructive of the original data: if something goes wrong, we can roll back to the original data. If the original data's been moved-from, we can't.

mgaunard · 2 years ago

UB is a feature; people who keep on fighting it are such a pain.

Regarding your question, nothrow operations are essential to maintaining invariants. And maintaining invariants is how you make code correct in a world where UB exists.

Joker_vD · 2 years ago

Yes, if the programmer maintains certain invariants, the C's flavour of UB allows the compiler to take advantage of those invariants for performance gains, by omitting run-time checking for those invariants.

The problem with this flavour of UB being a programmer's promise "this is fine, trust me, no run-time checks needed" to the compiler is that a) it's made by the programmer by omitting said run-time checks ― and that often happens accidentally, not intentionally; b) the compilers are really bad at pointing out to the programmer places where they took advantage of such promises, which really complicates the task of writing conforming programs. Every time I add two int's, I promise to the compiler that an overflow won't happen: and of course, the moment an UB happens, all invariants cease to hold, so trying to find the initial bug where you've accidentally broke one of invariants turns into a nightmare.

userbinator · 2 years ago

IMHO this is another case where C++'s hidden layers of complexity hides bugs that would've been obvious in plain C. In fact for this particular use-case I'd probably use indices instead of pointers.

limaoscarjuliet · 2 years ago

quotemstr · 2 years ago

In far more cases, zero-cost abstractions make obvious or impossible bugs that would be hard to spot in C programs, e.g. memory lifetime rule violations. And you could make a similar argument that C obscures bugs that would be obvious in assembly. High level languages are a blessing, and programmers who avoid them are only decreasing their productivity and those around them.

The problem the article highlights appears to be an implementation defect: in my libstdc++ test just now, we do, in fact, mark the list as nothrow move constructible. The standard should mandate that std::list be infallibly moveable.

Are we going to indict a whole programming model based on an isolated implementation bug? If so, well, isn't doing that from a "C is better" perspective the galactic black hole calling the kettle black?

camblomquist · 2 years ago

I mentioned it in a side note that I trimmed because there were so many that it spilled into the footer (faster to trim the article than to fix the CSS,) but Microsoft is the only implementation of the big three that doesn't mark the move constructor here as nothrow. The standard doesn't require it so it's valid for MSVC to do things the way they do, it just creates problems like this that would arguably be harder to find the cause of if one had to build code for multiple platforms.

mhh__ · 2 years ago

On the flip side C projects often end up with things like linked lists and pointers to [it depends] because that lack of abstraction, while obvious, encourages the programmer to take shortcuts (otherwise no time for actual logic)

TillE · 2 years ago

Storing a pointer to memory that you did not explicitly allocate is always a red flag, I think. You really need to understand how everything works, and be very careful.

I would default to just using std::unique_ptr<Node> in a situation like this, especially since using std::list suggests performance isn't critical here, so the additional indirection probably doesn't matter.

unique_ptr seems inappropriate here since the pointers aren't unique. shared_ptr doesn't even work because it looks like this data structure is representing a graph and would expect to have cycles. Perhaps you could use some sort of weak pointer that gets nulled out when the target object is destroyed, but that would not have fixed the bug here, just replaced segfaults with some more controlled exception or panic.

In fact, full-on garbage collection wouldn't have prevented the bug here, and could arguably have made it worse. The problem is that nodes were unexpectedly being copied and then the originals deleted. With GC, you'd still have the copy, but never delete the originals, so you end up with split brain where there are multiple copies of each node and various pointers point to the wrong ones. That'd be pretty painful to debug!

IMO the language-level problem, if there is one, is that C++ is too willing to copy in cases where you would expect it to move instead. This is, of course, for backwards compatibility with the before-times when there was no move and copying was the only logical thing to do. But I think life would be better these days if all non-trivial copies had to be requested explicitly.

> Perhaps you could use some sort of weak pointer that gets nulled out when the target object is destroyed

std::shared_ptr comes with std::weak_ptr. Referencing counting is rather ham-fisted approach but is certainly a solution.

> IMO the language-level problem, if there is one, is that C++ is too willing to copy in cases where you would expect it to move instead.

IMO that's not a problem in the language but a problem with the engineer (misunderstanding when std::move is necessary) and the tooling (linter/static analyzer not clearly identifying that something should be moved instead, and raising a linter warning for it).

For that matter, the places where I see std::list used aren't places where "performance isn't important" but rather places where an inexperienced engineer was put in charge of implementation and a senior engineer accepted it. I can't remember the last time I accepted someone using std::list in a code review because there has always been a better design available even if it necessitated some teaching. If a stable pointer address is needed then indeed a smart pointer is the correct solution (perhaps std::vector<std::unique_ptr>). There are other reasons I've had coworkers cite for using std::list (eg constant allocation time) but that's generally resolved with std::vector.reserve(upper bound to size) or eg a slab allocator (unfortunately, I'm not aware of a standard-provided slab allocator, though to be fair I'm not very familiar with C++ standard allocators in general).

> I think life would be better these days if all non-trivial copies had to be requested explicitly

While I don't agree superficially (smells like bringing along deep-copy problems), I think the idea merits some thought experiments.

It would be fairly trivial to do that for non-plain-old-data types by deleting the copy constructor/operator (so it cannot happen implicitly) and providing a `make_copy(...)` function instead.

o11c · 2 years ago

The good news is that since C++ containers aren't special to the language, you can just implement your own wrapper classes that disable the copy ctor (and provide an explicit `.clone()` instead). Coupled with `#pragma GCC poison` it is pretty easy to blacklist legacy footguns in source files at least (though not in headers without some aggressive work).

... and yet, almost all vulnerabilities in C++ code are still written in C style, not even legacy C++.

im3w1l · 2 years ago

I mean the article explains the original author did take explicit steps to keep the Nodes fixed in memory (std::list), because they knew it could be a problem if the Nodes moved.

They just got tripped up by an obscure language feature that made the Nodes move anyway.

jnwatson · 2 years ago

This is a great reminder of the pox that was Microsoft of the early part of the millennium. Besides an allergy to investing in web standards, they were woefully behind in their language support. Their non-adoption of modern C++ standards held client security back for a decade, and arguable held language standards development back.

pjmlp · 2 years ago

There is a certain irony complaining about Microsoft, while praising everyone else in regards to C and C++ compilers, as if outside the beloved GCC, in a age where clang did not exist, the other proprietary compilers were an example of perfection.

Apparently the folks didn't learn their lesson with Web standards, given the power they gave Google to transform the Web into ChromeOS.

That hardly seems fair.

(Microsoft was doing this to C++ well before the early part of the millennium…)

Edit: and in non-joke fairness, Microsoft has really come a long way on this regard.

jinchengJL · 2 years ago

Having only worked with gcc and clang, OP’s code looked completely fine to me and I was baffled why many comments think the code is at fault. Judging by this page [1], I agree this is entirely MSVC’s doing.

[1] http://howardhinnant.github.io/container_summary.html

Yeah, Microsoft absolutely should've made their containers use throwing move-constructors in violation of the standard, in order to better adhere to the standard.

In reality, it's a defect in the standard, plus a quality-of-implementation issue in the Microsoft's C++ Standard Library.

fransje26 · 2 years ago

I must be misunderstanding something from this article. With:

struct Node {

    std::vector<Connection> connections;

};

struct Connection {

    Node* from, to;

Does this mean that to create the vector of connections, Nodes are created, and references are taken to store in the Connections? And then the Nodes are stored in the list, with std::move()?

I don't understand why you would want to go down that road. Intuitively, I would assume that you are not safe from an object copy somewhere down the line and your graph then comes crashing down like a house of cards. Wouldn't it make more sense to store the nodes as pointers? If you like to live dangerously, something like:

struct Graph {

    std::vector<std::list<Node\*>> nodes;

Or better:

    std::vector<std::list<std::unique_ptr<Node>>> nodes;

The later will give you plenty of warnings if you do not copy Nodes around with std::move().

Or less performant, but maybe safer, std::shared_ptr<Node>, together with:

    std::weak_ptr<Node> from, to;

so that you have some check guarantees before access?

I don't think you're misunderstanding, it's a strange choice to make even if the example here loses context to make it easier to make the point. It wasn't code that anyone currently here wrote and because it worked with the old compiler, nobody really touched it. I believe the nodes are pushed into the list before adding connections but I don't think that changes the point you're trying to make. That said, one's intuition is different from another's and I don't think it's unfair to assume that pushing into a vector of lists won't cause every existing list to be copied. For that to be the actual behavior is kind of disgusting. It may make more sense to create pointers here but that's a larger change to deal with and ensure correctness versus just swapping the vector out for another list. I don't claim to like that solution but it seems to me like legacy C++ code in general is fragile enough as it is, the less I have to change to fix bugs the better.

Thank you for the clarification!

> It may make more sense to create pointers here but that's a larger change to deal with and ensure correctness versus just swapping the vector out for another list. I don't claim to like that solution but it seems to me like legacy C++ code in general is fragile enough as it is,

That's more than fair enough. In such code, more often than not, you make what you think is a small change, and you end up with an entire cascade of unexpected side-effects that pop-up because the original assumptions and the testing scope of those assumptions were lost. A can of worms best left unopened.

By the way, wouldn't such an error be something that gets detected by the "new" ASAN functionality that has been added to the newer MSVC toolchain you are using?

wakawaka28 · 2 years ago

This is a noob mistake, not a huge mystery. It's not always wrong to store raw pointers to STL container elements, but if you do then you must take care of reallocations.

If you find storing pointers to elements too perilous, you should probably just make a container of pointers instead.

Deleted Comment

D13Fd · 2 years ago

> This is a noob mistake, not a huge mystery.

The interesting part is why it worked in VS2013 but failed in VS2022.

>The interesting part is why it worked in VS2013 but failed in VS2022.

I thought the explanation he came up with was interesting but such a change could have easily been brought about by changes to reallocation parameters in the implementation.

gsliepen · 2 years ago

There is a lesser known cousin to std::vector that doesn't have to move nor copy its elements when adding new elements, and that is std::deque.

vardump · 2 years ago

Right. But of course std::deque comes with a cost; iteration is slower and memory footprint is bigger.

SleepyMyroslav · 2 years ago

This is another nice reminder to people who build systems where there is only one language implementation. Implementations always change.