dvt · 4 years ago
The nested function execution interleaving blew my mind (imagine having to debug that), so I had to look it up. Apparently, interleaving is prohibited in C++17 onward. So this:

> The second one is there since this is C++, priority() might raise an exception, meaning that new Widget will be called, but never passed to std::shared_ptr, and thus never deleted!

Is now impossible (thank God!). See a full SO discussion here[1]. Stuff like this makes me so happy I don't write C++ anymore, but the gist of it is (from the standard):

> For each function invocation F, for every evaluation A that occurs within F and every evaluation B that does not occur within F but is evaluated on the same thread and as part of the same signal handler (if any), either A is sequenced before B or B is sequenced before A.

[1] https://stackoverflow.com/questions/38501587/what-are-the-ev...
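For reference, the code shape in question looks roughly like this (names follow the OP's example; the `throws` helper is mine). Pre-C++17, a conforming compiler could run `new Widget`, then `priority()`, then the `shared_ptr` constructor, leaking the `Widget` if `priority()` threw:

```cpp
#include <memory>
#include <stdexcept>

struct Widget {};

int priority() { throw std::runtime_error("oops"); }  // may throw

void processWidget(std::shared_ptr<Widget>, int) {}

// Pre-C++17, a legal evaluation order was:
//   1. new Widget
//   2. priority()                 <- throws; the raw pointer leaks
//   3. shared_ptr<Widget>(...)    <- never runs
// Since C++17 the two argument evaluations may not interleave, so the
// shared_ptr constructor is guaranteed to run entirely before or
// entirely after the priority() call -- no leak is possible.
void risky() {
    processWidget(std::shared_ptr<Widget>(new Widget), priority());
}

// The traditional fix, safe in every standard revision:
void safe() {
    processWidget(std::make_shared<Widget>(), priority());
}

// Tiny helper: does f() propagate an exception?
bool throws(void (*f)()) {
    try { f(); return false; } catch (const std::exception&) { return true; }
}
```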

ncmncm · 4 years ago
In three decades' use of C++, of all generations going back to original cfront, I have never encountered any difficulty, confusion, or actual problem arising from this phenomenon. It is a favorite of language lawyers and people borrowing trouble.
dataflow · 4 years ago
I've actually run into this problem before. You just haven't, is all.
CJefferson · 4 years ago
I've frequently run into a variant of this:

    f(**x, g());
The number of stars evaluated before 'g()' (0, 1 or 2) varies between compilers, and the program I was working on would break if only one * was evaluated before g.

blake1 · 4 years ago
Agreed. This code just looks weird to me. Throwing from a constructor? Holding the result of new() in a temporary? This is just asking for trouble.

Another favorite “criticism” I have of C++ goes along the lines of “but someone could overload operator+ to be division!” I have only seen that as a contrived critic’s example, or a joke.

TwoBit · 4 years ago
> Is now impossible

I don't see how that is so. C++ 17 allows new Widget to complete and then the priority() call to execute and throw before both are passed to shared_ptr(), thus creating a leak. Your cited example [1] doesn't leak because both arguments are shared_ptr. Seems to me that C++ 17 does indeed solve this latter case but not the former.

dvt · 4 years ago
Per [1]:

> evaluations of A and B are indeterminately sequenced: they may be performed in any order but may not overlap: either A will be complete before B, or B will be complete before A. The order may be the opposite the next time the same expression is evaluated.

And:

> 21) Every expression in a comma-separated list of expressions in a parenthesized initializer is evaluated as if for a function call (indeterminately-sequenced)

Emphasis mine. What the above basically says is that in the case of some function `f(A, B)`, the arguments `A` and `B` are what's known as "indeterminately-sequenced" -- this means that their execution cannot be interleaved (overlap) -- but they still individually execute in a non-deterministic order (sometimes A before B, and sometimes B before A)!

With that said, the good news is that B can now never throw in the middle of A, which is precisely what we have in OP's example.

[1] https://en.cppreference.com/w/cpp/language/eval_order

nyanpasu64 · 4 years ago
> both are passed to shared_ptr()

priority() is not passed to shared_ptr(), only to processWidget().

TwoBit · 4 years ago
OK, I must have mistaken what the original was, because I thought it was like this:

processWidget(new Widget, priority());

I think I saw the above code elsewhere.

armchairhacker · 4 years ago
I always wonder if compilers actually do weird optimizations taking advantage of semantics like those.

Another example is when undefined behavior occurs, like (iirc) signed integer overflow. Compilers can technically do anything they want, including weird things. But how often does that actually happen?

rocqua · 4 years ago
If you write something like

    int a;
    ...
    int b = a;
    if (a + 1 > b) {
       foo();
    }
Then compilers will use the fact that signed integer overflow is undefined behavior. This allows the compiler to assume the condition `a + 1 > b` is always true, and hence it removes the check entirely.

This isn't just the compiler removing a check the programmer wanted. The pattern can arise from constant propagation, where other constant values would make the check meaningful. It could also arise after inlining a specific function call.

Really, the only case where this optimization would be bad is if the author was trying to detect overflow. Sadly, detecting whether something would cause undefined behavior in C code is hard, because you cannot simply try the thing and check what happened.
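For completeness, the usual way out is to test before doing the arithmetic, using only operations that cannot themselves overflow. A sketch, with a hypothetical helper name:

```cpp
#include <climits>

// Check whether a + b would overflow, *without* ever computing a + b.
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;  // a + b would exceed INT_MAX
    return a < INT_MIN - b;             // a + b would go below INT_MIN
}
```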

SAI_Peregrinus · 4 years ago
Others have mentioned that compilers take advantage of these things for optimization, but haven't mentioned that compilers will still often do unexpected things when encountering UB with optimizations off.

Turning optimizations on doesn't suddenly undefine any behavior, it just changes what the compiler does when encountering some of those behaviors (among other, non-UB based optimizations). A non-optimizing compiler and an optimizing compiler both have to conform to the standard.

blake1 · 4 years ago
Different architectures have their own function passing convention. It’s convenient to evaluate left-to-right, pushing as you go, if that matches the arch. And vice versa.
vnorilo · 4 years ago
Most optimization uses of UB that I've seen is to take full liberty to remove branches and instructions. For example, if the result of UB is used in a comparison, the compiler is free to pick whichever side of the branch (even inconsistently) and elide. It is actually quite common with for example loop unrolling and vectorization (ignoring signed overflow of the indvar is a huge boon)
UncleMeat · 4 years ago
The answer is yes. Compiler authors absolutely take advantage of these things for optimizations.

As for paranoia about undefined behavior? Compilers won't emit a program that orders you a pizza or formats your disk. There are risks, but those aren't real ones.

cratermoon · 4 years ago
There was a post here not too long ago about how C/C++ compiler writers have consistently taken the wiggle room of "undefined behavior" to do terrible, terrible damage to otherwise fine code. https://news.ycombinator.com/item?id=27221552
saagarjha · 4 years ago
In this case there is no undefined behavior, the order is just unspecified.
rocqua · 4 years ago
Note that this post was heavily discussed and disputed. The question "is undefined behavior treated correctly?" is rather contentious.

It certainly is the case that most compiler writers have decided this is okay. And I personally tend to agree with them.

eklitzke · 4 years ago
The author writes "this might be why there is std::make_shared", but that's not really why it exists. When you create a std::shared_ptr it needs to allocate memory for a control block structure (which tracks the reference count) as well as the memory for the object itself. If you write std::shared_ptr<Foo>(new Foo) this will result in two memory allocations, one for the Foo object and the other for the std::shared_ptr control block. If you write std::make_shared<Foo>() then a single allocation will happen which has enough space for both the control block and the Foo object, and then placement new is used to initialize the memory for both structures. So std::make_shared exists to reduce the number of memory allocations that are being made, not for the reason suggested here.
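You can observe the difference by counting allocations with a replacement global operator new. A sketch, assuming a typical implementation (strictly speaking, the standard only says make_shared is *expected* to perform a single allocation):

```cpp
#include <cstdlib>
#include <memory>
#include <new>

static int alloc_count = 0;

void* operator new(std::size_t n) {
    ++alloc_count;
    if (void* p = std::malloc(n)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct Foo { int x = 0; };

int allocations_for_shared_ptr_new() {
    alloc_count = 0;
    auto p = std::shared_ptr<Foo>(new Foo);  // Foo, then the control block
    return alloc_count;
}

int allocations_for_make_shared() {
    alloc_count = 0;
    auto p = std::make_shared<Foo>();        // one combined block
    return alloc_count;
}
```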
haldean · 4 years ago
There's a crazy downside to make_shared that I learned recently because of this: if you have a weak pointer to a shared thing, and the refcount for the shared thing drops to zero, the weak pointers will keep the allocation for the object "alive". They still need access to the control block, and since the control block was created in the same allocation as the object, the two can't be freed separately. So now I only use make_shared if I know for sure there won't be a weak_ptr pointing at it (or if the base object has a relatively small memory footprint after it's been destructed).
Asooka · 4 years ago
Won't the object be destructed though? So while the memory for the object is kept, the memory for the object's heap-allocated members is not. I.e. if you have a 100mb string, after only weak references are left you won't have 100mb memory taken, but only sizeof(string) + sizeof(control block).
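Right -- the object itself is destroyed as soon as the strong count hits zero; only the combined allocation lingers. A quick sketch demonstrating that the destructor runs while a weak_ptr is still alive (the instrumented type is made up):

```cpp
#include <memory>

static int live_count = 0;

struct Tracked {
    Tracked() { ++live_count; }
    ~Tracked() { --live_count; }
};

// The destructor runs when the last shared_ptr dies, even though the
// weak_ptr still pins the combined (object + control block) allocation.
bool destructor_runs_despite_weak_ptr() {
    std::weak_ptr<Tracked> w;
    {
        auto s = std::make_shared<Tracked>();
        w = s;
    }  // strong count hits 0 here: ~Tracked() runs, allocation stays
    return live_count == 0 && w.expired();
}
```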
ot · 4 years ago
Yeah, that's definitely something to be aware of. It's usually not an issue as most objects have small footprint (and any allocations they in turn hold would be released when the strong refcount goes to 0).
quietbritishjim · 4 years ago
It exists for both reasons.

That's why there's also a std::make_unique even though it doesn't have a control block (it was added later, but that was just an oversight).

MaxBarraclough · 4 years ago
> It exists for both reasons.

I believe this is correct. Here's [0] some old Boost documentation on Boost's make_shared which inspired the C++ standard's make_shared. It mentions both reasons.

[0] https://www.boost.org/doc/libs/1_67_0/libs/smart_ptr/doc/htm...

asveikau · 4 years ago
I think the fact that you are relying on a presumably well-tested library to get the exception corner cases right is also noteworthy. Memory leaks on allocation failure are pretty common in naive code, and a good thing to handle in a library where it can be well thought out.
Koshkin · 4 years ago
This is very important. Try to hide most of the complexity in a library, and then unit-test the hell out of it.
overgard · 4 years ago
My (least) favorite footgun is "auto" when it comes to references.

If you want to get a pointer from something, you would write this:

    auto fooPtr = myWidget.getPtr();
    fooPtr->doCoolStuff();
But we all know references are better than pointers right? So we should just write...

    auto fooRef = myWidget.getRef();
    fooRef.doCoolStuff();
The problem is... this is valid and will probably not do what you want. It will make a copy. What you want is actually

    auto& fooRef = myWidget.getRef();
I once spent an entire day debugging a bizarre crash because of that. I was asking for a reference to a scene graph, and what was actually happening is I was getting a clone of the scene graph (which was an object that couldn't be safely copied), and then destroying a bunch of shared pointers when the function returned. Fun times.
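A minimal sketch of the two behaviors (`getRef` and the member are hypothetical):

```cpp
struct Widget {
    int value = 1;
    int& getRef() { return value; }
};

// `auto` deduces a plain int, so this silently copies:
bool auto_makes_a_copy() {
    Widget w;
    auto fooRef = w.getRef();  // fooRef is an int, not an int&
    fooRef = 42;
    return w.value == 1;       // the original is untouched
}

// `auto&` deduces int&, a genuine alias:
bool auto_ref_aliases() {
    Widget w;
    auto& fooRef = w.getRef();
    fooRef = 42;
    return w.value == 42;      // writes through to the widget
}
```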

jchw · 4 years ago
As others have pointed out, the idiomatic solution here is to delete the copy constructor if an instance is unsafe to copy — however, I suspect the reason why auto behaves differently from other inference in requiring this to be explicit is probably something along the lines of making it harder to have a dangling reference. It’s shockingly easy to wind up with a dangling reference with fairly innocuous code, something like QString().toUtf8().data() so maybe this makes sense. (Doesn’t help for that case since it’s a pointer to raw data, but you get the picture.)
overgard · 4 years ago
I think, all things considered, the way it works is probably the best way, since you might want to use "auto" to actually make a copy. It's just very surprising, because when I have explicit types I'm used to looking for that sort of error, but with an auto I wasn't (at the time) used to looking for that.
nyanpasu64 · 4 years ago
Worse yet is an object which is safe to copy in some cases, but you intended to not copy it at the moment, and you create a dangling reference to a field within. I wish C++ copy constructors were explicit by default. (But declaring the copy constructor explicit turns off aggregate initialization, so I can't do that.)
bregma · 4 years ago
Wait, you had an object that was unsafe to copy, but you told the computer it was safe by not deleting the copy constructor? I guess it should have done what you wanted, not what you told it.
suprfsat · 4 years ago
Simply by using C++, you tell the computer all sorts of things, such as "I know the entire language by heart", and "I have a death wish".
Dylan16807 · 4 years ago
> told the computer it was safe

> by not deleting

I think there's a problem here.

contravariant · 4 years ago
Yeah I learned that the hard way when I found out that std::vector apparently feels free to move your objects around in memory for you, no matter if you point to them from somewhere else.

I mean in hindsight it's obvious, but still not exactly what I wanted to happen.

For reference, what I did does actually work, temporarily.

secondcoming · 4 years ago
No, everyone gets LHS auto wrong at the start. They don't realise it makes a copy. I have to point it out every other code review.
overgard · 4 years ago
Well, the scene graph code wasn't code I wrote, and this was back in 2014. But sure, obviously it was an error on my part... we are talking about footguns after all, and I'm the one that pulled the trigger.
criddell · 4 years ago
If you develop on Windows and are using Visual Studio 2019 "auto fooRef = myWidget.getRef()" will get a little squiggle under it to warn you that you are making a copy.
overgard · 4 years ago
I am very happy tooling is helping to make these sorts of mixups easier to find, although sadly C++ is such a hard language to write tooling for that those kinds of nice features are pretty rare. I remember back when I had to work in Xcode a few years ago I could rarely even get basic intellisense to work.
Koshkin · 4 years ago
> and aren't using Visual Studio

OK, I tried this in Notepad, and it did not work.

decker · 4 years ago
Sounds like the root cause was due to not obeying the rule of five: https://en.wikipedia.org/wiki/Rule_of_three_(C%2B%2B_program...
ncmncm · 4 years ago
Nowadays, Rule of Zero. Let the compiler generate all the special members implicitly, when the members can each do their own cleanup. This turns out to be quite often.
overgard · 4 years ago
Probably! I don't remember the code exactly, this was back in 2014 and I think most of the code I had to consume was primarily written by a 24 year old out of college (smart of course, but C++ takes a long time to get good at).
amluto · 4 years ago
> What you want is actually
>
>     auto& fooRef = myWidget.getRef();

That variant has issues if getRef() actually returns a temporary object. You may want auto&&.

Asooka · 4 years ago
If it returns a temporary, you'll get a compile-time error, as the temporary cannot be bound to a non-const ref. I would prefer "const auto&" as you shouldn't be modifying temporaries, or "auto x{std::move(myWidget.getRef())}" if I want a mutable value and to avoid copying the return value of getRef.
gpderetta · 4 years ago
auto by default removes top-level const and reference qualification, and decays arrays and function types to pointers, same as for template argument deduction [1]; this is also true when auto is used for return type deduction.

You can use decltype(auto) to preserve the actual type of the rhs.

[1] as discussed elsewhere initializer lists behave differently, but initializer_list itself was a mistake to start with.
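A small sketch of the difference (Holder is a made-up type; the static_asserts verify the deduced types at compile time):

```cpp
#include <type_traits>

struct Holder {
    int value = 0;
    int& ref() { return value; }
};

bool deduction_demo() {
    Holder h;
    auto a = h.ref();            // deduces int: a copy
    decltype(auto) b = h.ref();  // deduces int&: an alias
    static_assert(std::is_same_v<decltype(a), int>);
    static_assert(std::is_same_v<decltype(b), int&>);
    b = 7;                       // writes through to h.value
    return h.value == 7 && a == 0;
}
```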

leetcrew · 4 years ago
I too have fallen victim to this and similar traps. it's a good idea to use a deleted copy constructor when you know it's not safe to copy the object. you can't completely stop others (or yourself) from doing bad things, but you can at least give them a moment to reflect on what they are doing.
nyanpasu64 · 4 years ago
I wish C++ copy constructors were explicit by default.
gpderetta · 4 years ago
The problem is that structures are implicitly (and trivially) copyable in C, and requiring an explicit copy constructor would break interoperability. It would also break trivial copyability, which is a different issue.

So, yes, backward compatibility and historical baggage.

Note that if you use explicit wrappers for all object-owning pointers you would get either the right behavior or a compilation error.

secondcoming · 4 years ago
It's not about safety, it's about efficiency. It's usually an unintended copy. For example, if your function `foo` returns a `const std::string&` the code `auto x = foo()` creates a copy.
einpoklum · 4 years ago
So, if I write:

    int x;
    auto y = x;

do you expect y to be a reference? Surely not.

No, my friend, auto is for values, not references. You want a reference - you have to say so.

(And same goes for rvalue references - auto&& .)

overgard · 4 years ago
My point was simply that auto to assign pointers doesn't cause them to lose their pointiness, but using auto to assign a reference causes them to lose their referenceness. Yes I understand why it works the way it does. I'm not saying it's wrong just that it's been an occasional footgun.
2bitencryption · 4 years ago
My favorite C++ footgun is creating a Vector with some items, taking a reference to, say, &vec[3], then adding another item to the vec, then trying to use the reference from the previous step.

If you write C++ it might be obvious what the problem is.

If you don't, this will absolutely ruin your entire day.

The worst part is, 95% of the time, it will probably work without issue.

But eventually, pushing a new item to the Vector will trigger a relocation of the whole vector, which will invalidate your reference and bring down production. Have fun debugging that.
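The usual way around it, sketched below: hold an index instead of a reference, since an index survives any number of reallocations:

```cpp
#include <vector>

// Indices survive reallocation; references and pointers do not.
int read_after_growth() {
    std::vector<int> vec = {10, 20, 30, 40};
    std::size_t idx = 3;               // instead of int& r = vec[3];
    for (int i = 0; i < 1000; ++i)
        vec.push_back(i);              // may reallocate many times
    return vec[idx];                   // still fine: re-indexes each time
}
```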

nicoburns · 4 years ago
Yep. I learnt about this when I was learning Rust (which makes this a compile-time error). I was very glad I didn't have to learn this, and the 100 other things like it that seem to exist in C++, the hard way!
failwhaleshark · 4 years ago
Footguns might keep you prepared for the zombie apocalypse, but they'll inevitably make it difficult to walk. I'm glad I don't currently use C++ in production, but that may change soon. ):
kaik · 4 years ago
I kid you not, I spent a full working day debugging this exact same issue (taking a pointer to a vector element, before adding more elements). Very obvious if you understand how C++ and vectors work, yet it took me forever to realize, and it was miserable…
saagarjha · 4 years ago
Heh, you can run into this even if you understand how iterator invalidation works. Once you see the bug it's easy to understand, but finding it might not be…
duped · 4 years ago
This, and the entire class of iterator invalidation bugs that force you to memorize which collections are ok for which applications.
nyanpasu64 · 4 years ago
Rust takes the approach of flat-out not letting you mutate any collection while you have any references to its contents. It eliminates all dangling pointer bugs... I don't know if it rules out any useful use cases of collections with iterator stability. I think any C++ collection holding unique_ptr is stable (pushing to the collection doesn't invalidate the target of the unique_ptr), and Rust doesn't have a safe, ergonomic way to achieve that (perhaps Pin<Box<MagicCell<T>>>, but we don't yet have a MagicCell that makes &mut MagicCell<T> not noalias).
pjmlp · 4 years ago
If you use a proper C++ compiler, the debug builds throw a runtime exception or abort on invalidation misuse, and it can be enabled for release build as well.
pizza234 · 4 years ago
Interestingly, AFAIK Golang suffers from something similar: if you create a slice from an array and then perform an operation that causes the backing storage to be reallocated, the slice will keep pointing to the old array data.
krylon · 4 years ago
You can run into the same problem in C, using malloc/realloc. realloc, in fact, makes for a nasty footgun, too (and remains, of course, available in C++).
flohofwoe · 4 years ago
An important difference is that in C it is usually obvious that a reallocation is happening, while in the C++ stdlib, memory management is usually hidden (important to note that this isn't a design fault of the C++ language itself, but of the C++ stdlib; unfortunately the two are more and more entangled in newer C++ versions). Because memory management is hidden in C++, I find claims that C++ is more memory-safe than C quite hilarious. If it works, it works fine, but if anything goes wrong (which is quite easy to achieve) it's much harder to find the actual problem in C++ than in C.
turminal · 4 years ago
Yes, but unlike with realloc and a custom dynamic array, C++ references and smart pointers and containers are half-smart and do all sorts of things on their own. Knowing when they will and when they will not be "smart" makes writing C++ really difficult sometimes.
flyingswift · 4 years ago
What is the safe way to achieve the same result?
Kranar · 4 years ago
The concept behind this is reference stability. If you need a collection with stable references, you must introduce a level of indirection: instead of a vector<T>, use a vector<unique_ptr<T>>, and then you can take references as follows:

    auto& r = *some_vector[0];
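In other words, the vector is free to reallocate and shuffle the unique_ptrs around, but the pointed-to objects themselves never move. A quick sketch:

```cpp
#include <memory>
#include <vector>

// Each element lives in its own heap allocation; the vector only
// moves the unique_ptrs around, so the pointees stay put.
bool unique_ptr_elements_are_stable() {
    std::vector<std::unique_ptr<int>> v;
    v.push_back(std::make_unique<int>(7));
    int* target = v[0].get();
    for (int i = 0; i < 1000; ++i)
        v.push_back(std::make_unique<int>(i));  // vector may reallocate
    return v[0].get() == target && *target == 7;
}
```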

yongjik · 4 years ago
In addition to other answers, sometimes you do know the final/max length of the vector when you construct it. In that case reserve() can reserve the necessary space, and as long as you stay under the limit all the addresses will remain valid.

(Though it's still pretty brittle, so you may want to add a ton of comments to warn yourself in the future...)
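The standard does guarantee this: as long as size() never exceeds the capacity established by reserve(), push_back will not invalidate anything. A sketch (the bound of 100 is arbitrary):

```cpp
#include <vector>

// As long as size() never exceeds the reserved capacity,
// push_back will not reallocate, so pointers stay valid.
bool pointers_stable_with_reserve() {
    std::vector<int> vec;
    vec.reserve(100);                  // known upper bound
    vec.push_back(1);
    int* p = &vec[0];
    for (int i = 0; i < 99; ++i)
        vec.push_back(i);              // stays within capacity
    return p == &vec[0] && *p == 1;    // same address, still valid
}
```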

Deleted Comment

TylerGlaiel · 4 years ago
store the index 3 as an int instead of &vec[3]
bentcorner · 4 years ago
I can understand people used to an HLL running into this. It's useful to read the documentation: https://en.cppreference.com/w/cpp/container/vector/insert mentions that references/iterators may be invalidated.

I'm not going to pretend that I understand everything on that site (particularly anything about complex templates and things like SFINAE) but often there's comprehensible stuff in there. It can be really helpful.

throwaaskjdfh · 4 years ago
Why does it seem like C++ is constantly replacing its subtle hazards with even more subtle hazards? It's like they never go away, they just turn into something more obscure.
plorkyeran · 4 years ago
This is an example of a very old subtle hazard that has been removed entirely (the ordering which causes problems is no longer permitted in C++17).

Deleted Comment

overgard · 4 years ago
I think the simple reason is that developer ergonomics are the last priority. Zero cost abstractions and expressive features win the day even if you need to hold 19 obscure things in your head to use it right. (I like C++ btw, but it's not a friendly language). I guess those values kind of work though.. I mean, we keep using it.
AnimalMuppet · 4 years ago
In addition to what plorkyeran said, "more obscure" can mean either "harder to understand" or "less often encountered". The second one is progress.
Kranar · 4 years ago
Because the people involved with C++ have a toxic aversion to learning from other languages. Anytime issues are brought up the people involved with C++ standardization keep yapping the line that "But C++ is not like <other language>." and then proceed to implement a half-assed version of what other languages have and then 2-3 years later people find all kinds of footguns that could have simply been avoided had proper research been performed.

The big issue is that becoming involved in the C++ standardization process is purposely arcane, requiring people to physically travel to remote locations, miss time off work, and spend a lot of money. There are literally substantial language features added to C++ that are nothing more than the work of maybe 3-4 people. These features could have benefited enormously from having 100s, if not 1000s of potential developers exercising use cases and contributing suggestions, but the standardization process is heavily gated.

duped · 4 years ago
I feel like this is a strawman.

9/10 times when there is resistance to a particular feature it's because implementing it breaks the ABI.

> requiring people to physically travel to remote locations, miss time off work, and spend a lot of money.

It's a different kind of barrier to entry, but I would be surprised if the attendees of the standards meetings aren't being compensated for it.

Koshkin · 4 years ago
Well, no doubt it all comes from good intentions. (With which, you know, the road to hell is paved.)
pshc · 4 years ago
It’s Stroustrup’s Full Employment Theorem in action.
ot · 4 years ago
In modern C++ the use of `new` and `delete` should be considered a code smell, for this and other reasons.
towergratis · 4 years ago
The real problem is not specific to C++, memory or shared pointers, but as the author mentions later, the fact that "function parameters evaluation order is unspecified".

The problem is similar in C as well.

`printf("%d, %d", i++, i++);` will give you different results depending on the compiler (strictly speaking, the two unsequenced `i++`s on the same object make this undefined behavior in C).
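The portable fix is to force an order with named temporaries, since each full statement is a sequence point. A sketch (the helper name is mine):

```cpp
#include <string>

std::string ordered(int i) {
    int first = i++;   // sequenced before...
    int second = i++;  // ...this increment
    return std::to_string(first) + ", " + std::to_string(second);
}
```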

Koshkin · 4 years ago
The situation is even worse, due to the possible interleaving of the execution of function calls.
towergratis · 4 years ago
The root cause is the same
MaxBarraclough · 4 years ago
One of my favourites too. Here's a good StackOverflow answer on the topic. [0] Looking here [1] though, did something change in C++17 to ameliorate things? Edit: Turns out yes, see dvt's comment. Second edit: Apparently [2] I knew this 3 years ago and forgot.

[0] https://stackoverflow.com/a/7693775/

[1] The final bullet-point in the Notes section of https://en.cppreference.com/w/cpp/memory/shared_ptr/make_sha...

[2] https://stackoverflow.com/questions/38501587/what-are-the-ev...