shadowdev1 · 3 months ago
Heh, low comments on C++ posts now. A sign of the times. My two cents anyway.

I've been using C++ for a decade. Of all the warts, they all pale in comparison to the default initialization behavior. After seeing thousands of bugs, the worst have essentially been caused by cascading surprises from initialization UB introduced by newbies. The easiest, simplest fix is simply to default initialize with a value. That's what everyone expects anyway. Use the Python mentality here. Make UB initialization an EXPLICIT choice with a keyword. If you want garbage in your variable and you think that's okay for a tiny performance improvement, then you should have to say so with a keyword. Don't just leave it up to some tiny invisible visual detail no one looks at when they skim code (the missing parens). It really is that easy for the language designers. When thinking about backward compatibility... keep in mind that the old code was arguably already broken. There's not a good reason to keep letting it compile. Add a flag like --unsafe-initialization-i-cause-trouble if you really want to keep it.

C++, I still love you. We're still friends.

juliangmp · 3 months ago
> When thinking about backward compatibility... keep in mind that the old code was arguably already broken. There's not a good reason to keep letting it compile.

Oh how I wish the C++ committee and compiler authors would adopt this way of thinking... Sadly we're dealing with an ecosystem where you have to curate your compiler options and also use clang-tidy to avoid even the simplest mistakes :/

It's insane to me how -Wconversion is not the default behavior.

motorest · 3 months ago
> Oh how I wish the C++ committee and compiler authors would adopt this way of thinking...

I disagree. If you expect anyone to adopt your new standard revision, the very least you need to do is ensure their code won't break just by flipping a flag. You're talking about production software, much of which has decades' worth of commit history, and you simply cannot go through every single line of a >1M LoC codebase. That's the difference between managing production-grade infrastructure and hobbyist projects.

monkeyelite · 3 months ago
And the cost of this is that every time I open a project in another language it’s broken and I have to make changes to fix all their little breaking changes.
zahlman · 3 months ago
>Oh how I wish the C++ committee and compiler authors would adopt this way of thinking

Many different committees, organizations etc. could benefit, IMO.

josefx · 3 months ago
> keep in mind that the old code was arguably already broken

The code is only broken if the data is read before anything is written to it. A lot of uninitialized data is wrapped by APIs that prevent reading before something was written, for example the capacity of a standard vector, or IO buffers that should only access bytes that were already stored in them. I have also worked with a significant number of APIs that expect a large array of POD types and then tell you how many entries they filled.

> for a tiny performance improvement

Given how Linux allocates memory pages only when they are touched, and how many containers intentionally grow faster than they are used? Touching only the objects actually in use reduces page faults and memory consumption significantly.

riehwvfbk · 3 months ago
You are very very unlikely to trigger Linux overcommit behavior by not initializing a member variable. It's even more unlikely for this to be a good thing.

In effect, you are assuming that your uninitialized and initialized variables straddle a page boundary. This is obviously not going to be a common occurrence. In the common case you are allocating something on the heap. That heap chunk descriptor before your block has to be written, triggering a page fault.

Besides: taking a page fault, entering the kernel, modifying the page table page (possibly merging some VMAs in the process) and exiting back to userspace is going to be A LOT slower than writing that variable.

OK you say, but what if I have a giant array of these things that spans many pages. In that case your performance and memory usage are going to be highly unpredictable (after all, initializing a single thing in a page would materialize that whole page).

OK, but vectors. They double in size, right? Well, the default allocator for vectors will actually zero-initialize the new elements. You could write a non-initializing allocator and use it for your vectors - and this is in line with "you have to say it explicitly to get dangerous behavior".

fooker · 3 months ago
> keep in mind that the old code was arguably already broken.

Reminder that compiler devs are usually paid by trillion-dollar companies that make billions with 'old code'.

tails4e · 3 months ago
Especially when doing the right/safe thing by default is at worst a minor performance hit. They could change the default to be sane and provide a backwards-compatible switch or pragma to revert to the less safe version. They could, but for some reason they never seem to make such positive changes.
redandblack · 3 months ago
stupid question as I have not touched C++ since the 90s - can the IDEs not do this with all these now almost universal linters and AI assists? Maybe something that prompts before a commit and autoprompts before/after fixes to only the initialization. Maybe as simple as a choice in the refactoring menu? Rust - where are you for proposing this fix to C++? Or is it javascript?
vrighter · 3 months ago
that's the undefined keyword in zig. I love it. It makes UB opt-in and explicit

Deleted Comment

loeg · 3 months ago
Compilers should add this as a non-standard extension, right? -ftrivial-auto-var-init=zero is a partial solution to a related problem, but it seems like they could just... not have UB here. It can't be that helpful for optimization.
Matheus28 · 3 months ago
Yes but it’s not portable. If zero initialization were the default and you had to opt-in with [[uninitialized]] for each declaration it’d be a lot safer. Unfortunately I don’t think that will happen any time soon.
MichaelRo · 3 months ago
>> Of all the warts, they all pale in comparison to the default initialization behavior.

Come on. That's nothing compared to the horrors that lie in manual memory management. I've never worked with a C++ based application that doesn't have crashes lurking all around, so bad that even a core dump leaves you clueless as to what's happening. Couple OOP involving hundreds of classes and 50-level-deep calls with 100s of threads and you're hating your life when trying to find the cause for yet another crash.

bluGill · 3 months ago
I can write bad code in rust too. Rust makes it more difficult, but if you try hard you can abuse it to get the same hundreds of classes and 50 level deep calls, and 100s of threads. You can even do manual memory management in Rust - it isn't built into the language but you can call system APIs to allocate memory if you really want to be stupid. Don't do that is the answer.

Good programmers long ago wrote best-practices guides based on hard-learned experience. Newer languages (like Rust) were designed by people who read those guides and built languages that make violating those practices hard.

kaashif · 3 months ago
50 levels deep? With some of the template metaprogramming I've seen, looking at just the types for just one level will not only fill your screen, but take up megabytes on disk...
motorest · 3 months ago
> Come on. That's nothing compared to the horrors that lay in manual memory management. Like I've never worked with a C++ based application that doesn't have crashes lurking all around, so bad that even a core dump leaves you clueless as to what's happening.

Have you tried fixing the bugs in your code?

That strategy has been followed by people writing code in every single language, and when used (even with C++) you do drive down the number of these crashes to a residual/purely theoretical frequency.

Scenarios such as those you've described are rare. There should be more to them than the tool you're using to do your job. So why blame the tool?

nlehuen · 3 months ago
Not to worry, there is a 278 page book about initialization in C++!

https://leanpub.com/cppinitbook

(I don't know whether it's good or not, I just find it fascinating that it exists)

bhk · 3 months ago
Wow! Exhibit 1 for the prosecution.
kazinator · 3 months ago
C++ doesn't have initiation hazing rituals, but initialization hazing rituals. (One of which is that book.)
codr7 · 3 months ago
That's what I've been saying, every line of C++ is a book waiting to be written.
nitrogen99 · 3 months ago
Well, authors are incentivized to write long books. Having said that, it obviously doesn't take away from the fact that C++ init is indeed bonkers.
harry8 · 3 months ago
What would be the incentive for making this a long book? Couldn't be money.
agent327 · 3 months ago
The answer to this is to replace default-init by zero-init. This removes all special cases and all surprise, at a cost that is minimal (demonstrated experimentally by its implementation in things like Windows and Chrome) or even negative. Doing so would make software safer, and more reproducible, and it would make the object model more sound by removing the strange zombie state that exists only for primitive types.

Of course we should provide a mechanism to allow large arrays to remain uninitialized, but this should be an explicit choice, rather than the default behaviour.

However, will it happen? It's arguably the easiest thing C++ could do to make software safer, but there appears to be no interest in the committee to do anything with safety other than talk about it.

shultays · 3 months ago

  Of course we should provide a mechanism to allow large arrays to remain uninitialized, but this should be an explicit choice, rather than the default behaviour.
First you say the cost is "minimal or even negative", and then you argue against it in the very next paragraph.

ddulaney · 3 months ago
The general cost over a several large codebases has been observed to be minimal. Yet, there are specific scenarios where the costs are real and observable. For those rare cases, an explicit opt-in to risky behavior makes sense.
monkeyelite · 3 months ago
We all agree: poor defaults were chosen in C++ across the board. We have learned a lot about languages since then.

The question is what to do about it - balancing the cost of change to code and to engineers who learned it.

> but there appears to be no interest in the committee to do anything with safety other than talk about it.

There is plenty of interest in improving C++ safety. It’s a regular topic of discussion.

Part of that discussion is how it will help actual code bases that exist.

Should the committee do some breaking changes to make HN commenters happier, who don’t even use the language?

112233 · 3 months ago
There is no hope for committee. In C++33 we will probably have variables defined as

    int const<const> auto(decltype(int)) x requires(static) = {{{}}};
And when asked how on earth did this happen and why, there will be the same "we must think about the existing code, the defaults were very poor"

Meanwhile they absolutely could make sane defaults when you plonk "#pragma 2033" in the source (or something, see e.g. Baxter's Circle compiler), but where would be the fun of that.

They still use single pass compiling (and order of definitions) as the main guiding principle...

agent327 · 3 months ago
I was not proposing sweeping changes to all the defaults in C++, I was proposing to adopt a single, specific change. That change does not break any existing code, removes pitfalls from the language, and has already been tried by industry and found to be beneficial. Why is it not in C++26?

https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2754r0... provides what appears to be the answer to this question: "No tools will be able to detect existing logical errors since they will become indistinguishable from intentional zero initialization. The declarations int i; and int i = 0; would have precisely the same meaning." ...yes, they would. _That's the point_. The paper has it exactly the wrong way around: currently tools cannot distinguish between logical error and intentional deferred initialization, but having explicit syntax for the latter would make the intention clear 100% of the time. Leaving a landmine in the language just because it gives you more warnings is madness. The warning wouldn't be needed to begin with, if there were no landmine.

I'm not sure what you mean with "who don't even use the language". Are you implying that only people that program professionally in C++ have any stake in reliable software?

BlackFly · 3 months ago
> Should the committee do some breaking changes to make HN commenters happier, who don’t even use the language?

As phrased, you clearly want the answer to this question to be no, but the irony there is that that is how you kill a language. This is simply survivor bias, like inspecting the bullet damage only on the fighter planes that survive. You should also be listening to people who don't want to use your language to understand why they don't want that, especially people that stopped using the language. Otherwise you risk becoming more and more irrelevant. It won't all be valuable evidence, but they are clearly the people that cannot live with the problems. When other languages listen, better alternatives arise.

BlackFly · 3 months ago
I would say the answer is to replace both with explicit init unless you explicitly say some equivalent of "Trust me, bro," to the compiler. Some structs/data (especially RAII structs backing real resources) have no sensible default or zero.

But yeah, most structs have a good zero value so a shorthand to create that can be ergonomic over forced explicitness.

agent327 · 3 months ago
That would be a breaking change, though. Default zero-init would apply to existing source and convey its benefits simply by recompiling, without any engineering hours being required.
gnabgib · 3 months ago
Small discussion at the time (42 points, 6 comments) https://news.ycombinator.com/item?id=14532478

Related: Initialization in C++ is Seriously Bonkers (2019, 166 points, 126 comments) https://news.ycombinator.com/item?id=18832311

ts4z · 3 months ago
This is a specialization of the general statement that C++ is bonkers.
MichaelMoser123 · 3 months ago
and putting structure instances into an array so that you can refer to them via indexes of the array entries (as the only escape from being maimed by the borrow checker) is normal?
ts4z · 3 months ago
C++ would be bonkers even if Rust did not exist.
lblume · 3 months ago
Unlike Rust, C++ at least has specialization...
kazinator · 3 months ago
> This rule makes sense when you think about it

No, it is bonkers; stick to your consistent point, please.

These two should have exactly the same effect:

  bar() = default;       // inside class declaration

  bar::bar() = default;  // outside class declaration
The only difference between them should be analogous to the difference between an inline and non-inline function.

For instance, it might be that the latter one is slower than the former, because the compiler doesn't know from the class declaration that the default constructor is actually not user-defined but default. How it would work is that a non-inline definition is emitted, which dutifully performs the initialization, and that definition is actually called.

That's what non-bonkers might look like, in any case.

I.e. both examples are rewritten by the compiler into

  bar() { __default_init; }

  bar::bar() { __default_init; }
where __default_init is a fictitious place holder for the implementation's code generation strategy for doing that default initialization. It would behave the same way, other than being inlined in the one case and not in the other.

Another way that it could be non-bonkers is if default were simply not allowed outside of the class declaration.

  bar::bar() default;  // error, too late; class declared already!
Something that has no hope of working right and is easily detectable by syntax alone should be diagnosed. If default only works right when it is present at class declaration time, then ban it elsewhere.

ValtteriL · 3 months ago
The book Beautiful C++: 30 Core Guidelines for Writing Clean, Safe, and Fast Code recommends initializing/providing default values for member variables in default member initializers instead of the initializer list used here.

""" Default member initializers define a default value at the point of declaration. If there is a member that cannot be defined in such a way, it suggests that there may be no legal mechanism by which a default constructor can be defined. """