xorblurb commented on Depressing and faintly terrifying days for the C standard [pdf]   yodaiken.com/2018/05/20/d... · Posted by u/signa11
pcwalton · 8 years ago
Sure. Compile at -O0.

A huge number of seemingly-trivial optimizations depend on assuming that undefined behavior will never happen. The number of optimizations that don't depend on UB in any way is quite small. For example, if you want to get pedantic about it, even automatic promotion of local variables to registers is exploiting undefined behavior—who's to say you didn't have a pointer that just happened to point to one of them?

xorblurb · 8 years ago
Given the question, that is a really moronic answer.

There are ways to apply modern optimizations beyond -O0 to C without importing all kinds of UB from the language level. You just have to actually prove the properties you want to rely on, instead of relying on wishful "UB => authorized by the standard => programmer's fault if anything goes bad" thinking.

And promoting local variables to registers CERTAINLY does NOT depend on language-level UB. It would be permitted by the general as-if rule even if something else prevented it, which is not the case. An object doesn't need an address if nobody takes one, and random pointers have never been required to give access to all objects, especially ones that may never have an address at all. Plus nobody ever expected that anyway. People expect 2s complement, or at least something that cannot result in nasal demons, and given C's history, something that matches what the processor does. So 2s complement is at least not utterly stupid. Conflating the two is dishonest to the highest point -- except maybe if the only intended audience of the C language is now experts who e.g. write compilers. What a bright future that would be.

Hell, we did without the hypothetical flat memory model, even without strict aliasing, for maybe 20 years (and probably 30, to be honest), and that NEVER caused the kind of issues we are talking about. So don't pretend it did, just to dismiss the real issues. OK, even then it was probably informal as hell and in some ways worse for experts, but the amount of exploited UB was also WAY smaller. Quantity matters in this area. And context too. Do you want secure OR fast embedded systems? I would prefer reasonably secure and reasonably fast. Certainly NOT fast to execute and exploit, or more probably fast to crash pathetically.

You know very well that compiling at -O0 is not going to happen in production on tons of projects.

Don't dismiss real concerns with false "solutions", especially when they come mixed with proof that you misunderstand the situation.

xorblurb commented on Depressing and faintly terrifying days for the C standard [pdf]   yodaiken.com/2018/05/20/d... · Posted by u/signa11
jpfr · 8 years ago
You are invited to bring forward concrete proposals for changes to the C standard in the relevant committee.

http://www.open-std.org/jtc1/sc22/wg14/

Since UB allows the compiler to do anything in those situations, we can reduce the amount of UB without breaking existing "legal" code, simply by defining more behavior.

Whether a proposal is reasonable needs to be discussed in the standardization committee. All other discussion is a nice hobby. But ultimately moot.

xorblurb · 8 years ago
Those kinds of discussions have various effects, some of which I believe are far from moot.

First, they let people even take notice of this situation. Few developers read the standard, and even fewer write it, follow the discussions to change it (are they even open?), or write a compiler for it. The rationales are not even tracked [1]. It would actually be insanely hard to get a good understanding of these subjects by, e.g., just reading the standard, without this kind of discussion on forums used by far more devs than just a few dozen compiler writers...

[1]: but while I'm thinking about it, an impressive independent book has been written by Derek Jones: The New C Standard: An Economic and Cultural Commentary http://www.knosof.co.uk/cbook/cbook.html

xorblurb commented on Depressing and faintly terrifying days for the C standard [pdf]   yodaiken.com/2018/05/20/d... · Posted by u/signa11
rootbear · 8 years ago
Sorry, I didn't mean to imply that there are no 1s-complement systems still around. The comment I was replying to suggested it was a mistake to support 1s-complement architectures because that was the source of some of the Undefined Behavior (UB) that so troubles C. My point was that not supporting 1s-complement wasn't an option then (and probably is not one now). After all, without 1s-complement support, how could I write new C apps for the Apollo Guidance Computer?

I would like to give an example of how supporting both 1s- and 2s-complement is the source of a specific UB, but regrettably I can't take the time to do that right now. Similarly, supporting both big-endian and little-endian was necessary. As was supporting ASCII, EBCDIC, and probably Fieldata and five-level Baudot (looking at you, trigraphs). All of this generality made it hard to say anything useful in some areas, so in some cases you end up just calling it undefined, since there was no consensus on how it should be defined.

xorblurb · 8 years ago
Could have been (and should, in some -- most? -- cases) implementation-defined.
xorblurb commented on Depressing and faintly terrifying days for the C standard [pdf]   yodaiken.com/2018/05/20/d... · Posted by u/signa11
maxlybbert · 8 years ago
> > the article complains that unsigned integer overflow is defined in C while signed integer overflow is not.

> The article complains that signed integer overflow is undefined, while unsigned integer overflow is not. There's a difference.

You are correct.

> > the Clang team believes this makes loops up to 20% faster

> I'd love to see the benchmarks they conducted.

It wouldn’t surprise me if they have microbenchmarks. I convinced myself they were telling the truth based on crude instruction counting: a loop must be converted into something like:

- loop body (assume it contains at least one machine instruction)

- increment loop variable

- (for unsigned): clear overflow flag

- test loop variable, exit loop if appropriate

- goto beginning of loop

I believe they don’t have to actually check the overflow flag; it’s OK to let the overflow happen. But they do have to clear the flag to avoid a spurious error if the flag gets looked at later.

I’m no expert, so it’s possible this oversimplifies things, but removing the instruction to clear the flag does remove a big chunk of this loop. But it’s a big chunk only because it’s a tight loop.

For the record, I happily use the foreach loop constructs in C++, D, Java, C#, Python, Perl, etc. but I originally avoided them until I saw a comment by Walter Bright that there is no performance penalty in D (the compiler rewrites the loop appropriately; there may be a penalty in Java because the feature might be defined in terms of their relatively heavy iterators).

xorblurb · 8 years ago
Microbenchmarks can be very misleading compared to the real impact on real programs. Still, the gains allowed by signed-overflow UB (when you are lucky enough that the transformation is actually correct given what the original programmer had in mind...) are positive and probably measurable even in real programs; or, if hardly measurable, maybe they at least permit a few percent of whole-system perf improvement on SMT processors. But they are better suited to languages other than C, and yes, in C++ (and probably in most languages at this point) it is better, both for code readability (most important!) and performance (nice to have, but very secondary compared to readability), to use for-each constructs rather than maintaining an index yourself.

Technically there is no overflow flag to reset; it is just that some CPU instruction sets do not support indexing with a 32-bit register when using 64-bit addressing, so you have to insert an extra sign-extend instruction if you want to support 2s-complement signed overflow on 32-bit indexes. So you typically already pay no cost if your indexes are already size_t/ptrdiff_t. But ptrdiff_t signed overflow is still UB according to the C standard, which is also a shame, because at that point it only allows far less interesting "optimizations" (maybe folding a + w >= a to true if w is positive, but that is actually typically dangerous, because that was historically how you checked for overflow at the source level, and now the compiler is suppressing all the checks!)

So all of this is really just trade-offs, and in the modern age (with, e.g., a security picture that is kind of worrying) some people argue it was a terrible idea to use this approach so carelessly. Most experts now think that no non-trivial codebase exists without potential UB in it, so it is not just rants all around; some people are even working on a mathematical model of the LLVM optimizer to make it actually sound (for now it seems it is not, even internally). So unfortunately, with this approach to optimization, there is for now no mathematical justification that the optimizations performed are correct even under the hypothesis of strict conformance to the C standard; I'll let you imagine what happens in practice when almost no program actually conforms...

xorblurb commented on Depressing and faintly terrifying days for the C standard [pdf]   yodaiken.com/2018/05/20/d... · Posted by u/signa11
maxlybbert · 8 years ago
I found the title a little misleading: there isn’t much about where the standard is heading or even where it’s been.

I would vote for the title “a rant on undefined behavior in C.”

——

Simple example: the article complains that unsigned integer overflow is defined in C while signed integer overflow is not. There is very little in the article about this except for the claim that the performance for incrementing a signed int should match the performance for incrementing an unsigned int. The writer refuses to believe otherwise, even though he accepts that undefined behavior “supposedly” allows the compiler to omit overflow checks.

It’s the “supposedly” that makes this a rant. The article’s sources mention that Clang does omit overflow checks and that the Clang team believes this makes loops up to 20% faster (“up to” because the optimization can’t be applied to all loops, and the performance increase will depend on how tight the loop is, i.e., how much overhead there is in incrementing and testing the loop variable in comparison to the loop body).

xorblurb · 8 years ago
(Technically the problem in that case is not really an overflow check, but more typically the need to extend an index from 32 to 64 bits, because 64-bit instruction sets do not support indexing with a 32-bit index)

If you want perf in a critical tight loop that profiling has identified, you can easily optimize it yourself (and yes, typically bumping the counter type to size_t/ptrdiff_t is enough). The advantage is that you can actually check that this transformation is sound with respect to the intent of the original code; in the context of the C programming language, the compiler doesn't even try to check that itself, it merely blindly assumes there is absolutely no UB ever, and to hell with it if there actually was.

But anyway, we have since invented sane languages in which we have BOTH safety AND the ability to apply the kind of transformations that have a small but positive perf impact. In some cases it is actually far easier to optimize in those safe languages than in mostly unchecked ones culturally full of type punning and other kinds of insanity (like C), and this is not even a recent discovery: there is a reason why number crunching stuck with Fortran. So C should be considered kind of a legacy programming language at this point. A very important one for historical reasons, but one should think twice before writing new critical infrastructure in it...

xorblurb commented on Depressing and faintly terrifying days for the C standard [pdf]   yodaiken.com/2018/05/20/d... · Posted by u/signa11
8_hours_ago · 8 years ago
Large and complex pieces of software, and operating systems in particular, tend to be tightly tied to their compilers. It is never easy and in some cases practically impossible to port to a different compiler. I expect that Microsoft has come to terms with the fact that Windows will only support being compiled by their compiler.

When a different toolchain introduces a new feature for finding bugs that would be useful for Windows, the Microsoft compiler team can add that functionality to their own tools instead of porting Windows. An advantage of this is that they can customize the feature for exactly their use case. Yes, this is the definition of NIH syndrome, but that’s how large companies work.

xorblurb · 8 years ago
Large pieces of software have been able to switch some platform-specific code to other compilers (Chrome for Windows comes to mind).

That is probably much smaller than the whole of Windows, but I would not be surprised if some MS devs were already internally compiling some of their components with Clang for their own dev/testing (even if just for the extra warnings, etc.)

And a major part of the work of the MSVC team today seems to be about standard compliance.

But yes, I do not really expect them to switch, and actually they probably don't even have the beginning of a serious reason to do so. This is not even a case of NIH. Their compiler derives from an ancient codebase and has been continuously maintained for several decades; they "invented" it. The only modern serious competition (that cares enough about Windows compat and some of their specific tech) started way later... They probably also have all kinds of patents and whatnot covering security mitigations implemented through collaboration between the generated code and (the most modern versions of) low-level parts of the Windows platform.

xorblurb commented on Depressing and faintly terrifying days for the C standard [pdf]   yodaiken.com/2018/05/20/d... · Posted by u/signa11
iainmerrick · 8 years ago
You say that as if it’s entirely the responsibility of the programmer to avoid these bear traps that have been left lying around everywhere.

Why not just have the compiler zero the memory, and thereby remove the trap? Seems very sensible to me. Do you think it’s a bad idea, and if so, why?

xorblurb · 8 years ago
Sane compilers should do that, and the standard should eventually specify it. But until it does, you cannot write portable code that expects it (though hopefully, once enough compilers are sane but before the standard is updated, you can write code that targets only those compilers and not give a fuck about the other broken garbage that tries to trap the world)
xorblurb commented on Intel Announces Optane DIMMs   anandtech.com/show/12828/... · Posted by u/p1esk
monocasa · 8 years ago
And literally half the bandwidth. NVMe means that you have to write the data, flush the caches over those ranges to ensure that it's actually in DRAM, then instruct the NVMe controller to read it back out of DRAM. With these drives you just write (and maybe flush), and it's done. And they probably have their own dedicated DRAM protocol controller.

And it might even be slightly better than halving the bandwidth needs, since swapping DRAM banks isn't free, so you might be saving on mildly thrashing your DRAM controller when you're using DRAM and the NVMe drive is trying to read at the same time.

xorblurb · 8 years ago
Modern DMA is cache-coherent (except maybe if you opt out of it? I'm not even sure you can). It is still costly, though.
xorblurb commented on Intel Announces Optane DIMMs   anandtech.com/show/12828/... · Posted by u/p1esk
wtallis · 8 years ago
It's not just the connector. The electrical interface is DDR4, and it's accessed through the CPU's DRAM controller as part of the existing memory hierarchy. It's just that the memory addresses corresponding to the Optane DIMMs will be slower than the addresses corresponding to DRAM.

New CPUs will be required, and if there's a technical justification for that it will probably be that accessing an Optane DIMM requires timings that are far outside the normal range for DRAM modules that the existing memory controllers were designed to accommodate.

xorblurb · 8 years ago
It also comes with new instructions to flush cache lines and wait for them to become persistent.
xorblurb commented on Intel Announces Optane DIMMs   anandtech.com/show/12828/... · Posted by u/p1esk
dragontamer · 8 years ago
Not even DDR4 RAM is byte-addressable. DDR4 is typically burst-length 8 for 64-bytes per burst (although BL2 exists, I'm fairly certain all modern processors have settled on BL8)
xorblurb · 8 years ago
I think you can still do non-burst transactions, and even set write masks at byte granularity. Typical processors probably don't do that, though.
