Unexpected ways memory subsystem interacts with branch prediction

I'm pretty sure the following optimization is invalid, since a is not a bool array, but one of positive/negative numbers (see top of article):

> You can use arithmetics to go branchless.

    if (a[i] > 0) {
       cnt++;
    }

> Rewriting using arithmetic takes advantage of the fact that the expression a[i] > 0 has an arithmetic value 1 if true and 0 if false. So the whole expression can be rewritten as:

    cnt += a[i]

Kwantuum · 2 years ago

Surely this was meant to be cnt += a[i] > 0

amelius · 2 years ago

But a[i] > 0 can still compile to a branch, depending on the compiler.

inetknght · 2 years ago

cnt += a[i]

No, you've changed the right-hand side expression. Now you're adding a[i] to cnt and nowhere has it stated that a[i] is 1. But a[i] > 0's result is always zero or one.

I suggest what you want is this:

    cnt += !!a[i]

Now, a[i] is not-notted -- the expression returns a bool indicating whether the value is converted to true.

or this, depending on your style:

    cnt += bool(a[i])

which does the same thing

BTW, there's no way, documented or not, to switch off branch predictor on a modern CPU, right?

NobodyNada · 2 years ago

AArch64 has the SB ("speculation barrier") instruction, which you could insert after both sides of a branch instruction if you really wanted to: https://developer.arm.com/documentation/ddi0596/2021-12/Base...

However, it's not specified to completely disable speculation, only "to the extent that such speculation can be observed through side-channels as a result of control flow speculation or data value speculation". So e.g. an implementation could decide to disable speculative memory access while still allowing arithmetic operations.

Preventing Spectre vulnerabilities is pretty much the only reason you'd want to disable branch prediction though (except for curiosity of course). Without prediction, performance would be as bad as if every branch was mispredicted, since the pipeline has to stop and wait at every single branch. The idea of branch prediction is that the "wasted" cycles waiting for the branch to resolve can instead be used to do some computation that may or may not be useful; if it turned out to be useful you saved some time, if not you didn't lose anything (besides maybe some electricity). So even a branch predictor that randomly guesses with 50% accuracy is a huge performance win over not speculating at all.

mgaunard · 2 years ago

Simply replace your processor by a 8086.

jazzyjackson · 2 years ago

https://web.archive.org/web/20231228221536/https://johnnyssw...

t8sr · 2 years ago

The irony of a blog about software performance going down after being on HN for 2 hours is too much. (I know, it’s just Wordpress, I’m just being a grump.)

The archive version seems to be missing the branchless versions of the algorithms - is that missing from the article itself as well? It’d be interesting to put the different versions into compiler explorer.

codebrrr · 2 years ago

082349872349872 · 2 years ago

Contrary to title, TFA got the results they should've already expected

sylware · 2 years ago

I am currently coding x86_64 and I am trying to favor as much as reasonably possible "non-predicted branch"/"branchless" code paths. Setcc and cmovcc instructions are really usefull, even though if I can think of real "branchless" algorithms, I will favor them.

x86_64 assembly for me is just the transition step before the actual RISC-V jump. This is "register-ization" of some code paths. Once done, it is kind of easy to do a port to another modern ISA. Bu then, I am thinking about all that branch prediction on RISC-V:

Will we have a way to hint a core/hart to dodge prediction for some branches, fine-grained and "cleanly"? I was thinking about implicit branch prediction exclusion via some "known" instruction fusions, but the 'implicit' here is scary.

celeritascelery · 2 years ago

> Will we have a way to hint a core/hart to dodge prediction for some branches, fine-grained and "cleanly"?

Why would you want to avoid the branch predictor? All it can do is speedup your code.

If I did understand properly, in code paths with many branches near from each other, it would help to hint the predictor about which ones are not worth it.

nayuki · 2 years ago

I explored the idea of making code branchless by replacing if/else with bit masking arithmetic. I did this in the context of preventing timing attacks in cryptographic primitive algorithms. https://stackoverflow.com/questions/27865974/is-masking-effe...

Deleted Comment

ceeam · 2 years ago