The irony of a blog about software performance going down after being on HN for 2 hours is too much. (I know, it’s just Wordpress, I’m just being a grump.)
The archive version seems to be missing the branchless versions of the algorithms - is that missing from the article itself as well? It’d be interesting to put the different versions into compiler explorer.
I'm pretty sure the following optimization is invalid, since a is not a bool array, but one of positive/negative numbers (see top of article):
> You can use arithmetics to go branchless.
if (a[i] > 0) {
cnt++;
}
> Rewriting using arithmetic takes advantage of the fact that the expression a[i] > 0 has an arithmetic value 1 if true and 0 if false. So the whole expression can be rewritten as:
No, you've changed the right-hand side expression. Now you're adding a[i] to cnt and nowhere has it stated that a[i] is 1. But a[i] > 0's result is always zero or one.
I suggest what you want is this:
cnt += !!a[i]
Now, a[i] is not-notted -- the expression returns a bool indicating whether the value is converted to true.
I am currently coding x86_64 and I am trying to favor as much as reasonably possible "non-predicted branch"/"branchless" code paths. Setcc and cmovcc instructions are really usefull, even though if I can think of real "branchless" algorithms, I will favor them.
x86_64 assembly for me is just the transition step before the actual RISC-V jump. This is "register-ization" of some code paths. Once done, it is kind of easy to do a port to another modern ISA. Bu then, I am thinking about all that branch prediction on RISC-V:
Will we have a way to hint a core/hart to dodge prediction for some branches, fine-grained and "cleanly"? I was thinking about implicit branch prediction exclusion via some "known" instruction fusions, but the 'implicit' here is scary.
If I did understand properly, in code paths with many branches near from each other, it would help to hint the predictor about which ones are not worth it.
I explored the idea of making code branchless by replacing if/else with bit masking arithmetic. I did this in the context of preventing timing attacks in cryptographic primitive algorithms. https://stackoverflow.com/questions/27865974/is-masking-effe...
However, it's not specified to completely disable speculation, only "to the extent that such speculation can be observed through side-channels as a result of control flow speculation or data value speculation". So e.g. an implementation could decide to disable speculative memory access while still allowing arithmetic operations.
Preventing Spectre vulnerabilities is pretty much the only reason you'd want to disable branch prediction though (except for curiosity of course). Without prediction, performance would be as bad as if every branch was mispredicted, since the pipeline has to stop and wait at every single branch. The idea of branch prediction is that the "wasted" cycles waiting for the branch to resolve can instead be used to do some computation that may or may not be useful; if it turned out to be useful you saved some time, if not you didn't lose anything (besides maybe some electricity). So even a branch predictor that randomly guesses with 50% accuracy is a huge performance win over not speculating at all.
The archive version seems to be missing the branchless versions of the algorithms - is that missing from the article itself as well? It’d be interesting to put the different versions into compiler explorer.
> You can use arithmetics to go branchless.
> Rewriting using arithmetic takes advantage of the fact that the expression a[i] > 0 has an arithmetic value 1 if true and 0 if false. So the whole expression can be rewritten as:No, you've changed the right-hand side expression. Now you're adding a[i] to cnt and nowhere has it stated that a[i] is 1. But a[i] > 0's result is always zero or one.
I suggest what you want is this:
Now, a[i] is not-notted -- the expression returns a bool indicating whether the value is converted to true.or this, depending on your style:
which does the same thingx86_64 assembly for me is just the transition step before the actual RISC-V jump. This is "register-ization" of some code paths. Once done, it is kind of easy to do a port to another modern ISA. Bu then, I am thinking about all that branch prediction on RISC-V:
Will we have a way to hint a core/hart to dodge prediction for some branches, fine-grained and "cleanly"? I was thinking about implicit branch prediction exclusion via some "known" instruction fusions, but the 'implicit' here is scary.
Why would you want to avoid the branch predictor? All it can do is speedup your code.
Deleted Comment
However, it's not specified to completely disable speculation, only "to the extent that such speculation can be observed through side-channels as a result of control flow speculation or data value speculation". So e.g. an implementation could decide to disable speculative memory access while still allowing arithmetic operations.
Preventing Spectre vulnerabilities is pretty much the only reason you'd want to disable branch prediction though (except for curiosity of course). Without prediction, performance would be as bad as if every branch was mispredicted, since the pipeline has to stop and wait at every single branch. The idea of branch prediction is that the "wasted" cycles waiting for the branch to resolve can instead be used to do some computation that may or may not be useful; if it turned out to be useful you saved some time, if not you didn't lose anything (besides maybe some electricity). So even a branch predictor that randomly guesses with 50% accuracy is a huge performance win over not speculating at all.