Test if a number is even

Optimizing compilers have been able to recognize pretty complicated patterns for many years.

For instance if you're making a loop to count the bits that are set in a number, the compiler can recognize the entire loop and turn it into a single popcnt instruction (e.g. https://lemire.me/blog/2016/05/23/the-surprising-cleverness-... )

eichin · a year ago

They've been able to recognize this pattern since the 1960s (popcount is a very historically special case and not really a sign of complexity, since traditionally it was "if you write this exact code from the documentation, you'll get the machine instruction" and didn't imply any more general cleverness.)

cperciva · a year ago

popcount is a very historically special case

To elaborate a bit on the specialness of popcount: It is a generally accepted belief in the computer architecture community that several systems included a popcount iNstruction Solely due to A request from a single "very good customer".

LegionMammal978 · a year ago

Yeah, once I tried writing out a dumber popcount loop (checking each bit) for better readability, and was annoyed to find that GCC and Clang didn't recognize it. I ended up looking into the source of both compilers to find that they only transform the more 'idiomatic' version.

dietr1ch · a year ago

I feel that the compiler is doing too much work here. I know they are thinking about special cases on generated code, but at some point it feels that it just adds compile time for no good reason.

Look at this --beauty-- eww, thing, should compilers really spend time trying to figure out how to optimise insane code?

    def is_even(n):
      return str(n)[len(str(n))-1] in [str(2*n) for n in range(5)]

ajross · a year ago

These optimizations are very useful. Consider the only slightly less contrived case where you want to mod an index by the size of an array. And the compiler expands the inline function around a context where the array is a fixed power of two size at compile time. Poof, no division/modulus needed, magically. Lots and lots of code looks like this: general algorithms expressed in simple implementation that has a faster implementation in the specific instance that gets generated.

refulgentis · a year ago

Maybe one day there will be compilers that can choose what to optimize based on their aesthetic judgement of the code.

I could see that as a novel feedback mechanism for software engineers.

As it stands, I'm glad they design optimizations abstractly, even if that means code I don't like gets the benefits

matt_daemon · a year ago

Or even:

    def is_even(n):
        return str(n)[-1] in "02468"

The interesting thing about testing values (like testing whether a number is even) is that at the assembly level, the CPU sets flags when the arithmetic happens, rather than needing a separate "compare" instruction.

gcc likes to use `and edi,1` (logical AND between 32-bit edi register and 1). Meanwhile, clang uses `test dil,1` which is similar, except the result isn't stored back in the register, which isn't relevant in my test case (it could be relevant if you want to return an integer value based on the results of the test).

After the logical AND happens, the CPU's ZF (zero) flag is set if the result is zero, and cleared if the result is not zero. You'd then use `jne` (jump if not equal) or maybe `cmovne` (conditional move - move register if not equal). Note again that there is no explicit comparison instruction. If you don't use O3, the compiler does produce an explicit `cmp` instruction, but it's redundant.

Now, the question is: Which is more efficient, gcc's `and edi,1` or clang's `test dil,1`? The `dil` register was added for x64; it's the same register as `edi` but only the lower 8 bits. I figured `dil` would be more efficient for this reason, because the `1` operand is implied to be 8 bits and not 32 bits. However, `and edi,1` encodes to 3 bytes while `test dil,1` encodes to 4 bytes. I guess the `and` instruction lets you specify the bit size of the operand regardless of the register size.

There is one more option, which neither compiler used: `shr edi,1` will perform a right shift on EDI, which sets the CF (carry) flag if a 1 is shifted out. That instruction only encodes to 2 bytes, so size-wise it's the most efficient.

The right-shift option fascinates me, because I don't think there's really a C representation of "get the bit that was right-shifted out". Both gcc and clang compile `(i >> 1) << 1 == i` the same as `i & 1 == 0` and `i % 2 == 0`.

Which of the above is most efficient on CPU cycles? Who knows, there are too many layers of abstraction nowadays to have a definitive answer without benchmarking for a specific use case.

I code a lot of Motorola 68000 assembly. On m68k, shifting right by 1 and performing a logical AND both take 8 CPU cycles. But the right-shift is 2 bytes smaller, because it doesn't need an extra 16 bits for the operand. That makes a difference on Amiga, because (other than size) the DMA might be shared with other chips, so you're saving yourself a memory read that could stall the CPU while it's waiting its turn. Therefore, at least on m68k, shifting right is the fastest way to test if a value is even.

userbinator · a year ago

That instruction only encodes to 2 bytes, so size-wise it's the most efficient.

In isolation it's the smallest, but it's no longer the smallest if you consider that the value, which in this example is the loop counter, needs to be preserved, meaning you'll need at least 2 bytes for another mov to make a copy. With test, the value doesn't get modified.

dansalvato · a year ago

That is true, I deliberately set up an isolated scenario to do these fun theoretical tests. It actually took some effort to stop the compiler from being too smart, because it would want to transform the result into a return value, or even into a pointer offset, to avoid branching.

amiga386 · a year ago

> On m68k, shifting right by 1 and performing a logical AND both take 8 CPU cycles. But the right-shift is 2 bytes smaller

There's also BTST #0,xx but it wastefully needs an extra 16 bits say which bit to test (even though the bit can only be from 0-31)

> That makes a difference on Amiga, because (other than size) the DMA might be shared with other chips, so you're saving yourself a memory read that could stall the CPU while it's waiting its turn.

That's a load-bearing "could". If the 68000 has to read/write chip RAM, it gets the even cycles while the custom chips get odd cycles, so it doesn't even notice (unless you're doing something that steals even cycles from the CPU, e.g. the blitter is active and you set BLTPRI, or you have 5+ bitplanes in lowres or 3+ bitplanes in highres)

dansalvato · a year ago

> There's also BTST #0,xx but it wastefully needs an extra 16 bits say which bit to test (even though the bit can only be from 0-31)

That reminds me, it's theoretically fastest to do `and d1,d0` e.g. in a loop if d1 is pre-loaded with the value (4 cycles and 1 read). `btst d1,d0` is 6 cycles and 1 read.

> the blitter is active and you set BLTPRI

I thought BLTPRI enabled meant the blitter takes every even DMA cycle it needs, and when disabled it gives the CPU 1 in every 4 even DMA cycles. But yes, I'm splitting hairs a bit when it comes to DMA performance because I code game/demo stuff targeting stock A500, meaning one of those cases (blitter running or 5+ bitplanes enabled) is very likely to be true.

venning · a year ago

It may be worth pointing out: these are equivalent comparisons when testing for even numbers but cannot be extrapolated to testing for odd numbers. The reason being that a negative odd number modulus 2 is -1, not 1.

So `n % 2 == 1` should probably [1] be replaced with `n % 2 != 0`.

While this may be obvious with experience, if the code says `n % 2 == 0`, then a future developer who is trying to reverse the operation for some reason must know that they need to change the equality operator not the right operand. Whereas, with `n % 1 == 0`, they can change either safely and get the same result.

This feels problematic because the business logic that necessitated the change may be "do this when odd" and it may feel incorrect to implement "don't do this when even".

I really disfavor writing code that could be easily misinterpreted and modified in future by less-experienced developers; or maybe just someone (me) who's tired or rushing. For that reason, and the performance one, I try to stick to the bitwise operator.

[1] Of course, if for some reason you wanted to test for only positive odd numbers, you could use `n % 2 == 1`, but please write a comment noting that you're being clever.

I really disfavor writing code that could be easily misinterpreted and modified in future by less-experienced developers

That's their problem. Otherwise you're just contributing to the decline.

notfish · a year ago

In what languages is n % 2 -1 for negative odd numbers?

Edit: apparently JS, java, and C all do this. That’s horrifying

jagged-chisel · a year ago

Horrifying? It’s mathematically correct.

neuroelectron · a year ago

You can just not( iseven() )

tux3 · a year ago

Arcuru · a year ago

> Much better :) But what about C? Let’s try it:

> I tried both versions (modulo 2 and bitwise AND) and got the same result. I think the optimizer recognizes modulo 2 and converts it to bitwise AND.

Yes, even without specifying optimizations - https://godbolt.org/z/9se9c6qKT

You can see that the output of the compiler is identical whether you use `i%2 == 0` or `(i&1) == 0`. The bitwise AND is instruction 12 in the output.

Using -O3 like in the post actually compiles to SIMD instructions on x86-64 - https://godbolt.org/z/dWbcK947G

ryan-c · a year ago

With i < 71, the compiler will just turn it into a constant value of 36. Switches to SIMD at 72, idk why.

thaumasiotes · a year ago

That's how you check modulus for powers of 2. 2 is a power of 2. This barely even qualifies as an "optimization".

WesolyKubeczek · a year ago

> I think the optimizer recognizes modulo 2 and converts it to bitwise AND.

A quick check in the compiler explorer (godbolt.org) confirms that this is indeed true for GCC on x86_64 and aarch64, but not for clang on the same (clang does optimize it with -O3).

the_real_cher · a year ago

I like the algorithms that are the opposite of this where they try to find the slowest possible way to determine if a number is even.

mtmail · a year ago

https://github.com/blackburn32/serverlessIsEven "A serverless implementation of isEven. Now you can know if your numbers are even, even at mass scale."

TZubiri · a year ago

How about sending a packet back and forth to a server in another continent n times, and if it stops coming back, it was odd.

crazydoggers · a year ago

There’s no upper limit, so there’d have to be some set of rules like no sleep(n).

Also given the halting problem, you could write an algorithm that would be impossible to determine if it loops forever.

plagiarist · a year ago

If the results of TREE(n) had specific properties for even n, you could easily check by first calculating TREE(n) and then looking for those properties in the results. Might need a bignum library.

Deleted Comment

Rendello · a year ago

Why write redundant code? I just depend on an external is-even library, which depends on an is-odd library ;)

(https://news.ycombinator.com/item?id=38791094)

ketchupdebugger · a year ago

why not use ai? https://github.com/Calvin-LL/is-even-ai

nikolay · a year ago

"Premature optimization is the root of all evil." -- Donald Knuth [0]

[0]: https://www.youtube.com/watch?v=74RdET79q40

osigurdson · a year ago

The real ROAE is reasoning by unexamined phrase.