Mandatory enforcement of indirect branch targets

For anybody unfamiliar with this, as I was, this appears to refer to Intel's Indirect Branch Tracking feature[1] (and the equivalent on ARM, BTI). The idea is that an indirect branch can only pass control to a location that starts with an "end branch" instruction. An indirect branch is one that jumps to a location whose value is loaded or computed from either a register or memory address: think calling a function pointer in C.

Without IBT, you'd have this equivalence between C and assembly:

    void foo(void) { }

    int main(void) {
        void (*f)(void);   /* function pointer */
        f = foo;
        f();               /* indirect call through the pointer */
    }

    ---

    main:
        movl $foo, %edx
        call *%rdx
        ret

    foo:
        ret

If IBT is enabled, the above code triggers an exception because foo doesn't begin with an "end branch" instruction. When the compiler is told to emit IBT markers, the same code compiles to:

    main:
        endbr64
        movl $foo, %edx
        call *%rdx
        ret

    foo:
        endbr64
        ret

Now the compiler inserts endbr64 at the start of each function prologue. The feature is meant as defense in depth against JOP and COP attacks: the only "gadgets" left available to an attacker are entire functions, which are far harder to exploit and chain.
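To make the rule concrete, here is a toy model in C of the "next instruction after an indirect branch must be an end-branch" check. It is only a sketch: the tiny instruction set, the `struct insn` layout, and the `run` function are all made up for illustration and say nothing about how real hardware is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mini instruction set, just enough to model the rule. */
enum op { OP_ENDBR, OP_NOP, OP_CALL_INDIRECT, OP_RET };

struct insn {
    enum op op;
    size_t target;   /* only meaningful for OP_CALL_INDIRECT */
};

/*
 * Execute prog starting at index 0.
 * Returns true if the entry function returns normally.
 * Returns false if an indirect call lands on anything other than
 * OP_ENDBR (the simulated fault), if the call stack overflows,
 * or if execution runs off the end of the program.
 */
bool run(const struct insn *prog, size_t len)
{
    size_t ret_stack[16];
    size_t depth = 0;
    size_t pc = 0;
    bool wait_for_endbr = false;   /* armed by every indirect call */

    while (pc < len) {
        const struct insn *i = &prog[pc];

        if (wait_for_endbr && i->op != OP_ENDBR)
            return false;          /* fault: target is not marked */
        wait_for_endbr = false;

        switch (i->op) {
        case OP_ENDBR:             /* behaves like a NOP otherwise */
        case OP_NOP:
            pc++;
            break;
        case OP_CALL_INDIRECT:
            if (depth == 16)
                return false;
            ret_stack[depth++] = pc + 1;
            pc = i->target;
            wait_for_endbr = true;
            break;
        case OP_RET:               /* returns do not arm the check */
            if (depth == 0)
                return true;
            pc = ret_stack[--depth];
            break;
        }
    }
    return false;
}
```

Feeding it a program shaped like the second listing (end-branch, indirect call, ret, then a target that starts with an end-branch) runs to completion, while the same program with an unmarked target stops at the simulated fault.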

[1]: https://www.intel.com/content/dam/develop/external/us/en/doc...

rwmj · 3 years ago

The fun fact being that older CPUs decode ENDBR64 as a slightly weird NOP (with no architectural effects), but it'll fault on original Pentiums: https://stackoverflow.com/questions/56120231/how-do-old-cpus...

dataflow · 3 years ago

There's a good question in the comments there that I still don't see the answer to. How does this work if there's an interrupt between the branch and the endbranch? Does the OS need to save/restore the "branchness" bit?

rollcat · 3 years ago

Various architectures do other interesting things with NOPs, IIRC one convention on PowerPC had something vaguely related to debugging or tracing (I can't remember the details or find any references right now).

mattgreenrocks · 3 years ago

That’s really clever use of the opcode space. Thanks for passing that along.

SomeRndName11 · 3 years ago

NOP on Intel CPUs is in fact encoded as xchg eax, eax

asveikau · 3 years ago

It was an old joke that the opposite of "goto" is "come from", or that if goto is considered harmful, nobody said anything about a "come from". Marking something as a branch target reminds me of this.

https://en.m.wikipedia.org/wiki/COMEFROM

dejj · 3 years ago

> GOTO considered harmful

COMEFROM considered harm-mitigating

It ingeniously makes Return Oriented Programming (ROP) a lot harder.

wongarsu · 3 years ago

Interesting. Seems like enforcement on Intel CPUs is supported since Tiger Lake (so ~2020). Windows has basically the same feature implemented in software since 2015, called Control Flow Guard [1]. I wonder what the story there is, and if Windows has any plans to (get everyone to) switch to the hardware version once those CPUs have sufficient market share.

1: https://learn.microsoft.com/en-us/windows/win32/secbp/contro...

andersa · 3 years ago

Windows also recently implemented a far better version of this called Extended Flow Guard (XFG) that not only checks whether the location is a valid destination, but also whether it's a valid destination for that specific source.

For example, for any virtual function call or function pointer call, the destination must have a correct tag with the hash of the arguments. It's much more secure, and also faster, since loading the tag from memory can be merged with loading the actual code after it.
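A rough software analogy in C, for readers who want to see the shape of the check. This is a sketch under loud assumptions: the tag values are hand-picked constants standing in for a real type hash, and `struct tagged_fn` and `call_int_to_int` are hypothetical names. It does not reproduce how XFG lays out or verifies its tags.

```c
#include <stdint.h>

/* Hypothetical tags standing in for a hash of each function type. */
#define TAG_INT_TO_INT  0x1111u
#define TAG_VOID_TO_INT 0x2222u

struct tagged_fn {
    uint32_t tag;          /* which function type this entry claims to be */
    void (*code)(void);    /* generic pointer, cast back before the call */
};

static int twice(int x) { return 2 * x; }
static int answer(void) { return 42; }

const struct tagged_fn twice_entry  = { TAG_INT_TO_INT,  (void (*)(void))twice };
const struct tagged_fn answer_entry = { TAG_VOID_TO_INT, (void (*)(void))answer };

/*
 * A call site that expects an int(int) target.
 * Returns 0 and stores the result on a tag match, -1 on a mismatch.
 * A real mitigation would typically abort instead of returning an error.
 */
int call_int_to_int(const struct tagged_fn *f, int arg, int *out)
{
    if (f->tag != TAG_INT_TO_INT)
        return -1;
    *out = ((int (*)(int))f->code)(arg);
    return 0;
}
```

Calling through `twice_entry` succeeds, while `answer_entry` is rejected even though it is a perfectly valid function: it is just not a valid destination for this particular call site.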

I wish this was the one implemented in hardware..

haberman · 3 years ago

Interesting. I was able to get Clang to generate this using `-fcf-protection=branch`: https://godbolt.org/z/rooP8vPsM

It looks like endbr64 is a 4-byte instruction. That could be a significant code size overhead for jump tables with lots of targets: https://godbolt.org/z/xTPToaddh
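For anyone who wants to reproduce this locally, a dense switch like the sketch below is the classic shape that optimizing compilers often lower to a jump table (they are free not to). The function name `dispatch` and the case bodies are made up. Whether each case label ends up with its own endbr64 depends on the compiler, its version, and the flags used, so compare the output with and without `-fcf-protection=branch` rather than taking it on faith.

```c
/* Six consecutive case values plus a default: a typical jump-table candidate. */
int dispatch(int op, int x)
{
    switch (op) {
    case 0:  return x + 1;
    case 1:  return x - 1;
    case 2:  return x * 2;
    case 3:  return x / 2;
    case 4:  return -x;
    case 5:  return x * x;
    default: return 0;
    }
}
```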

notaplumber1 · 3 years ago

OpenBSD disables jump tables in Clang on amd64 because of IBT. Some other architectures (e.g. powerpc64, sparc64, hppa) also had jump tables disabled as part of the switch to --execute-only ("xonly") binaries by default.

https://marc.info/?l=openbsd-cvs&m=168254711511764&w=2

E.g.: https://marc.info/?l=openbsd-cvs&m=167337396024167&w=2

codedokode · 3 years ago

Why should every function start with endbr64 command? Aren't functions usually called directly?

Also, is it required to insert endbr64 command after function calls (for return address)?

eklitzke · 3 years ago

As to why they're not always called directly, imagine some code like this:

    #include <stddef.h>   /* NULL */

    int FooWithoutChecks(void *p);

    int Foo(void *p) {
      if (p == NULL) return -1;
      return FooWithoutChecks(p);   /* tail call */
    }

In general, callers are expected to call Foo if they aren't sure whether the pointer might be null. If they already know it is not null (e.g. because they checked it themselves), they can call FooWithoutChecks and skip a null check whose outcome they already know.

The naive way to emit assembly for this is to actually emit two separate functions, and have Foo call FooWithoutChecks the usual way. But notice that the FooWithoutChecks function call is a tail call, so the compiler can use tail call optimization. To do this it would inline FooWithoutChecks into Foo itself, so the compiler just emits code for Foo with the logic in FooWithoutChecks inlined into Foo. This is nice because now when you call Foo, you avoid a call/ret pair, so you save two instructions on every call to Foo. But what if someone calls FooWithoutChecks? Simple, you just call at the offset into Foo just past the pointer comparison. This actually just works because Foo already has a ret instruction, so the call to FooWithoutChecks will just reuse the existing ret. This optimization also saves some space in the binary, which has various benefits in and of itself.

The example here with the null pointer check is kind of contrived, but this kind of pattern happens a LOT in real code when you have a small wrapper function that does a tail call to another function, and isn't specific to pointer checks.

josephcsible · 3 years ago

> Why should every function start with endbr64 command? Aren't functions usually called directly?

They're usually called directly, but unless the compiler can prove that they always are (e.g., if they're static and nothing in the same file takes the address), endbr64 is required.
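A small C sketch of the three cases, with made-up function names. Which of these actually receive endbr64 depends on the compiler and optimization level, so treat the comments as the reasoning a compiler could apply, not as a guarantee about any particular toolchain.

```c
typedef int (*callback_fn)(int);

/* Static and never address-taken: every call is visibly direct,
 * so a compiler can prove no landing pad is needed. */
static int helper_direct(int x) { return x + 1; }

/* Also static, but its address escapes through get_callback(),
 * so it can be reached by an indirect call. */
static int helper_indirect(int x) { return x * 2; }

/* External linkage: some other translation unit could take its
 * address, so the compiler has to assume it may be called indirectly. */
int api_entry(int x) { return helper_direct(x); }

/* Hands out a function pointer, which is what makes
 * helper_indirect an indirect-call target. */
callback_fn get_callback(void) { return helper_indirect; }
```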

> Also, is it required to insert endbr64 command after function calls (for return address)?

No, IBT is only for jmp and call. The shadow stack (SS) is the equivalent mechanism for ret.

messe · 3 years ago

C allows for any function to be called via a function pointer, and functions can be in different translation units, so the compiler can't simply assume that a function will never be called indirectly and has to pessimistically insert endbr64 in order to maintain a reasonable ABI.

And no, as I understand it, this is only for branches and calls, not returns.

aidenn0 · 3 years ago

A traditional compiler needs to insert them for all external functions, because other compilation units may make an indirect call.

cratermoon · 3 years ago

In case anyone wants a very simple introduction to JOP/COP exploits and mitigations of this type: <https://www.theregister.com/2020/06/15/intel_cet_tiger_lake/>

__failbit · 3 years ago

Thank you for the explanation!

Theo had to get his digs in against Linux in that announcement. Why not just focus on what OpenBSD is doing, and maybe contrast it with what Linux does, without speculating that they will still be doing the same thing in 20 years?

He's unquestionably brilliant, but I've had a few encounters with him on the mailing lists and he is so quick to take offense where none was meant and drop into name-calling and insults. I don't really get it. He may have some deep insecurities.

brynet · 3 years ago

It's an important comparison of the mechanisms. Even in 2023, 20 years later, you can still find binaries on modern Linux distributions with executable stacks because of the fail-open design.

The fact that Linux hasn't learned the right lessons in 20 years, and has chosen to "double down" in respect to IBT/BTI, does not inspire confidence that they will ever fix it. I'd say his 20 year estimate was in fact being pretty generous given the evidence available.

https://news.ycombinator.com/item?id=21554975

whoopdedo · 3 years ago

It's the price you pay for never-break-userspace. OpenBSD is fine with the very small probability that an executable which doesn't do branch tracking will fail to run under the enforced rules. The answer to that is to recompile because you've still got the source, and if not, well, tough cookies.

mananaysiempre · 3 years ago

> It's an important comparison of the mechanisms, even in 2023, you can still find binaries on modern Linux distributions with executable stacks due to the fail-open design, 20 years later.

Unfortunately, for C code using GCC’s nested functions extension (or for languages that want to be ABI-compatible with C and support nested functions, like that paragon of advanced features called Pascal /s ), there’s no other compilation strategy in current ABIs. The patches to switch C (and not just Ada) to function descriptors[1] with an ABI break have been sitting on the GCC mailing list since approximately forever[2], but it doesn’t seem like there’s been any progress.

[1] The strategy is basically to compile (*fp)() not as

  call *%rax

but as (untested)

     test $1, %rax
     jz 1f
     mov 8(%rax), %r10
     mov (%rax), %rax
  1: call *%rax

thus essentially inlining the (currently stack-allocated) closure calling thunk at all indirect call sites. It is ABI-compatible on x86 and x86-64 with all code that does not involve nested functions, place functions at odd addresses, or tag function pointers itself (and I think with all arm64 and riscv code, although arm32’s usage of the low pointer bit for Thumb interworking is bound to make this trickier).

[2] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-01/msg00735.h...
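The same dispatch idea can be modelled in plain C. This is a sketch, not the GCC patch: `struct descriptor`, `make_closure`, `make_plain`, `call_indirect` and the two sample functions are hypothetical names, the static chain is passed as an explicit first argument instead of in %r10, and it assumes (as the footnote does) that plain functions never sit at odd addresses and that function pointers round-trip through `uintptr_t`.

```c
#include <stdint.h>

/* Hypothetical descriptor: the code address plus the "static chain"
 * (the enclosing frame) that a nested function needs. */
struct descriptor {
    int (*code)(void *chain, int arg);
    void *chain;
};

/* Tag a descriptor pointer by setting its low bit. */
uintptr_t make_closure(const struct descriptor *d)
{
    return (uintptr_t)d | 1u;
}

/* An ordinary function pointer is passed through untagged. */
uintptr_t make_plain(int (*fn)(int))
{
    return (uintptr_t)fn;
}

/* The check that would be inlined at every indirect call site. */
int call_indirect(uintptr_t fp, int arg)
{
    if (fp & 1u) {
        /* Tagged: strip the tag, then load code address and chain. */
        const struct descriptor *d = (const struct descriptor *)(fp - 1u);
        return d->code(d->chain, arg);
    }
    /* Untagged: call straight through, as before. */
    return ((int (*)(int))fp)(arg);
}

/* Sample targets: one plain function, one that reads a captured value. */
int add_one(int x) { return x + 1; }
int add_captured(void *chain, int arg) { return *(int *)chain + arg; }
```

The point of the design is that the closure state lives in ordinary data, so nothing has to be written to the stack and then executed.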

jacquesm · 3 years ago

The funny thing is that this attitude towards breaking changes is one of the reasons why Theo is able to make this comment at all. If he did not allow breaking changes, OpenBSD adoption would likely be higher, and that in turn would force him to resist the kind of changes that Linux cannot get away with.

It's clearly different philosophies leading to different outcomes with neither of them clearly better than the other, it just depends on what you need. It would be possible to make that statement in a more graceful way.

VancouverMan · 3 years ago

That part doesn't look like a "dig" or an insult to me.

It seems like a reasonable, relevant, and plausible assessment of how the long-term outcomes may likely differ between OpenBSD's stricter approach versus a looser approach, specifically when it comes to the degree of security offered (which is one of OpenBSD's main focuses), based on a past situation that's similar.

How do you know that you aren't being, to use your words, "quick to take offense where none was meant" in this case?

jacquesm · 3 years ago

> How do you know that you aren't being, to use your words, "quick to take offense where none was meant" in this case?

Past knowledge about Theo?

redundantly · 3 years ago

I wouldn't have it any other way. I love the OpenBSD mailing lists. Always an entertaining read when Theo gets involved.

teknopurge · 3 years ago

Upvoted and +1. Theo has been an important leader in OSS for decades: his brevity and impatience are a net positive. Also, he is usually correct.

Joker_vD · 3 years ago

That's the problem with many brilliant people: what they perceive as their interlocutors being deliberately obtuse on some completely obvious point is actually their interlocutors being just as smart as they always are on some point that is not obvious at all to them.

rkangel · 3 years ago

Perception of relative intelligence or sensible decision making is irrelevant. Just because you think you're doing a better job doesn't mean you need to shit on the other person.

You could not mention Linux at all, or you could even say "we think this is better than Linux's approach because of X" and it would be a great improvement.

I have always found it interesting that Rust purposefully avoided doing language comparisons - "we're better than Python like this and better than C like that". Their message deliberately avoided positioning it as a competition, instead focusing just on articulating Rust's value. It was an eye-opening approach given our instinct is normally to pit things against each other.

ris · 3 years ago

This is my main takeaway too. As a one-time OpenBSD enthusiast (and still admirer), now that I'm a bit older I find the continual smugness starts to grate.

Truth is, Linux has a lot more constraints on how it can implement something because it has users. Users that have all sorts of different ways they need it to work.

insanitybit · 3 years ago

I think it's great that he's calling Linux's choices out. TBH Theo's attitude is born of Linux's culture - most older school sec people have learned that this is the only way to get things to improve.

Ericson2314 · 3 years ago

Are Theo and Linux more alike than OpenBSD and Linux?

PrimeMcFly · 3 years ago

Now, yes. Linus wasn't always so abrasive though. At some point he caught up to Theo.

NoZebra120vClip · 3 years ago

> Are Theo and Linux more alike than OpenBSD and Linux?

Is a Canadian kernel developer more like a POSIX operating system than a POSIX operating system is like a POSIX operating system?

I'm not sure I understand. Perhaps you meant to write "Linus" since Linus is also a kernel developer? That seems more like apples to apples.

saagarjha · 3 years ago

> He may have some deep insecurities.

Explains why he spends all his time developing mitigations

microtherion · 3 years ago

I’d just like to interject for a moment. What you’re referring to as Linux, is in fact, NotOpenBSD/Linux, or as I’ve recently taken to calling it, Linux as opposed to OpenBSD…