The reason I'm negative is the entire article has zero detail on WTF this instruction set is or does. The best you can do is guess from the name of the instruction set.
Compare the linked iPhone article to this blog and you'll quickly see the difference. There's very real discussion in the MTE article of how the instructions work and what they do. This article just says "Memory safety is hard and we'll fix it with these new instructions that fix memory safety!"
So there's a long intellectual history behind these technologies, and Intel had multiple chances to take the lead on this around 2018 - they failed to do so, some of the talent went to Apple, and now Intel has to play catch-up.
I'm pretty certain it'll be the x86 variant of either MTE or MIE.
Probably because it's very likely that both AMD and Intel have had engineers working on this sort of thing for a long time, and they're now deciding to collectively hash out whatever the solution is going to be for both of them.
A lot of these extensions come from Intel/AMD/etc clients first, and because of how long it takes a thing to make it into mainstream chips, it was probably conceived of and worked on at least 5 years ago, often longer.
This particular thing has a long history and depending on where they worked, they know about that history.
However, they are often covered by extra layers of NDAs on top of whatever normal corporate employee NDA you have, so most people won't say a ton about it.
I don't know if it is intended this way, but there's one useful outcome even with the limited amount of detail disclosed:
There are industry partners who work closely with AMD and Intel (with on-site partner engineers etc.), but who are not represented in the x86 ecosystem advisory group, or maybe they have representation, but not at the right level. If these industry partners notice the blog post and they think they have technology in impacted areas, they can approach their contacts, asking how they can get involved.
Yeah it's the most succinct explanation I've seen of weird machines and memory tagging. Definitely bookmarking this one. I wonder if video of the talk that presumably presented this is available.
Is there a comparison of memory tagging designs for different architectures (POWER, SPARC, CHERI/Morello, Arm MTE/eMTE, Apple MIE, x86, RISC-V)? e.g. enforcement role of compiler vs. hardware, opt-in vs mandatory, hardware isolation of memory tags, performance impact, level of OS integration?
Presumably will be based on the existing Linear Address Masking/Upper Address Ignore specs, which are equivalent, and will be similar to CHERI.
If so it needs to be opt-in or at least opt-out per process, because many language runtimes use these pointer bits to optimize dynamic types, and would suffer a big performance hit if they were unable to use them.
I would not assume they just use bits in the address word for the tag.
LPDDR6 includes 16 bits of host-defined metadata per 256 bits of data. Systems can use that for ECC, and/or for other purposes, including things like tagging memory. DDR6 will likely include a similar mechanism.
SECDED ECC over 256 bits requires 10 bits (9 Hamming check bits, since 2^9 = 512 ≥ 256 + 9 + 1, plus one overall parity bit for double-error detection), leaving you 6 bits. That's enough for one bit per aligned 64-bit word, which could be used to denote "valid pointer" like CHERI does.
Dynamic types have classically used the lower bits freed by alignment constraints. If I know a cons cell is 16 bytes then I can use the low 4 bits of an address to store enough type info to disambiguate.
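A minimal sketch of the low-bits trick in C (the 16-byte cell layout and the tag values here are made up for illustration):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 16-byte cons cell: 16-byte alignment guarantees the
 * low 4 bits of any cell pointer are zero, so they can hold a tag. */
typedef struct { uint64_t car, cdr; } cell;

enum tag { TAG_CONS = 1, TAG_SYMBOL = 2 };  /* illustrative tag values */

static uintptr_t tag_ptr(const cell *p, enum tag t) {
    return (uintptr_t)p | (uintptr_t)t;     /* stash tag in the free low bits */
}
static cell *untag_ptr(uintptr_t v) { return (cell *)(v & ~(uintptr_t)0xF); }
static enum tag get_tag(uintptr_t v) { return (enum tag)(v & 0xF); }
```

The untagging mask is why the allocator's alignment guarantee matters: if a cell could land on an 8-byte boundary, bit 3 would be ambiguous between "address" and "tag".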
There's a technique known as "NaN boxing" which exploits the fact double precision floats allow you to store almost 52 bits of extra data in what would otherwise be NaNs.
If you assume the top 16 bits of a pointer are unused[1], you can fit a pointer in there. This lets you store a pointer or a full double by-value (and still have tag bits left for other types!).
Last I checked LuaJIT and WebKit both still used this to represent their values.
[1] On amd64 they actually need to be sort of "sign extended" so require some fixup once extracted.
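A hedged sketch of NaN boxing in C (the tag constants are chosen for illustration; real engines like JavaScriptCore and LuaJIT pick different encodings, and this ignores the sign-extension fixup from [1]):

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t value;

/* Any 64-bit pattern with these top bits set is a NaN with a nonzero
 * payload, so it can never collide with a real double. We claim this
 * particular NaN range for pointers. */
#define PTR_TAG  0xFFFC000000000000ULL
#define PTR_MASK 0x0000FFFFFFFFFFFFULL   /* low 48 bits: the pointer */

static value box_double(double d) { value v; memcpy(&v, &d, sizeof v); return v; }
static double unbox_double(value v) { double d; memcpy(&d, &v, sizeof d); return d; }

static int   is_ptr(value v)    { return (v & PTR_TAG) == PTR_TAG; }
static value box_ptr(void *p)   { return PTR_TAG | ((uint64_t)(uintptr_t)p & PTR_MASK); }
static void *unbox_ptr(value v) { return (void *)(uintptr_t)(v & PTR_MASK); }
```

Doubles are stored unmodified, so unboxing one is free; only pointers (and any other tagged types you add) pay the mask-and-test.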
There's numerous techniques used. Many are covered in Gudeman's 1993 paper "Representing Type Information in Dynamically Typed Languages"[1], which includes low-bits tagging, high-bits tagging, and NaN-boxing.
The high bits let us tag more types, and can be used in conjunction with low bits tagging. Eg, we might use the low bits for GC marking.
Depends on the architecture. Top-bit usage lets you do what the hardware thinks of as an "is negative" check very cheaply on a lot of archs, for instance.
You don't need the hardware UAI/LAM to make use of the high pointer bits. The most common technique is to use `shl reg, 16; sar reg, 16`, which will shift in ones or zeros from the left depending on the 47th bit.
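The same fixup can be written in C for illustration (assuming the usual 48-bit canonical addresses and an arithmetic right shift on signed types, which all mainstream compilers provide):

```c
#include <stdint.h>

/* Put a 16-bit tag in the top of a 64-bit word, then recover a
 * canonical pointer value: shifting left by 16 and arithmetic-right
 * by 16 refills the top 16 bits with copies of bit 47, mirroring the
 * `shl reg, 16; sar reg, 16` idiom. */
static uint64_t tag_high(uint64_t addr, uint16_t tag) {
    return ((uint64_t)tag << 48) | (addr & 0x0000FFFFFFFFFFFFULL);
}
static uint64_t untag_high(uint64_t tagged) {
    return (uint64_t)(((int64_t)(tagged << 16)) >> 16);
}
```

Note that this recovers both user-space pointers (bit 47 clear, top bits become zeros) and kernel-half pointers (bit 47 set, top bits become ones).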
Several runtimes use high bits tagging combined with NaN-boxing, and have been doing so since before LAM/UAI existed.
It seems very strange to me to finally get around to this right as we are finally getting low level software that no longer needs it (and we've had high level software that doesn't need it for ages). At this point I think I'd prefer the transistor budget and bits of memory were spent on other things.
We have had it since before C was even invented. The Burroughs system, nowadays still sold as Unisys ClearPath MCP, was written in ESPOL and later NEWP, with zero Assembly.
The compiler provides intrinsics, and has bounds checking for strings and arrays.
PL/I and its variants were also used across several systems, as were ALGOL dialects.
Note C.A.R. Hoare's Turing Award speech in 1980:
"A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980 language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."
The "1980 language designers and users have not learned this lesson" is meant to be C, without explicitly referring to it.
> we are finally getting low level software that no longer needs it
Ada has had memory safety for decades – not to mention Lisp, Java, etc. if you can live with garbage collection. Even PL/I was better than C for memory safety, which is why Multics didn't suffer from buffer overflows and Unix did. But the Linux kernel (along with lots of other software which we would like to be reliable) is still mostly written in C, for better or worse.
Only with SPARK (i.e. formal verification), which, similar to other projects of this kind (e.g. Rocq, or how the CompCert C compiler was implemented and proved correct), doesn't seem to be low-friction enough to get wide-scale adoption.
> not to mention Lisp, Java, etc. if you can live with garbage collection.
Like I said, high level languages that won't benefit from this at all have existed for ages... and the majority of software is written in them. This is one of the stronger arguments against it...
> But the Linux kernel (along with lots of other software which we would like to be reliable) is still mostly written in C, for better or worse.
Fil-C shows this can be solved at the software layer for things in this category that can afford the overhead of a GC. That does mean a larger performance penalty than the hardware proposal, but it's also more correct (since hardware changes can never fix miscompilations resulting from undefined behavior).
The Linux kernel is probably an example of an actual long-tail project that would benefit from this for a reasonably long time, though, since it's not amenable to "use a modified C compiler that eliminates undefined behavior with GC and other clever tricks" and it's unlikely to get rewritten or replaced with a memory-safe thing quickly, due to the number of companies collaborating on it.
Sure but nobody is actually writing foundational software (as we are now calling it) in Lisp, Java or Ada (and it also has no good answer for use-after-free which is a huge class of vulnerabilities).
This is the first point in history where it's actually feasible in real non-research life to write an entire system in only memory safe languages. (And no, the existence of `unsafe` doesn't invalidate this point.)
There are hundreds of billions of lines of code of critical software[1] written in unsafe languages, that is not going to be rewritten any time soon. Adding memory safety "for free" to such software is a net positive.
Current CPUs are limited by power, transistors are essentially free.
[1] often the VMs of safer higher level languages, in fact.
> we are finally getting low level software that no longer needs it
We're not, though. There's a little bit of low-level software being written in Rust (and even that requires a non-trivial amount of unsafe code), but most new low-level software is being written in C++ or C. And even if a more popular safe low-level programming language arrived tomorrow and gained a more respectable adoption, that still wouldn't be fast enough because of all the existing software.
"Needs" is a strong word - it would benefit a bit - but in practice I think the number of vulnerabilities Rust code typically has is not large enough to justify compromising the performance of every CPU ever sold (thus requiring more of them, consuming more energy, etc).
There's also been steady progress towards creating systems to prove unsafe Rust correct - at which point it wouldn't even benefit from this. For example, see the work Amazon has been sponsoring to prove the standard library correct: https://github.com/model-checking/verify-rust-std/
it's like the end scene in fight club. except instead of credit card company office towers it's the borrow checker and associated skyscrapers that symbolize the ascent of rust that are going down in flames as the x86 antiheroes high five and the pixies start crooning "where is my mind" to rolling credits over the burning cityscape.
There is server hardware out there now that in theory can support MTE, but I don't know if there's commercial support for it. MTE needs to be set up by the firmware, it's not purely an OS/kernel matter.
GrapheneOS (hardened Android distribution) also has it enabled by default for the base OS and user-installed Apps that support it (you can also force it for all apps) on 8th Gen Google Pixels and newer
I agree it would’ve happened no matter what, it’s a very useful feature.
I do wonder if Apple‘s announcement that they started shipping it may have pushed them to announce this/agree to a standard earlier than they would have otherwise.
It would be nice to know how these memory safety instructions should be used by software developers. Assuming I write C++ code, what should I do? Enable some new compiler flags? Use a special runtime library? Use some special variant of the language standard library which uses these new instructions? Completely rewrite my code to make it safe?
If it's anything like CHERI, you need to make sure to follow pointer provenance rules properly (called "strict provenance" in Rust) and then just recompile your program with some extra flags. Only low level memory-related things like allocators and JITs need any significant source code changes.
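To illustrate what "following provenance rules" means, here's the kind of pattern that breaks on capability hardware (conventional x86-64 tolerates it; CHERI traps on the dereference because the integer round-trip drops the capability's validity tag):

```c
#include <stdint.h>

static int g = 42;

/* Laundering a pointer through a plain integer: the integer keeps
 * only the address. On CHERI the rebuilt pointer has no capability
 * tag, so dereferencing it faults; provenance-respecting code keeps
 * the value as a pointer, or derives the new pointer from the old one. */
static int roundtrip_through_integer(void) {
    uint64_t addr = (uint64_t)(uintptr_t)&g;  /* provenance discarded */
    int *p = (int *)(uintptr_t)addr;          /* invalid under CHERI  */
    return *p;                                /* works on plain x86-64 */
}
```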
If ARM's memory tagging is a guide, not much for the general developer. You will be able to run with address sanitizers enabled at a much lower overhead. Perhaps, use some hardened allocators or library variants that rely on the extension.
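As a concrete point of comparison, this is roughly how it looks on AArch64 with Clang today (flags per existing MTE/HWASan toolchain support; the eventual x86 spelling is unknown):

```shell
# HWASan: software tag checks, runs on any AArch64 hardware
clang -fsanitize=hwaddress -o app app.c

# MTE: hardware tag checks, needs an Armv8.5+ CPU plus OS support
clang -fsanitize=memtag -march=armv8.5-a+memtag -o app app.c
```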
fwiw "knee-jerk reaction to Apple MIE" is not exactly the right characterization of this. MPX existed and faded away, and it's not very surprising that x86-world would wait for someone else to try shipping hardware support for memory safety features before trying again.
I wouldn't say that's fair. MPX failed because it was a very problematic solution to this problem.
MPX had a large (greater than 15-20%) overhead and was brittle. It didn't play well with other x86 instructions and the developer experience was extremely poor (common C and C++ design patterns would cause memory faults with MPX).
Apple MIE (which is essentially ARM MTE v2) and MTE on the other hand have near invisible levels of overhead (~5-10%) with the ability to alternate between synchronous and asynchronous tracing of faults where the latter has a much lower overhead than the former (allowing you to check in production for very little overhead and get better debugging in test). They also work essentially seamlessly with the rest of the ARM ecosystem and it takes very little to integrate the functionality into any language ecosystem or toolchain.
If MPX had been comparable to MTE, it certainly would have gotten the adoption that MTE is getting, but the "tech" just wasn't there to justify its use.
I'm not arguing MPX was a good solution, just that it's silly to assume folks designing x86 machines have been totally ignoring developments in that space for the past ten years.
For a good intuition why this (coupled with instrumenting all allocators accordingly) is a game-changer for exploitation, check https://docs.google.com/presentation/d/1V_4ZO9fFOO1PZQTNODu2...
In general, having this come to x86 is long-overdue and very welcome.
IIRC, later SPARC64 chips also had a version of this.
I’m lukewarm on this.
- It is long overdue and welcome.
- It won’t stop a sufficiently determined attacker, because it's probabilistic and too easy to apply only partially
Is this good? Yes. Does it solve memory safety? No. But does it change the economics? Yes.
Previous discussion https://news.ycombinator.com/item?id=45186265
[1]:https://web.archive.org/web/20170705085007/ftp://ftp.cs.indi...
AFAIK, AMD only added it in Zen4.
[0] https://www.youtube.com/watch?v=y_QeST7Axrw
But that can be problematic for any code that assumes that the size of pointers is the same as size_t/usize.
I don't see how this could not be opt in for backwards compatibility though, since existing code wouldn't use the new instructions.
Interesting thread:
https://grapheneos.social/@GrapheneOS/113223437850603601
https://en.wikipedia.org/wiki/Intel_MPX
https://docs.oracle.com/en/operating-systems/solaris/oracle-...
Intel had a first attempt at this with MPX, but the design had flaws,
https://en.wikipedia.org/wiki/Intel_MPX
Then there is CHERI and the related ARM Morello,
https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
Apple isn't first here, just the first to ship it to the general public.