This is super cool. This exploit will become one of the canonical examples that just running something in a VM does not mean it's safe. We've always known about VM breakouts, but this is a massive no-breakout exploit that is simple to execute and gives big payoffs.
Remember: just because this one bug gets fixed in microcode doesn't mean there's not another one of these waiting to be discovered. Many (most?) 0-days are known about by black-hats-for-hire well before they're made public.
The problem is, VMs aren't really "Virtual Machines" anymore. You're not parsing opcodes in a big switch statement, you're running instructions on the actual CPU, with a few hardware flags that the CPU says will guarantee no data or instruction overlap. It promises! But that's a hard promise to make in reality.
Just how many times is the average operating system workload (with or without a virtual machine also running a second average operating system workload) context switching a second?
Like... unless I'm wrong... the kernel is the main process, and then it slices up processes/threads, and each time those run, they have their own EAX/EBX/ECX/ESP/EBP/EIP/etc. (I know it's RAX, etc. for 64-bit now)
How many cycles is a thread/process given before it context switches to the next one? How is it managing all of the pushfd/popfd, etc. between them? Is this not how modern operating systems work, am I misunderstanding?
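Roughly, yes: on every switch the kernel saves the outgoing task's architectural registers and restores the incoming one's. A toy round-robin sketch of that bookkeeping (Python standing in for what kernels actually do in asm; all names here are made up for illustration):

```python
# Toy model of a context switch: each "task" owns a saved copy of the
# architectural register state, and the "kernel" swaps it in and out of the
# single physical register set on every switch (real kernels do this in asm,
# e.g. saving RAX..R15, RFLAGS and RIP into the task's kernel stack).

REGISTERS = ["rax", "rbx", "rcx", "rsp", "rbp", "rip", "rflags"]

class Task:
    def __init__(self, name):
        self.name = name
        # Register state held while the task is not running.
        self.saved = {r: 0 for r in REGISTERS}

cpu = {r: 0 for r in REGISTERS}   # the one live register file

def context_switch(prev, nxt):
    prev.saved = dict(cpu)        # save outgoing task's registers
    cpu.update(nxt.saved)         # restore incoming task's registers

a, b = Task("A"), Task("B")
cpu["rax"] = 111                  # task A computes something
context_switch(a, b)
cpu["rax"] = 222                  # task B reuses the same register freely
context_switch(b, a)
assert cpu["rax"] == 111          # A's value came back intact
```

On Linux the timeslice is dynamic rather than a fixed cycle count (on the order of milliseconds under CFS), and a busy machine can easily context switch thousands to tens of thousands of times per second; `vmstat 1` shows the live count in the `cs` column.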
The comparison to Meltdown/Spectre is a bit misleading though - they were a whole new form of attack based on timing, where the CPU did exactly what it should have done. This Zenbleed case is a good old-fashioned bug: data in a register that shouldn't be there.
Running untrusted code whether in a sandbox, container, or VM, has not been safe since at least Rowhammer, maybe before. I believe a lot of these exploits are down to software and hardware people not talking. Software people make assumptions about the isolation guarantees, hardware people don't speak up when said assumptions are made.
Hardware people are the ones making those promises, so I don't think that's right at all. And Rowhammer is a way overstated vulnerability - there are all sorts of practical issues with it, especially if you're on modern, patched hardware.
In the end, I'm thinking most of these are related to branch prediction?
It strikes me that either branch prediction is so inherently complex that it's always going to be vulnerable to this, and/or it so defies the way most of us intuitively think about code paths and instruction execution that it's hard to conceive of the edge cases until it's too late.
At what point does the complexity of CPU architectures become so difficult to reason about that we just accept the performance penalty of keeping it simpler?
More generally, most of them are related to speculative execution, where branch mis-prediction is a common gadget to induce speculative mis-execution.
Speculation is hard. It's akin to introducing multithreading into a program: you are explicitly choosing to tilt at the windmill of pure technical correctness, because in a highly concurrent application every error will occur fairly routinely. Speculation is also great: in combination with out-of-order execution it's a multithreading-like boon to overall performance, because now you can resolve several chunks of code in parallel instead of one at a time. It's just a minefield of correctness issues as well, and the alternative would be losing something like ten years of performance gains (going back to roughly ARM Cortex-A53 levels).
The recent thing is that "observably correct" needs to include timings. If you can just guess at what the data might be, and the program runs faster if you're correct, that's basically the same thing as reading the data by another means. It's a timing oracle attack.
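The timing-oracle idea in miniature, using an early-exit comparison instead of speculation (purely an analogy for "correct result, leaky timing", not the Zenbleed mechanism; loop iterations stand in for wall-clock time):

```python
# An early-exit comparison leaks through how long it runs: the more leading
# bytes of a guess are correct, the more loop iterations execute before the
# mismatch. Counting iterations here is a stand-in for the timing a real
# attacker would measure.

SECRET = b"hunter2"   # hypothetical secret the "victim" compares against

def compare(guess):
    steps = 0
    for g, s in zip(guess, SECRET):
        if g != s:
            break
        steps += 1
    return steps      # proxy for elapsed time

# Recover the secret one byte at a time by picking whichever candidate
# makes the comparison run longest.
recovered = b""
for _ in range(len(SECRET)):
    best = max(range(256), key=lambda c: compare(recovered + bytes([c])))
    recovered += bytes([best])

assert recovered == SECRET
```

The comparison always returns a "correct" answer; the secret escapes entirely through the running time, which is why observable correctness has to include timing.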
(in this case AMD just fucked up though, there's no timing attack, this is just implemented wrong and this instruction can speculate against changes that haven't propagated to other parts of the pipeline yet)
The cache is the other problem, modern processors are built with every tenant sharing this single big L3 cache and it turns out that it also needs to be proof against timing attacks for data present in the cache too.
> At what point does the complexity of CPU architectures become so difficult to reason about that we just accept the performance penalty of keeping it simpler?
Never for branch prediction. It just gets you too much performance. If it becomes too much of a problem, the solution is greater isolation of workloads.
>At what point does the complexity of CPU architectures become so difficult to reason about that we just accept the performance penalty of keeping it simpler?
Basically never for anything that's at all CPU-bound: that growth in complexity is really the only thing that's been powering single-threaded CPU performance improvements since Dennard scaling stopped in about 2006 (and by that time they were already plenty complex: by the late 90s and early 2000s, x86 CPUs were firmly superscalar, out-of-order, branch-predicting, speculatively executing devices). If your workload can be made fast without needing that stuff (i.e. no branches and easily parallelised), you're probably using a GPU instead nowadays.
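For a sense of why branch prediction is too valuable to give up: even a toy 2-bit saturating-counter predictor, a tiny fraction of what a real front end does, gets typical loop branches almost perfectly right:

```python
import random

# Minimal 2-bit saturating-counter branch predictor (states 0-3; predict
# "taken" when the counter is >= 2). Real predictors layer history tables,
# tags and indirect-target prediction on top, but this is the core idea.

def predict(history):
    counter, correct = 2, 0            # start in "weakly taken"
    for taken in history:
        if (counter >= 2) == taken:
            correct += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(history)

# A loop branch: taken 99 times, then falls through once, repeated.
loop = ([True] * 99 + [False]) * 100
print(f"loop-branch accuracy: {predict(loop):.1%}")    # ~99% correct

# Truly random branches defeat any predictor.
random.seed(0)
rand = [random.random() < 0.5 for _ in range(10_000)]
print(f"random-branch accuracy: {predict(rand):.1%}")  # ~50% correct
```

That near-99% hit rate on regular control flow is what keeps deep pipelines full, which is why vendors mitigate around prediction rather than remove it.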
You can rent one of the Atom Kimsufi boxes (N2800) to experience first-hand a CPU with no speculative execution. The performance is dire, but at least it hasn't gotten worse over the years - they are immune to just about everything.
We demanded more performance and we got what we demanded. I doubt manufacturers are going to walk back on branch prediction no matter how flawed it is. They'll add some more mitigations and features which will be broken-on-arrival.
There's VLIW / EPIC / some other technical name I forget for architectures which instead ask you to explicitly schedule instruction/data/branch prediction. If I remember right, the two biggest examples were IA-64 and Alpha. I want to say HP PA-RISC did the same, but I'm not clear on that one.
For various reasons, all these architectures eventually lost out to market pressure (and cost/watt/IPC, I guess).
Yup! I worked at a few companies that would co-mingle Internet-facing/DMZ VMs with internal VMs. When pointing this out and recommending we airgap these VMs onto their own dedicated hypervisor, it always fell on deaf ears. Joke's on them, I guess.
The problem is that the logical registers don't have a 1:1 relation to the physical registers.
For example, let's imagine a toy architecture with two registers: r0 and r1. We can create a little assembly snippet using them: "r0 = load(addr1); r1 = load(addr2); r0 = r0 + r1; store(addr3, r0)". Pretty simple.
Now, what happens if we want to do that twice? Well, we get something like "r0 = load(addr1); r1 = load(addr2); r0 = r0 + r1; store(addr3, r0); r0 = load(addr4); r1 = load(addr5); r0 = r0 + r1; store(addr6, r0)". Because there is no overlap between the accessed memory sections, they are completely independent. In theory they could even execute at the same time - but that is impossible because they use the same registers.
This can be solved by adding more physical registers to the CPU, let's call them R0-R6. During execution the CPU can now analyze and rewrite the original assembly into "R1 = load(addr1); R4 = load(addr4); R2 = load(addr2); R5 = load(addr5); R3 = R1 + R2; R6 = R4 + R5; store(addr3, R3); store(addr6, R6)". This means we can now start the loads for the second addition before the first addition is done, which means we have to wait less time for the data to arrive when we finally want to actually do the second addition. To the user nothing has changed and the results are identical!
The issue here is that when entering/exiting a VM you can definitely clear the logical registers r0 and r1, but there is no guarantee that you are actually clearing the physical registers. On a hardware level, "clearing a register" now means "mark the logical register as empty". The CPU makes sure that any future use of that logical register behaves as if it had been cleared, but there is no need to touch the contents of the physical register. It just gets marked as "free for use". After all, the only way that physical register becomes available again is via a write, and that write would by definition overwrite the stale contents - so clearing it would be pointless. Unless your CPU misbehaves and you run into this new bug, of course.
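The renaming scheme above can be sketched as a toy simulator. Note how "clearing" a logical register only redirects the mapping, leaving the stale value sitting in the physical register file, which is exactly the kind of state this bug manages to observe (the structure is illustrative, not AMD's actual implementation):

```python
# Toy register renamer: logical registers r0/r1 are just names mapped onto a
# larger physical register file via a register alias table (RAT). Writes
# allocate a fresh physical register; "clearing" merely repoints the name at
# a zero register and puts the old entry, data intact, on the free list.

PHYS = 8                       # physical registers R0..R7
phys = [0] * PHYS              # the physical register file
free = list(range(1, PHYS))    # R0 is reserved as an always-zero register
rat = {"r0": 0, "r1": 0}       # register alias table: logical -> physical

def write(reg, value):
    p = free.pop(0)            # allocate a fresh physical register
    phys[p] = value
    old, rat[reg] = rat[reg], p
    if old != 0:
        free.append(old)       # freed physical reg: contents NOT erased

def read(reg):
    return phys[rat[reg]]

def clear(reg):
    old, rat[reg] = rat[reg], 0   # point at the zero register; no data touched
    if old != 0:
        free.append(old)

write("r0", 0xC0FFEE)          # guest puts data in r0
clear("r0")                    # VM exit: architecturally r0 is now zero...
assert read("r0") == 0
assert 0xC0FFEE in phys        # ...but the stale value still exists physically
```

Normally that stale entry is unobservable because any reallocation overwrites it before use; Zenbleed is a case where a mispredicted `vzeroupper` lets it become visible again.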
The problem is the freed entries in the register file. A VM can, at least, use this bug to read registers from a non-VM thread running on the adjacent SMT/HT of a single physical core. I suspect a VM could also read registers from other processes scheduled on the same SMT/HT.
The README in the tar file with the exploit (linked at "If you want to test the exploit, the code is available here") contains some more details, including a timeline:
- `2023-05-09` A component of our CPU validation pipeline generates an anomalous result.
- `2023-05-12` We successfully isolate and reproduce the issue. Investigation continues.
- `2023-05-14` We are now aware of the scope and severity of the issue.
- `2023-05-15` We draft a brief status report and share our findings with AMD PSIRT.
- `2023-05-17` AMD acknowledge our report and confirm they can reproduce the issue.
- `2023-05-17` We complete development of a reliable PoC and share it with AMD.
- `2023-05-19` We begin to notify major kernel and hypervisor vendors.
- `2023-05-23` We receive a beta microcode update for Rome from AMD.
- `2023-05-24` We confirm the update fixes the issue and notify AMD.
- `2023-05-30` AMD inform us they have sent a SN (security notice) to partners.
- `2023-06-12` Meeting with AMD to discuss status and details.
- `2023-07-20` AMD unexpectedly publish patches, earlier than an agreed embargo date.
- `2023-07-21` As the fix is now public, we propose privately notifying major distributions that they should begin preparing updated firmware packages.
- `2023-07-24` Public disclosure.
This is incredibly scary. On my Zen 2 box (Ryzen 3600), logging the output of the exploit running as an unprivileged user while copying and pasting a string into a text editor in the background (I used Kate) resulted in pieces of the string being logged in the output of zenbleed. And this is after a few seconds of runtime, mind you, not even a full minute.
Thankfully the exploit is highly dependent on a specific asm routine so exploiting it from JS or WASM in a browser should be extremely difficult. Otherwise a nefarious tab left open for hours in the background could exfiltrate without an issue.
I'm eagerly waiting for Fedora maintainers to push the new microcode so the kernel can update it during the boot process.
> Thankfully the exploit is highly dependent on a specific asm routine so exploiting it from JS or WASM in a browser should be extremely difficult.
I assume that once/if a method is found it will be applicable broadly though. At the same time, hopefully software patches in V8 and SpiderMonkey will be able to mitigate this further and sooner.
But a JS exploit would require some way to exfiltrate data and presumably doing that would be quite difficult to hide entirely.
I had to run make on the uncompressed folder. Perhaps the build-essential package doesn't come with NASM in Ubuntu? I'll need a bit more info on the error if you want me to try and help you :)
> AMD have released a microcode update for affected processors.
I don't think that is correct. AMD has released a microcode update[0] for family 17h models 0x31 and 0xa0, which corresponds to Rome, Castle Peak and Mendocino as per WikiChip [1].
So far, there seems to be no microcode update for Renoir, Grey Hawk, Lucienne, Matisse and Van Gogh. Fortunately, the newly released kernels can and do simply set the chicken bit for those. [2]
This technique is CVE-2023-20593 and it works on all Zen 2 class processors, which includes at least the following products:
AMD Ryzen 3000 Series Processors
AMD Ryzen PRO 3000 Series Processors
AMD Ryzen Threadripper 3000 Series Processors
AMD Ryzen 4000 Series Processors with Radeon Graphics
AMD Ryzen PRO 4000 Series Processors
AMD Ryzen 5000 Series Processors with Radeon Graphics
AMD Ryzen 7020 Series Processors with Radeon Graphics
AMD EPYC “Rome” Processors
Do they mean "only confirmed on Zen2", or is the problem definitely confined to only this architecture?
Is it likely that this same technique (or similar) also works on earlier (Zen/Zen+) or later (Zen3) cores, but they just haven't been able to demonstrate it yet?
I mean, the PS5 is running a Zen 2 processor [0] so I would assume it's vulnerable. In general I would assume that AAA games are safe. Websites and smaller games made by malefactors will be the issue. (Note that AAA game makers have little interest in antagonizing the audience, OTOH they also will push limits to install anti-cheat mechanisms. On balance I'd trust them.)
You are likely frequently running untrusted workloads. As javascript in a browser. I don't know about this one, but at least meltdown was fully exploitable from js.
It is a simple static HTML page; how is it possible in 2023 for a static site to be hugged to death? In most cases HN traffic barely hits 100 page views per second.
It's a security writeup so it's probably run by a security expert who is not an expert at running high traffic websites. Most likely there is something on the page that causes a database hit. Possibly the page content itself.
According to AMD's security bulletin, firmware updates for non-EPYC CPUs won't be released until the end of the year. What should users do until then, set the chicken bit and take the performance hit?
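For reference, the interim mitigation described in the disclosure is setting bit 9 (the "chicken bit") of AMD's DE_CFG MSR, 0xC0011029, which switches the affected optimization off. A sketch of the mask arithmetic, with the msr-tools one-liner from the writeup shown as a comment:

```python
# Zenbleed chicken bit: bit 9 of AMD's DE_CFG MSR (0xC0011029).
# Setting it disables the affected optimization at some performance cost.

DE_CFG_MSR = 0xC0011029
ZENBLEED_CHICKEN_BIT = 1 << 9          # == 0x200

current = 0x0                          # placeholder: really `rdmsr 0xc0011029`
patched = current | ZENBLEED_CHICKEN_BIT
assert patched & ZENBLEED_CHICKEN_BIT  # bit is now set

# Equivalent root-only shell one-liner with msr-tools, per the disclosure:
#   wrmsr -a 0xc0011029 $(( $(rdmsr -c 0xc0011029) | (1 << 9) ))
```

Recent kernels (Linux 6.4.6 and the stable backports) apply this bit automatically on affected Zen 2 parts that lack fixed microcode, so manually poking the MSR should only be needed on older kernels.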
Presumably classified as severity 'medium' in an attempt to look marginally less negligent when announcing that they can't be bothered to issue microcode updates for most CPU models until Nov or Dec.
Under what circumstances is this not a medium? The only case where this applies is if you have public runners running completely untrusted code, and if you're doing that, I hope you're doing it on EPYC, which is fixed. And if you're doing that, you're probably mining crypto for randoms.
IBM's VM was and is a hypervisor. It dates to the mid 1960s, in the form of CP-40, and it didn't run opcodes in software, but in hardware.
https://en.wikipedia.org/wiki/IBM_CP-40
p-code machines, which interpret bytecode, date back almost as far, such as the O-code machine for BCPL.
https://en.wikipedia.org/wiki/BCPL
Getting people to distinguish between these concepts is probably a lost cause.
Wouldn't we be able to avoid the "big payoffs" of no-breakout exploits if we had specialized hardware handle the secrets?
> As the fix is now public, we propose privately notifying major distributions that they should begin preparing updated firmware packages.
AMD had to drop the ball somewhere, didn't it.
amd-ucode 20230625.ee91452d-5
last updated 2023-07-25 11:48 UTC
Contains the microcode update that addresses this?
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/lin... says that the fixed version is 2023-07-18, but the amd-ucode version in Arch is 20230625, even though it was last updated on 2023-07-25.
My guess is that this is still getting the 20230625 firmware, per the PKGBUILD at https://gitlab.archlinux.org/archlinux/packaging/packages/li...
Which contains those lines
_tag=20230625
source=("git+https://git.kernel.org/pub/scm/linux/kernel/git/firmware/lin...")
I suppose that it isn't up to date and thus Arch Linux is still vulnerable, right?
edit:
but actually there's two commits in the _backports array (which contains cherry-picked commits) that was last edited 20 hours ago
https://gitlab.archlinux.org/archlinux/packaging/packages/li...
Which is 0bc3126c9cfa0b8c761483215c25382f831a7c6f and b250b32ab1d044953af2dc5e790819a7703b7ee6
And b250b32ab1d044953af2dc5e790819a7703b7ee6 appears to be the commit I linked earlier at git.kernel.org, so hopefully up-to-date Arch is not vulnerable to zenbleed.
Either way, as noted elsewhere in the comments, only the Rome CPU series has received updated microcode with fixes. All other Zen 2 users need the fix that was released as part of Linux 6.4.6: https://lwn.net/Articles/939102/
(which has been built and packaged for Arch)
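To verify which microcode revision actually got loaded after an update, you can read the `microcode` field from `/proc/cpuinfo`. A parsing sketch, run against a sample string so it works anywhere (the sample values are made up):

```python
# Check the running microcode revision on Linux by parsing /proc/cpuinfo.
# Shown against an inline sample so this runs anywhere; on a real machine,
# feed it the actual file contents instead.

import re

SAMPLE = """\
processor\t: 0
vendor_id\t: AuthenticAMD
model name\t: AMD EPYC 7702 64-Core Processor
microcode\t: 0x830104d
"""

def microcode_rev(cpuinfo_text):
    """Return the first reported microcode revision as an int, or None."""
    m = re.search(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.M)
    return int(m.group(1), 16) if m else None

rev = microcode_rev(SAMPLE)
print(hex(rev))                # -> 0x830104d

# On a live system:
#   rev = microcode_rev(open("/proc/cpuinfo").read())
```

Comparing that value against the `good_rev` table in the kernel commit linked elsewhere in the thread tells you whether your part got the fixed microcode or is relying on the chicken-bit fallback.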
At least one commenter here claims to be able to reproduce this with JavaScript: https://news.ycombinator.com/item?id=36849767
https://news.ycombinator.com/item?id=36838511
[0] https://git.kernel.org/pub/scm/linux/kernel/git/firmware/lin...
[1] https://en.wikichip.org/wiki/amd/cpuid#Family_23_.2817h.29
[2] https://github.com/torvalds/linux/commit/522b1d69219d8f08317...
`good_revs` as per the kernel: https://github.com/torvalds/linux/commit/522b1d69219d8f08317...
Currently published revs ("Patch") (git HEAD):
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/lin...
As of this writing, only two of the five `good_rev`s have been published.
That's the same codename Intel used for Celerons 24 years ago, the ones famous for 50% overclocks:
https://ark.intel.com/content/www/us/en/ark/products/codenam...
And also Xbox, and that thing from Valve?
0 - https://blog.playstation.com/2020/03/18/unveiling-new-detail...
I have an "AMD Ryzen 9 5950x Desktop Processor" which appears to be Zen 3. I think I'm good?
(Not that I'm running untrusted workloads, but yknow, fortune favors the prepared)
But yes, you are fine, 5950x is Zen3.
The above are desktop. If they meant APUs, it would list "Ryzen 3000 Series Processors with Radeon Graphics."
It's a single-core 128 MB VPS, which seemed fine for my boring static html articles. I guess I underestimated the interest.