It's pretty interesting that they didn't just introduce an RCE anyone can exploit; it requires the attacker's private key. It's ironically a very security-conscious vulnerability.
I suspect the original rationale is about preserving the longevity of the backdoor. If you blow a hole wide open that anyone can enter, it’s going to be found and shut down quickly.
If this hadn’t had the performance impact that brought it quickly to the surface, it’s possible that this would have lived quietly for a long time exactly because it’s not widely exploitable.
I agree that this is probably about persistence. Initially I thought the developer was playing the long-con to dump some crypto exchange and make off with literally a billion dollars or more.
But if that was the case they wouldn't bother with the key. It'd be a one-and-done situation. It would be a stop-the-world event.
If you think of it as a state-sponsored attack, it makes a lot of sense to have a "secure" vulnerability in a system that your own citizens might use.
It looks like the whole contribution to xz was an effort just to inject that backdoor. For example, the author created the whole test framework in which he could hide the malicious payload.
Before he started work on xz, he made a contribution to libarchive in BSD which introduced a vulnerability.
For real, it's almost like a state-sponsored exploit. It's crafted and executed incredibly well; it feels like pure luck that the performance issue got it found.
I like the theory that actually, it wasn’t luck but was picked up on by detection tools of a large entity (Google / Microsoft / NSA / whatever), and they’re just presenting the story like this to keep their detection methods a secret. It’s what I would do.
I don't think it was executed incredibly well. There were definitely very clever aspects but they made multiple mistakes - triggering Valgrind, the performance issue, using a `.` to break the Landlock test, not giving the author a proper background identity.
I guess you could also include the fact that they made it a very obvious back door rather than an exploitable bug, but that has the advantage that only you can exploit it, so it was probably an intentional trade-off.
Just think how many back doors / intentional bugs there are that we don't know about because they didn't make any of these mistakes.
One question I still have: what exactly was the performance issue? I heard it might be related to enumeration of shared libraries, decoding of the scrambled strings[1], etc. Anyone know for sure yet?
One other point for investigation: is the code similar to any other known implants? The way it obfuscates strings, the way it detects debuggers, the way it sets up a vtable; there might be code fragments shared across projects, which could give clues about its origin.
[1] https://gist.github.com/q3k/af3d93b6a1f399de28fe194add452d01
I read someone speculating that the performance issue was intentional, so infected machines could be easily identified by an internet-wide scan without arousing further suspicion.
If this is or becomes a widespread method, then anti-malware groups should perhaps conduct these scans themselves.
I'm not sure why everyone is 100% sure this was a state-sponsored security breach. I agree that it's more likely than not state-sponsored, but I can imagine all sorts of other groups who would have an interest in something like this, organized crime in particular. Imagine how many banks or crypto wallets they could break into with an RCE this pervasive.
Was the performance issue pure luck? Or was it a subtle bit of sabotage by someone inside the attacking group worried about the implications of the capability?
If it had been successfully and secretly deployed, this is the sort of thing that could make your leaders much more comfortable with starting a "limited war".
This is not that concept. That concept is that no one but us can technically complete the exploit: technical infeasibility, as in you need a supercomputer to do it, not protecting a backdoor with the normal CIA triad.
Am I reading it correctly that the payload signature includes the target SSH host key? So you can't just spray one payload around to servers; it has to be crafted and signed for each specific host.
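If so, that also explains why a captured payload can't just be replayed against a different server. As a toy sketch only (the function name, message layout, and use of OpenSSL's Ed448 API here are my assumptions, not the backdoor's actual wire format), a check that binds the attacker's signature to both the host key and the command could look roughly like this:

```c
/* Toy illustration, not the real format: verify an Ed448 signature that
 * covers the target's host public key together with the command, so a
 * payload captured from one host can't be replayed against another.
 * Build with: cc demo.c -lcrypto */
#include <openssl/evp.h>
#include <stdlib.h>
#include <string.h>

int verify_bound_payload(const unsigned char attacker_pub[57],
                         const unsigned char *host_pubkey, size_t host_len,
                         const unsigned char *cmd, size_t cmd_len,
                         const unsigned char sig[114])
{
    int ok = 0;
    /* Message = host key || command: the host key acts as a domain separator. */
    size_t msg_len = host_len + cmd_len;
    unsigned char *msg = malloc(msg_len);
    if (msg == NULL)
        return 0;
    memcpy(msg, host_pubkey, host_len);
    memcpy(msg + host_len, cmd, cmd_len);

    EVP_PKEY *pkey = EVP_PKEY_new_raw_public_key(EVP_PKEY_ED448, NULL,
                                                 attacker_pub, 57);
    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    if (pkey != NULL && ctx != NULL &&
        EVP_DigestVerifyInit(ctx, NULL, NULL, NULL, pkey) == 1 &&
        EVP_DigestVerify(ctx, sig, 114, msg, msg_len) == 1)
        ok = 1;   /* only the holder of the matching private key gets here */

    EVP_MD_CTX_free(ctx);
    EVP_PKEY_free(pkey);
    free(msg);
    return ok;
}
```

The verifying public key is baked into the implant, so only whoever holds the matching private key can produce a payload that any given host will accept.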
It's a (failed) case study in "what if we backdoor it in a way only good guys can use but bad guys can't?"
Computers don't know who or what is good or bad. They're deterministic machines that respond to commands.
I don't know whether there'll be any clues about who did this, but this will be the poster child of the "you can't have backdoors and security in a single system" argument (which I strongly support).
IMHO it's not that surprising; asymmetric crypto has been common in ransomware for a long time, and of course ransomware in general is based on securing data from its owner.
This whole thing has been consuming me over the whole weekend. The mechanisms are interesting and a collection of great obfuscations, the social engineering is a story that’s shamefully all too familiar for open source maintainers.
I find most interesting how they chose their attack vector of using BAD test data. It makes the rest of the steps so much easier: take a good archive, manipulate it in a structured way (this should show up on a graph of the binary pattern, btw, for future reference), then use it for a fuzzing-style bad-data test. It's great.
The rest of the techniques are banal enough, except that the most brilliant move seems to be that they could have added "patches" or even whole new back doors using the same pattern on a different test file, without being noticed.
Really, really interesting. GitHub shouldn't have hidden and removed the repo though; that's not helpful at all for working through this whole drama.
Edit: I don’t mean to say this is banal in any way, but once the payload was decided and achieved through a super clever idea, the rest was just great obfuscation.
It's got me suspicious of a build-time dependency we have in an open source tool, where the dependency goes out of its way to prefer xz, and we even discovered that it installs xz on the host machine if it isn't already installed, as a convenience. Kinda weird, because it didn't do that for any other dependencies.
These long-games are kinda scary and until whatever “evil” is actually done you have no idea what is actually malicious or just weird.
> It's got me suspicious of a build-time dependency we have in an open source tool, where the dependency goes out of its way to prefer xz, and we even discovered that it installs xz on the host machine if it isn't already installed, as a convenience. Kinda weird, because it didn't do that for any other dependencies.
Have you considered reaching out to the maintainers of that project and (politely) asking them to explain? In light of recent events I don't think anyone would blame you; in fact you might even suggest they explain such an oddly specific side effect in a README or such.
A main culprit seems to be the addition of binary files to the repo, to be used as test inputs. Especially if these files are “binary garbage” to prove a test fails. Seems like an obvious place to hide malicious stuff.
It is an obvious place for sure, but it also would have been picked up if the builds were a bit more transparent. That build script should have been questioned before approval.
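That extraction step is exactly what a reviewer of the build machinery would have had to notice. Schematically (the marker, input path, and output name below are invented for illustration; the real attack reportedly did the equivalent with a shell/awk/xz pipeline driven from the modified build scripts), carving a hidden blob out of a "corrupt" test file needs nothing more than:

```c
/* Schematic only: how a build step could carve a hidden blob out of a
 * "corrupt" binary test file.  The marker and file names are made up. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    static const unsigned char marker[] = { 0xde, 0xad, 0xbe, 0xef }; /* hypothetical */
    FILE *in = fopen("tests/files/bad-test-input.bin", "rb");         /* hypothetical */
    if (!in)
        return 1;

    /* Slurp the whole test file into memory. */
    fseek(in, 0, SEEK_END);
    long size = ftell(in);
    rewind(in);
    unsigned char *buf = malloc(size);
    if (!buf || fread(buf, 1, size, in) != (size_t)size) { fclose(in); return 1; }
    fclose(in);

    /* Look for the marker; everything after it is the hidden payload. */
    for (long i = 0; i + (long)sizeof(marker) <= size; i++) {
        if (memcmp(buf + i, marker, sizeof(marker)) == 0) {
            FILE *out = fopen("hidden-payload.o", "wb");               /* hypothetical */
            if (out) {
                fwrite(buf + i + sizeof(marker), 1,
                       size - i - sizeof(marker), out);
                fclose(out);
            }
            break;
        }
    }
    free(buf);
    return 0;
}
```

To a casual reader of a build log, that kind of step looks like ordinary test-fixture handling, which is why the "garbage" test files were such effective cover.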
Super impressed how quickly the community, and in particular amlweems, were able to implement and document a POC. If the cryptographic and payload-loading functionality has no further vulnerabilities, then at least this wouldn't have opened a security hole for every other attacker, until the key is broken or leaked.
Edit: I think what's next is to figure out a way to probe for vulnerable deployments (which seems non-trivial), and perhaps also upstreaming a way to monitor whether someone actively probes ssh servers with the hardcoded key.
Well, it's a POC against a re-keyed version of the exploit; a POC against the original version would require the attacker's private key, which is undisclosed.
Probing for vulnerable deployments over the network (without the attacker's private key) seems impossible, not non-trivial.
The best one could do is more micro-benchmarking, but for an arbitrary Internet host you aren't going to know whether it's slow because it's vulnerable, or because it's far away, or because the computer's slow in general -- you don't have access to how long connection attempts to that host took historically. (And of course, there are also routing fluctuations.)
Should be able to do it by having the scanner take multiple samples. As long as you don't need a valid login and the performance issue is still observable, you should be able to scan for it with minimal cost.
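A rough sketch of what that repeated sampling might look like, with the big caveat that it only times connect-to-banner, and the reported regression may only show up later in the handshake, so whether it actually separates patched from unpatched hosts is exactly the open question above (host, port, and sample count are illustrative):

```c
/* Rough sketch: sample the time from TCP connect to receiving the SSH
 * banner several times and keep the median, to smooth out routing noise.
 * Failed samples come back as -1; a real tool would retry or discard them. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static double sample_banner_ms(const char *ip, int port)
{
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(port) };
    inet_pton(AF_INET, ip, &addr.sin_addr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct timespec t0, t1;
    char banner[256];
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        read(fd, banner, sizeof(banner)) <= 0) {   /* expect "SSH-2.0-..." */
        close(fd);
        return -1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}

static int cmp(const void *a, const void *b)
{
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    enum { N = 9 };                    /* number of samples per host */
    double samples[N];
    for (int i = 0; i < N; i++)
        samples[i] = sample_banner_ms(argv[1], 22);
    qsort(samples, N, sizeof(double), cmp);
    printf("median banner latency: %.1f ms\n", samples[N / 2]);
    return 0;
}
```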
Has anyone tried the PoC against one of the anomalous process behavior tools? (Carbon Black, AWS GuardDuty, SysDig, etc.) I’m curious how likely it is that someone would have noticed relatively quickly had this rolled forward and this seems like a perfect test case for that product category.
Depends how closely the exploit mirrors and/or masks itself within normal compression behavior imo.
I don't think GuardDuty would catch it, as it doesn't look at processes the way an EDR does (CrowdStrike, Carbon Black), and I don't think Sysdig would catch it, as it looks at containers and cloud infra. Handwaving some complexity here, as GD and Sysdig could probably catch something odd via the privileges gained and the threat actor's follow-on activity after using this exploit.
So imo that means only EDRs (monitoring processes on endpoints) or software supply chain evaluations (monitoring security problems in upstream FOSS) are likely to catch the exploit itself.
Interestingly, this leads into another fairly large security theme: dev teams can dislike putting EDRs on boxes because of the hit on compute and the UX issues if a containment happens, and can dislike policies and limits around FOSS use. So this exploit hits at the heart of an org-driven "vulnerability" where there's a lot of logic both for staying exposed and for fixing it, depending on where you sit. The security industry's problem set in a nutshell.
The main thing I was thinking is that the audit hooking, and especially the runtime patching across modules (liblzma5 patching functions in the main sshd code), seems like the kind of thing a generic behavioral profile could catch, especially one driven by the fact that sshd does not do any of that normally.
And, yes, performance and reliability issues are a big problem here. When CarbonBlack takes down production again, you probably end up with a bunch of exclusions which mean an actual attacker might be missed.
Sysdig released a blog on Friday: "For runtime detection, one way to go about it is to watch for the loading of the malicious library by SSHD. These shared libraries often include the version in their filename."
The blog has the actual rule content, which I haven't seen from other security vendors:
https://sysdig.com/blog/cve-2024-3094-detecting-the-sshd-bac...
That relies on knowing what to look for. I.e. "the malicious library". The question is whether any of these solutions could catch it without knowing about it beforehand and having a detection rule specifically made for it.
Thanks! That’s a little disappointing since I would have thought that the way it hooked those functions could’ve been caught by a generic heuristic but perhaps that’s more common than I thought.
Edit: I misunderstood what I was reading in the link below, my original comment is here for posterity. :)
> From down in the same mail thread: it looks like the individual who committed the backdoor has made some recent contributions to the kernel as well... Ouch.
https://www.openwall.com/lists/oss-security/2024/03/29/10
No, that patch series is from Lasse. He said himself that it's not urgent in any way and it won't be merged this merge window, but nobody (sane) is accusing Lasse of being the bad actor.
1) No-one has been proven to "be" anyone in this case. Reputation in OSS is built upon behaviour only, not identity. "Jia Tan" managed to tip the scales by also being helpful. That identity is 99% likely to be a confection.
2) People can do terrible things when strongly encouraged or, worse, coerced. Including dissolving identity boundaries.
The first problem can be 'solved' by using real identities and web of trust but that will NEVER fly in OSS for a multitude of technical and social reasons. The second problem will simply never be solved in any context, OSS or otherwise. Bad actors be bad, yo.
The parallels in this one to the audacity event a couple years back are ridiculous.
Cookie guy claimed that he got stabbed and that the federal police were involved in the case, which kind of hints that the events were connected to much bigger actors than just 4chan. At the time a lot of people thought it was just Muse Group that was involved, but maybe it was a (Russian) state actor?
Because before that he claimed that audacity had lots of telemetry/backdoors, which were the reason he forked it and which he removed in his first commits. Maybe audacity is backdoored after all?
When loading liblzma, it patches the ELF GOT (global offset table) with the address of the malicious code. If it's loaded before libcrypto, it registers a symbol audit handler (a glibc-specific feature, IIUC) to get notified when libcrypto's symbols are resolved, so it can defer patching the GOT.
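For anyone who hasn't run into that glibc feature: the documented face of it is the rtld-audit interface, where an audit module gets a callback for every symbol binding and may return a substitute address. The backdoor reportedly wires itself into the equivalent machinery from inside the process rather than via LD_AUDIT, but a minimal (benign) audit module shows the kind of leverage involved:

```c
/* Minimal rtld-audit module (documented glibc interface, see rtld-audit(7)).
 * Build: cc -shared -fPIC audit.c -o audit.so
 * Run:   LD_AUDIT=./audit.so some-program */
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>
#include <string.h>

unsigned int la_version(unsigned int version)
{
    return LAV_CURRENT;   /* handshake with the dynamic linker */
}

unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
    /* Ask to audit symbol bindings to and from every loaded object. */
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

uintptr_t la_symbind64(Elf64_Sym *sym, unsigned int ndx,
                       uintptr_t *refcook, uintptr_t *defcook,
                       unsigned int *flags, const char *symname)
{
    /* Called when a PLT/GOT entry is bound.  A malicious module could
     * return the address of its own function here instead of
     * sym->st_value, e.g. for a symbol like RSA_public_decrypt. */
    if (strstr(symname, "RSA") != NULL)
        fprintf(stderr, "binding %s -> %#lx\n", symname,
                (unsigned long)sym->st_value);
    return sym->st_value;   /* benign: keep the original address */
}
```

The backdoor doesn't need the LD_AUDIT environment variable because it is already loaded inside the target process; the point is just that "get told when libcrypto's symbols are resolved, then swap the GOT entry" is a capability the runtime linker genuinely provides.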
Do we know if this exploit only did something if an SSH connection was made? There's a list of strings from it on Github that includes "DISPLAY" and "WAYLAND_DISPLAY":
https://gist.github.com/q3k/af3d93b6a1f399de28fe194add452d01
These don't have any obvious connection to SSH, so maybe it did things even if there was no connection. This could be important to people who ran the code but never exposed their SSH server to the Internet, which some people seem to be assuming was safe.
Those are probably kill switches to prevent the exploit from working if there is a terminal open or if it runs in a GUI session. In other words, someone trying to detect, reproduce or debug it.
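If that speculation is right, the check itself would be trivial; a purely illustrative guess (not lifted from the actual binary) at what such a bail-out might look like:

```c
/* Purely illustrative guess at an environment-based kill switch; the real
 * backdoor's conditions have to be recovered from the binary, not from this. */
#include <stdlib.h>

static int looks_like_interactive_session(void)
{
    /* A graphical session suggests a human (or a researcher) is watching;
     * a real sshd child would not normally have these set. */
    return getenv("DISPLAY") != NULL || getenv("WAYLAND_DISPLAY") != NULL;
}

/* ... somewhere in the payload's init path ... */
/* if (looks_like_interactive_session()) return;   stay dormant */
```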
Could that be related to X11 session forwarding? (A common security hole on the connecting side if it isn't turned off when connecting to an untrusted machine.)
> If this hadn’t had the performance impact that brought it quickly to the surface, it’s possible that this would have lived quietly for a long time exactly because it’s not widely exploitable.
The tradeoff is that, once you find it, it's very clearly a backdoor. No way you can pretend this was an innocent bug.
> But if that was the case they wouldn't bother with the key. It'd be a one-and-done situation.
Now it looks more like nation-state spycraft.
> If it had been successfully and secretly deployed, this is the sort of thing that could make your leaders much more comfortable with starting a "limited war".
There are shades of "Setec Astronomy" here.
It's practically a good backdoor then: cryptographically protected and safe against replay attacks.
> I don't know whether there'll be any clues about who did this, but this will be the poster child of the "you can't have backdoors and security in a single system" argument.
"It's not only the good guys who have guns."
Kudos!
The OP is such great analysis, I love reading this kind of stuff!
> No-one has been proven to "be" anyone in this case. Reputation in OSS is built upon behaviour only, not identity.
This style of fake doubt is really not appropriate anywhere.
> Maybe audacity is backdoored after all?
Have to check the audacity source code now.
How did the exploit do its hooking at runtime?
I know the chain was:
opensshd -> libsystemd for notifications -> xz/liblzma pulled in as a transitive dependency
How did liblzma.so.5.6.1 hook/patch all the way back to openssh_RSA_verify when it was loaded into memory?
How was this part obfuscated/undetected?
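The hooking half of that question is what the GOT-patching comment further up describes. The first half, how liblzma ends up inside sshd at all, is easy to see for yourself: distro-patched sshd links libsystemd for sd_notify, and (at least in the libsystemd versions current at the time; newer ones dlopen liblzma lazily) that pulls liblzma into the process. A small sketch demonstrating the transitive load (the file name and build line are mine):

```c
/* Sketch: enumerate the shared objects mapped into this process and flag
 * liblzma.  Merely calling into libsystemd is enough to pull it in
 * transitively on systems where libsystemd links liblzma directly.
 * Build: cc list-libs.c -lsystemd   (assumes libsystemd dev headers) */
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>
#include <string.h>
#include <systemd/sd-daemon.h>

static int show(struct dl_phdr_info *info, size_t size, void *data)
{
    if (info->dlpi_name && *info->dlpi_name) {
        int is_lzma = strstr(info->dlpi_name, "liblzma") != NULL;
        printf("%s%s\n", info->dlpi_name,
               is_lzma ? "   <-- transitive xz dependency" : "");
    }
    return 0;   /* keep iterating over loaded objects */
}

int main(void)
{
    sd_notify(0, "STATUS=demo");   /* the kind of call that drags libsystemd in */
    dl_iterate_phdr(show, NULL);
    return 0;
}
```

A quicker check on a running box is just inspecting the sshd binary's library closure and seeing whether liblzma shows up in it.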