The goal is to use a standardized test framework to ease writing of tests in XZ.
Much of the functionality remains untested, so it will be helpful for long term project stability to have more tests
-- Jia, 2022-06-17
What I haven't seen discussed much is the linking mechanism that allowed the library to hook into RSA_public_decrypt. There's plenty of talk about what could or could not be achieved by even more process separation and the like, but little about that function-call redirect. Could it be possible to link critical components, like the code handling incoming ssh connections, with libraries in some tiered-trust way? "I trust you when and where I call you, but I won't allow you to introduce yourself to other call sites"?
This would surely fall into the category of "there would be ways around it, so why bother?" that triggers a "by obscurity" reflex in many, but I'd consider it reduced attack surface.
An Erlang-style actor model, where a program's subcomponents call each other by message passing and each has its own security context, might work.
However, there are multiple security contexts at play in an operating system. With regard to the XZ backdoor it's mostly about capability-based security at the module level, but you also have capabilities at the program level, isolation at the memory level (paging), isolation at the microarchitectural level, and so on. Ensuring all of these elements work together while still delivering performance seems rather challenging, and it's hard to see the Unix-likes moving to such a model, since it would mean replacing the very concept of processes.
From what I can tell the problem is the use of glibc's IFUNC. It broke testing for xz; suddenly, accounts appeared lobbying to disable that testing, which enabled the exploit.
IFUNC is arguably not the real issue. IFUNC was used to create a function that gets called on library load (since you need a resolver function to decide which implementation to map in). There are other ways to create a "callback on library load" as well; I think ifunc was actually used for obfuscation. Jia Tan created a somewhat plausible scenario of using ifuncs to select a CRC function at load time for performance reasons (whether that actually improves performance is arguable, but it's at least plausible). The malicious version swapped the resolver function for a malicious one, but either way it's just a way to create a function that runs at library init.
The actual hook for intercepting the call was done via audit hooks.
So I guess it's really two things working together.
Essentially, the code patches the ifunc resolvers to call into a supplied malicious object by surreptitiously modifying C files at build time (when ./configure && make is run), and links the compromised object files with the `-z now` linker flag. That flag causes the ifuncs to be resolved immediately at library load time, calling the patched resolvers while the procedure linkage table is still writable. That is the important part: it allows them to hijack the RSA_public_decrypt function in memory when the library is loaded.
Why do they say "almost" infected the world? At least three quite popular Linux distributions (Arch, Gentoo, and openSUSE Tumbleweed) ended up shipping the backdoor _for weeks_, and it was most definitely working in at least Tumbleweed. For weeks! A backdoored ssh! Hardly "almost".
Arch and Gentoo are fairly popular as hobbyist distributions but they’re far less common in professional use, especially for the servers running SSH which this attack targeted. That doesn’t mean what happened is in any way okay but if this hadn’t been noticed long enough to make it into RHEL or Debian/Ubuntu stable you would be hearing about it in notifications from your bank, healthcare providers, etc. A pre-auth RCE would mean anyone who doesn’t have a tightly-restricted network and robust flow logging would struggle to say that they hadn’t been affected.
Aye, this. RHEL is the industry standard if you want enterprise support, and if you're not using that then you're on a relative like Fedora, CentOS, or Rocky. Or else you hang out on the .deb side and use Debian or Ubuntu.
Arch is popular with a niche group of end users, but that ain't what most enterprise architectures are working on.
It doesn't seem to have been actually included in the Arch (binary) package but only because the backdoor build system itself didn't include the backdoor for Arch. If you cmp -l liblzma.so.5.6.1 between xz-5.6.1-1 and xz-5.6.1-2 there are only tiny differences. I'm guessing they didn't notice this before writing the advisory.
That’s because one of the first stages of the backdoor, which actually runs at build time, checks whether the distro is deb- or RPM-based and aborts otherwise.
I think the message would more likely be "don't use open source and pay for closed source" than "give money to open source and cross your fingers that it does something".
Arch and Gentoo were also not supported, although the code shipped, because the exploit explicitly checked for RPM- and deb-packaged distros.
SUSE is RPM-based, but I don’t remember whether the check looked for the packaging utilities or used another method. SUSE uses zypper for package management, as opposed to yum/dnf on the far more popular Red Hat-based distros, so it depends on how the exploit checked.
I bet $101 that we find something similar in the wild in the next 12 months as the maintainers start to look at each other's past commits with suspicion.
I wonder if we'll find the cases that were done and used, because if I had something like this and it worked, afterwards I'd "find it" with another account and get it fixed ...
1. Source distribution tarballs that contain code different from what's in the source repository are bad; we should move away from them. The other big supply chain attack (event-stream) also took advantage of something similar.
1a. As a consequence of (1) autogenerated artifacts should always be committed.
2. Autogenerated artifacts that everyone page-downs over during code reviews are a problem. If you have this kind of stuff in your repository, also have an automatic test that checks that nobody tampered with it (it will also keep you from having stale autogenerated files in your repository).
3. A corollary of (1) and (2) is that autotools is bad and the autotools culture is bad.
4. Libsystemd is a problem for the ecosystem. People get dismissed as systemd haters for pointing this out but it's big, complicated, has a lot of dependencies and most programs use a tiny fraction of it. Encouraging every service to depend on it for initialization notifications is insane.
5. In general there's a culture that code reuse is always good, and that depending on large libraries for small amounts of functionality is good. This is not true: dependencies are a maintenance burden and a security risk, and this needs to be weighed against the functionality they bring in.
6. Distro maintainers applying substantial patches to packages is a problem, it creates widely used de facto forks for libraries and applications that do not have real maintainers looking at them.
7. We need to make OSS work from the financial point of view for developers. Liblzma and xz-utils probably have tens of millions of installs but a single maintainer with mental health problems.
8. This sucks to say, but code reviews and handing off maintainership, at the moment, need to take into account geopolitical considerations.
> 8. This sucks to say, but code reviews and handing off maintainership, at the moment, need to take into account geopolitical considerations.
That won't help. There's no evidence that Jia Tan is a real name, or even a real person for that matter. If projects stop accepting contributions from Asian-sounding names, the next attack will just use Richard Jones as a name.
I think you could interpret this as "you need to know, personally, the party you're handing this off to, and make reasonable judgments as to whether or not they could be easily compromised by bad actors".
Like, meeting someone at several dev conferences should be a requirement at the very least.
I don't think that #8 implies projects should stop accepting contributions from Asian-sounding names. To me it means that people should be more careful about who they give access to. It doesn't matter whether it was China, or some other state or organization pretending to be China; the problem is that people assume an open-source contributor will act altruistically, when they can in fact be a malicious entity.
> 4. Libsystemd is a problem for the ecosystem. People get dismissed as systemd haters for pointing this out but it's big, complicated, has a lot of dependencies and most programs use a tiny fraction of it. Encouraging every service to depend on it for initialization notifications is insane.
I couldn't agree more. Coming from the BSD world, systemd is a shock to the system; it's monstrous and has tendrils everywhere.
I read recently that the systemd project does not recommend linking to libsystemd just to participate in systemd-notify message passing. The protocol is quite simple, and vendors are encouraged to implement a compliant interface rather than loading all of libsystemd into their program for this. That would of course mean maintaining your own compliant implementation as the API evolves, which is likely why it isn't done more often. It seems to me there would be a lot of value in systemd splitting out libraries for the various functions so that dependent projects could link only the specific parts of systemd they need. That, or some other way to configure what code gets loaded when linking libsystemd. Full disclosure: I've not looked at libsystemd to see if this is already possible or if there are other recommendations by the project.
If you actually come from BSD, you'd hopefully recognize a set of different utilities combined to form a holistic system released under a single name. It's not a new idea.
Besides, the GGP is incorrect: systemd dependencies are not needed for initialisation notifications.
8. Consumers are naive, yes. But the software industry itself is naive about the security threat.
9. The social exploit is part of the code exploit.
10. The FOSS axiom "more eyes on the code" works, but only if the "eyes" are educated. FOSS needs material support from industry. A Microsoft engineer caught this exploit, but it still shipped in Fedora 41, openSUSE, and Kali.
11. The dev toolchain and testing process were never conceived to test for security. (edit: Also see Solarwinds [1] )
That's the whole problem right there: lack of eyes on the code. If this code was actually maintained by more than one person, there's a high chance one of them would have caught on to it.
> 10. The FOSS axiom "More Eyes On The Code" works, but only if the "eyes" are educated.
One thing that could help with this is if somebody points an LLM at all these foundational repositories, prompted with "does this code change introduce any security issues?".
Re: 1. People keep saying this. We should stop distributing tarballs. It's an argument that completely ignores why we have release artifacts in the first place. A release artifact contains more than just autoconf scripts.
There can be many reasons to include binary blobs in a release archive: game resources, firmware images, test cases. There was a comment today that mpv includes parts of media files generated with proprietary encoders as test cases. That's good, not bad.
The well maintained library sqlite is everywhere, and has an excellent test suite. They release not one but two tarballs with every release, for different stages of compilation. It would be trivial to stop doing this, but it would make maintaining packages more work, which does nothing to improve security.
The reason Debian builds from curated tarballs is that they are curated by a human and signed with a well-known key. They could certainly build from git instead, but would that improve the situation? Not all projects sign their release tags, and for those that do, the signing is more likely to be automated. We collectively want changes to be vetted first by the upstream maintainer, then by the package maintainer, and we would prefer these entities to be unrelated.
This time the process was successfully attacked by a corrupt upstream maintainer, but that does not mean we should do away with upstream maintainers. Several backdoor attempts have been stopped over the years by this arrangement and that process is not something we should throw away without careful consideration.
The same improvements we have been talking about for years must continue: We should strive for more reproducible builds. We should strive for lower attack surface and decrease build complexity when possible. We should trust our maintainers, but verify their work.
>4. Libsystemd is a problem for the ecosystem. People get dismissed as systemd haters for pointing this out but it's big, complicated, has a lot of dependencies and most programs use a tiny fraction of it. Encouraging every service to depend on it for initialization notifications is insane.
They never did. In fact the systemd maintainers have noticed the confusion on that point and are adding documentation on how to implement the simple datagram protocol without libsystemd.
>7. We need to make OSS work from the financial point of view for developers. Liblzma and xz-utils probably have tens of millions of install but a single maintainer with mental health problems.
Way more than tens of millions. Python, PHP, Ruby and many other languages depend on libxml2, and libxml2 uses liblzma. And there are many other dependency paths like that.
>8. This sucks to say, but code reviews and handing off maintainership, at the moment, need to take into account geopolitical considerations.
Not any maintainer's job. OSS is provided without warranty. There are also indications that "Jia Tan" may have been completely fake: their commit timestamps sometimes switch timezone from Eastern Europe to Asia within the same day. So at the very least, they were playing identity games.
I said this days ago, but re timezones: they are meaningless, as even GCHQ, the NSA, etc. will plant false flags in code that carries any risk of exposure. I first learned about those techniques from the high-profile intelligence-agency leaks out of the USA, which showed them doing exactly that.
> 1a. As a consequence of (1) autogenerated artifacts should always be committed.
Why don't object files and binaries count as autogenerated artifacts? Should we commit those to the repo too? Where is the line between an artifact that should be committed, and one that shouldn't be?
> 4. Libsystemd is a problem for the ecosystem.
libc will dynamically load libnss-* on a lot of platforms, some of which can link to a bunch of other helper libraries. What if the attack had come via one of those 2-or-3-dependencies-removed libraries? libc is big and complicated and most programs only use a tiny fraction of it. Is libc a problem for the ecosystem?
Those two currently have much better quality guarantees than systemd, so systemd is the more pressing issue. But they don't stop being a problem just because they aren't the largest one.
> Why don't object files and binaries count as autogenerated artifacts? Should we commit those to the repo too? Where is the line between an artifact that should be committed, and one that shouldn't be?
I'd say anything that is input to the compiler should be.
> libc is big and complicated and most programs only use a tiny fraction of it. Is libc a problem for the ecosystem?
IMO yes. I definitely believe having basic common functionality (malloc, printf, memcpy etc.) provided by one library with all the crazy/obscure stuff that very few people need or want somewhere else would be an improvement.
There are so many bugs like the "added dot to disable Landlock" that was introduced as part of this attack (such things can also be honest typos [0]), not to mention that autoconf feature checks that rely on certain tools will silently disable features if those tools are not present [1].
I don’t understand how this is still the best way to test if features are available in C. Can’t the OS / environment provide a “features_available” JSON blob listing all the features on the host system? Is AVX2 available on the cpu? OpenSSL? (And if so, where?) and what about kernel features like io_uring?
Doing haphazard feature detection by test compiling random hand written C programs in a giant sometimes autogenerated configure script is an icon of everything wrong with Unix.
This hack shows that the haphazard mess of configure isn’t just ugly. It’s also a pathway for malicious people to sneak backdoors into our projects and our computers. It’s time to move on.
I'd add a 9: performance differences can indicate code differences. Without the 0.5s startup delay being noticed the backdoor wouldn't have been found. It would be much easier to backdoor low-performance software that takes several seconds to start than something that starts nearly instantly.
> 1a. As a consequence of (1) autogenerated artifacts should always be committed.
I philosophically and fundamentally hate this suggestion, but have to agree with it. It's going to make porting harder, but is sadly a cost worth paying.
> dependencies are maintenance burden and a security risk, this needs to be weighted against the functionality they bring in
Tough call. A major library is more likely to be bug-fixed and tuned than something you write yourself (which is a good reason to use it, and also exactly what makes such libraries attractive as an attack vector). Getting this right requires taste and experience. The comment says "depending on large libraries for small amounts of functionality [is bad but thought to be good]"; what constitutes a "small amount" versus a large one requires experience to judge. Cases like this certainly tip my bias toward re-implement over re-use.
9. We should move toward formal verification for the trusted core of systems (compilers, kernel, drivers, networking, systemd/rc, and access control).
With regard to 1, there are some other practical steps to take. Use deterministic builds and isolate the compilation and linking steps from testing. Every build should emit the hashes of the artifacts it produces and the build system should durably sign them along with the checksum of the git commit it was built from. If there need to be more transformations of the artifacts (packaging, etc.) it should happen as a separate deterministic build. Tests should run on a different machine than the one producing the signed build artifacts. Dropping privileges with SECCOMP for tests might be enough but it's also unlikely to be practical for existing tests that expect a normal environment.
I would more just say autogenerated artifacts should just be autogenerated by the build. Committing them doesn't really solve the problem. This is pretty much just a historical hangover in autotools where it targeted building on platforms where autotools wasn't installed, but it's not really a particularly relevant use-case anymore. (I do agree in general that autotools is bad. Especially on projects where a simple makefile is almost always sufficient and much more debuggable if autotools fails).
I don't think libsystemd is a particular problem. Or at least it being linked in only made the job of writing the exploit slightly easier: there's enough services running as root that will pull in a dependency like this that the compromise still exists, it just requires a few more hoops to jump through. And systemd has in fact deliberately made the notification process simple specifically so people can avoid the dependency (if not for security, then simply for ease of building in a way which supports systemd notification but doesn't need anything else).
Dependencies are a liability, for sure, but I think a lot of the reaction there is not entirely helpful. At least, the size of the dependency tree in a package manager is only about as good a proxy for risk as lines of code is for software project progress. Dependencies need to be considered, not just minimised out of hand; there are plenty of risks on the reimplement-it-yourself side too. The main thing to consider is how many people, and whom, you are depending on, and who's keeping an eye on them. That last part is what's really lacking: the most obvious thing about these OSS vulnerabilities is that basically no one is auditing code at all, and those who are aren't sharing the results. It should in principle be possible to apply the advantages of open source to that as well, but it's really hard to set up the incentives to do it (anyone starting needs to do a lot before it becomes worthwhile).
> I would more just say autogenerated artifacts should just be autogenerated by the build.
There are practical and philosophical problems with this. From the practical point of view you generally want to make contributing (or even just building) your stuff as low friction as possible and having extra manual build steps (install tools X at version X1.X2.X3, Y at version Y1.Y2 and Z at version Z1.Z2rc2) isn't low friction.
Philosophically, you are just shifting the attack vector around, you now need to compromise one of tools X, Y and Z, which are probably less under your control than the artifacts they produce.
> And systemd has in fact deliberately made the notification process simple specifically so people can avoid the dependency
People say this but I'm skeptical, this is the actual documentation of the protocol:
"These functions send a single datagram with the state string as payload to the socket referenced in the $NOTIFY_SOCKET environment variable. If the first character of $NOTIFY_SOCKET is "/" or "@", the string is understood as an AF_UNIX or Linux abstract namespace socket (respectively), and in both cases the datagram is accompanied by the process credentials of the sending service, using SCM_CREDENTIALS. If the string starts with "vsock:" then the string is understood as an AF_VSOCK address, which is useful for hypervisors/VMMs or other processes on the host to receive a notification when a virtual machine has finished booting. Note that in case the hypervisor does not support SOCK_DGRAM over AF_VSOCK, SOCK_SEQPACKET will be used instead. The address should be in the form: "vsock:CID:PORT". Note that unlike other uses of vsock, the CID is mandatory and cannot be "VMADDR_CID_ANY". Note that PID1 will send the VSOCK packets from a privileged port (i.e.: lower than 1024), as an attempt to address concerns that unprivileged processes in the guest might try to send malicious notifications to the host, driving it to make destructive decisions based on them."
So technically you have to support unix domain sockets, abstract namespace sockets, whatever SCM_CREDENTIALS is, whatever AF_VSOCK is and the SOCK_SEQPACKET note is completely obscure to me.
This attack used the fact that several distros patch OpenSSH to link to libsystemd for notifications. Libsystemd links liblzma, and the backdoor checks if it's been linked into OpenSSH's sshd process to run. Without distro maintainers linking libsystemd, xz wouldn't have been a useful target for attacking OpenSSH.
To clarify (1), we should not be exchanging tarballs, period. Regardless of whether it's different from what's in the source repository.
It's 2024, not 1994. If something masquerading as open-source software is not committed to, and built from, a publicly verifiable version-controlled repository, it might as well not exist.
There are a lot of bad to terrible takes here, ranging from hindsight 20/20 to borderline discriminatory:
3. The issue here has more to do with the generated tarball not matching the source. You (i.e., distro owners) should be able to generate the tarball locally and compare it with the published artifact. Autotools is just a scapegoat.
4. xz is used in a lot of places. Reducing dependencies is good, but trying to somehow say this is all systemd's fault for depending on liblzma is not understanding the core issue here. The attacker could have found another dependency to social-engineer their way into, or found a way to add dependencies, and whatnot. It's very easy to say all this stuff in hindsight.
5. Again, I agree with you in principle that dependencies and complexity are a big issue, and I always roll my eyes when people bring in hundreds of dependencies, but xz is a pretty reputable project. I really doubt anyone would have raised an issue with adding liblzma, or suspected that its build script would introduce a vulnerability like that. Again, a lot of hindsight talking here, instead of actually looking at how something like this could realistically be prevented. Too many dependencies are a problem, but it's not as if everyone will suddenly write their own compression libs.
6. Again, I don't disagree with you in principle, but that is not the lesson from this particular incident. This may be your pet peeve, but it isn't as if the integration with libsystemd would have raised anyone's alarm.
8. This is just a thinly veiled way of saying "don't work with anyone of Chinese descent". I don't want to use the R word but you know exactly what I mean. There's no evidence Jia Tan is Chinese anyway, or that this was done by China. We simply don't know right now, and as far as we know they could have used any Western-sounding name. The core issue here is that trust was misplaced, and the overworked maintainer didn't try to make sure the other person was real (e.g., basic Googling). So what if you don't work with any Chinese people; if someone is called "Ryan Gosling", do you automatically trust them?
The other problem is that C’s engineering culture is termites all the way down.
A test resource getting linked into a final build is, itself, a problem - the tooling should absolutely make this difficult, and transparent/obvious when it happens.
But that’s difficult because C never shed the “pile of bash scripts” approach to build engineering… and fundamentally it’s an uphill battle to engineer a reliable system out of a pile of bash scripts.
The oft-discussed problems with undefined behavior, obscure memory/aliasing rules, etc are just the obvious smoke. C is termites all the way down and really shouldn’t be used anymore, it’s just also Too Big To Fail. Like if the world’s most critical infrastructure had been built in PHP.
libsystemd is too juicy of a target, especially with the code reuse that does not appear to take into account these attack vectors.
Perhaps any reuse of libraries in sensitive areas like libsystemd should require a separate copy and more rigorous review? This would allow things like libxv to be 'reused', but the 'safe' versions would require a separate codebase that gets audited updates from the mainline.
But seriously, yes, I think I've seen people dismiss each one of those points, and now we have concrete proof they are real. The fact that a scandal like this somehow didn't happen before because of #1, #2, or #3 is almost incredible, in the sense that a viable explanation is that somebody is suppressing knowledge somewhere.
Point 8 simply isn't going to happen. And that means that if you want secure OSS, you must pay somebody to look around and verify those things. And the problem with that is this means you are now into the software vendor political dump - anybody that gets big doing that is instantaneously untrustworthy.
Overall, my point is that we need some actual democratic governance on software. Because it's political by nature, and pushing for anarchy works just as well as with any other political body.
I think your eighth point is regrettable but mostly true. I’d soften it to professional relationships, which kind of sucks for anyone trying to get started in the field who doesn’t get a job with someone established, and it adds an interesting wrinkle to the RTO discussion, since you might “work” with someone for years without necessarily knowing anything about them.
It also seems like we need some careful cultural management around trust: enshrine trust-but-verify pervasively to avoid focusing only on, say, Chinese H1-Bs or recent immigrants (whoops, spent all of your time on them and it turns out you missed the Mossad and Bulgarian hackers) and really doubling down on tamper-evidence, which also has the pleasant property of reducing the degree to which targeting OSS developers makes sense.
Combining your 7th point with that one, I’ve been wondering whether you could expand what happened with OpenSSL into some kind of general OSS infrastructure program: everyone would pay to support a team that prioritizes non-marquee projects, especially work like modernizing toolchains, auditing, and sandboxing, so that any maintainer of something in the top n dependencies would have a trusted group to ask for help, and would know that everyone on that team has gone through background checks, etc.
>4. Libsystemd is a problem for the ecosystem. People get dismissed as systemd haters for pointing this out but it's big, complicated, has a lot of dependencies and most programs use a tiny fraction of it. Encouraging every service to depend on it for initialization notifications is insane.
This is ridiculous, nobody "encourages" every service to depend on it for initialization notifications, you can implement the logic in 10 lines of code or less.
And I think people should now look at all oddities flagged by Valgrind, since that is how the issue got discovered; then look at the library in question for similar outliers, like fake personas taking over a project.
It seems it is common practice for people to ignore these errors.
Do you remember the Debian openssl flaw? Okay, it's almost 20 years old now, so you may have forgotten, or you could be too young, I don't know. But it was caused by someone attempting to fix an oddity found by valgrind.
Valgrind will tell you about memory leaks and won't always behave the way it did here when there's a backdoor. In this case it just so happened that valgrind was throwing errors because the stack layout didn't match what the exploit was expecting. Otherwise valgrind would have probably worked without issues.
I’m guessing the original maintainer of xz handed responsibilities to Jia Tan without ever seeing him/her, or at least sharing a phone call. Is it common to communicate only through email/GitHub? I guess some maintainers of open-source projects will be more cautious after this story.
> Is it common to communicate only through email/GitHub?
Absolutely. I've both taken over libraries as a maintainer and given away the responsibility of maintaining a library after only communicating via text, and having no idea who the "real" person is.
> I guess some maintainers of open source projects will be more cautious after this story.
Which is completely the wrong takeaway. It's not the maintainer who is responsible for what people end up pulling into their project, it's up to the people who work on the project. Either you trust the maintainer, or you don't, and when you start to depend on a library, you're implicitly signing up for updating yourself on who you are trusting. For better or worse.
Trusting the maintainer also means trusting that they won't hand over the project to someone untrustworthy. It is the maintainer's responsibility to honor that trust if they want their software to be used in the first place.
That’s basically how it is right now. Millions of companies freeloading off the work of unpaid open source developers. Unsurprisingly they sometimes leave and it causes problems.
> Is it common to communicate only through email/github?
Yes. I’ve joined half a dozen open-source projects of various sizes (from 100 to 30k stars on GitHub) without ever calling anyone; written communication is the standard.
If you’re being berated by multiple people about your speed of delivery, it is not unexpected that you become convinced you are somehow the problem, and transfer the project to whoever seems like the best choice at the time without fully thinking the decision through.
However, knowing a person personally doesn’t necessarily solve the problem.
I used to work on an open source project a long time ago (under a pseudonym) that I do not wish to name here for reasons that’ll become clear shortly. The lead programmer had a co-maintainer who the lead seemed to have known quite well.
The co-maintainer constantly gaslit me, and later, other maintainers, belittled them, criticized them for the smallest of bugs etc. (and not in a Linus Torvalds way, where the rants are educational if you remove the insults) until they left; and was egged on by the lead maintainer as they agreed with the technical substance of these arguments.
Many years later, the co-maintainer attempted a hostile takeover of the project, which did not go as expected, and soon after, multiple private correspondences with other people became public where it became clear that the co-maintainer always wanted to do this, and gaslighting other maintainers was just part of this goal. All of this, despite the fact that the two of them knew each other.
He wouldn’t be able to do more than that if publicity were expected from core maintainers. Maybe he is trying to do the exact same thing with another project at this very moment.
They did communicate off list and non publicly, that's as much as we know at the moment.
As an open source developer he might have received donations from the adversary too - it's reasonably common for devs to get donations to "say thanks". He might have had voice chats with them, who knows. The emails might be with LEO at the moment, but I think it's in the public interest for all communications to be released.
If you look at their early commit history, "Jia Tan" was always a devious actor.
It's easy to think that they would just have made a video call, but it is a lot harder to lie convincingly over sync videochat than over async text. And a lot harder still to lie in person, and esp over multiple meetings.
Not to say it's impossible, people get scammed in person all the time! But it raises the bar, for sure.
Adding on to that, it might be difficult to differentiate between people from China vs Taiwan/Singapore/etc., and since people are generally anonymous online, they can use any name they want.
I guess the blame is on the people who decide to depend on a very small (by team size at least) project, while having plenty of safer alternatives: https://xkcd.com/2347/
Let's suppose I create a personal hobby project. Suddenly RedHat, Debian, Amazon, Google... you name it, decide to make my project a fundamental dependency of their toolchain, without giving me even some support in the form of trustable developers. The most cautious thing I could do would be to shut down the project entirely or abandon it, but more probably I would have fallen for Jia Tan's tricks.
Also, a phone call or even a face-to-face meeting wouldn't give you extra security. In what scenario would a phone conversation with Jia expose him, or make you suspicious enough not to delegate?
So while everyone thinks this backdoor was caught early, its purpose might have been achieved already. Especially if those targets were developers who used rolling release distros, like Kali and Debian.
This might be possible. I picked up some SSH traffic earlier in the week, and didn't think much of it at the time. Of course, this could also be a red herring. https://www.nubi-network.com/news.php?id=21
The actual hook for intercepting the call was done via audit hooks.
So I guess it's really two things working together.
There's an excellent technical breakdown of the backdoor injection process here: https://research.swtch.com/xz-script
Dead Comment
Deleted Comment
Arch is popular with a niche group of end users, but that ain't what most enterprise architectures are working on.
https://github.com/QubesOS/qubes-issues/issues/9067#issuecom...
Suse is RPM based, but I don't remember whether the check was for the rpm utilities or used another method. Suse uses zypper for package management, as opposed to yum/dnf on the far more popular RedHat-based distros, so it depends on how the exploit checked.
Deleted Comment
Just look at any critical, yet largely unknown codebase with very few maintainers.
1. Source distribution tarballs that contain code different from what's in the source repository are bad, we should move away from them. The other big supply chain attack (event-stream) also took advantage of something similar.
1a. As a consequence of (1) autogenerated artifacts should always be committed.
2. Autogenerated artifacts that everyone pages past during code reviews are a problem. If you have this kind of thing in your repository, also have an automatic test that checks that nobody tampered with it (it will also keep you from having stale autogenerated files in your repository).
3. A corollary of (1) and (2) is that autotools is bad and the autotools culture is bad.
4. Libsystemd is a problem for the ecosystem. People get dismissed as systemd haters for pointing this out but it's big, complicated, has a lot of dependencies and most programs use a tiny fraction of it. Encouraging every service to depend on it for initialization notifications is insane.
5. In general there's a culture that code reuse is always good, that depending on large libraries for small amounts of functionality is good. This is not true: dependencies are a maintenance burden and a security risk, and this needs to be weighed against the functionality they bring in.
6. Distro maintainers applying substantial patches to packages is a problem, it creates widely used de facto forks for libraries and applications that do not have real maintainers looking at them.
7. We need to make OSS work from the financial point of view for developers. Liblzma and xz-utils probably have tens of millions of installs but a single maintainer with mental health problems.
8. This sucks to say, but code reviews and handing off maintainership, at the moment, need to take into account geopolitical considerations.
That won't help. There's no evidence that Jia Tan is a real name, or even a real person for that matter. If projects stop accepting contributions from asian-sounding names, the next attack will just use Richard Jones as a name.
Like, meeting someone at several dev conferences should be a requirement at the very least.
The problem with any social test is that it’s biased by default towards whomever is controlling access
I couldn't agree more. Coming from the BSD world, systemd is a shock to the system; it's monstrous and has tendrils everywhere.
Besides, the gpp is incorrect: systemd dependencies are not needed for initialisation notifications.
8. Consumers are naive, yes. But the software industry itself is naive about the security threat.
9. The social exploit is part of the code exploit.
10. The FOSS axiom "More Eyes On The Code" works, but only if the "eyes" are educated. FOSS needs material support from industry. An MSFT engineer caught this exploit, but it was still released to G.A. in Fedora 41, openSUSE, and Kali.
11. The dev toolchain and testing process were never conceived to test for security. (edit: Also see Solarwinds [1] )
= = =
[1] _ https://www.wired.com/story/the-untold-story-of-solarwinds-t...
That's the whole problem right there: lack of eyes on the code. If this code was actually maintained by more than one person, there's a high chance one of them would have caught on to it.
One thing that could help with this is if somebody points an LLM at all these foundational repositories, prompted with "does this code change introduce any security issues?".
There can be many reasons to include binary blobs in a release archive. Game resources, firmware images, test cases. There was today a comment that mpv includes parts of media files generated with proprietary encoders as test cases. That's good, not bad.
The well maintained library sqlite is everywhere, and has an excellent test suite. They release not one but two tarballs with every release, for different stages of compilation. It would be trivial to stop doing this, but it would make maintaining packages more work, which does nothing to improve security.
The reason Debian builds from curated tarballs is that they are curated by a human and signed with a well-known key. They could certainly build from git instead. But would that improve the situation? Not all projects sign their release tags. And for those that do, it is more likely to be automated. We the collective want changes to be vetted first by the upstream maintainer, then by the package maintainer, and would prefer these entities to be unrelated.
This time the process was successfully attacked by a corrupt upstream maintainer, but that does not mean we should do away with upstream maintainers. Several backdoor attempts have been stopped over the years by this arrangement and that process is not something we should throw away without careful consideration.
The same improvements we have been talking about for years must continue: We should strive for more reproducible builds. We should strive for lower attack surface and decrease build complexity when possible. We should trust our maintainers, but verify their work.
They never did. In fact the systemd maintainers are confused on that point and are adding documentation on how to implement the simple datagram without libsystemd.
>7. We need to make OSS work from the financial point of view for developers. Liblzma and xz-utils probably have tens of millions of installs but a single maintainer with mental health problems.
Way more than tens of millions. Python, php, ruby and many other languages depend on libxml2, libxml2 uses liblzma. And there's many other dependencies.
>8. This sucks to say, but code reviews and handing off maintainership, at the moment, need to take into account geopolitical considerations.
Not any maintainer's job. OSS is provided without warranty. Also, there are indications "Jia Tan" may have been completely fake: their commit timestamps show their timezone switching from Eastern Europe to Asia, even on the same day. So at the very least, they were playing identity games.
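That kind of timezone slip can be checked for mechanically. A minimal sketch (the `git log` format string in the docstring and the sample data below are illustrative, not Jia Tan's actual history):

```python
from collections import defaultdict

def days_with_mixed_offsets(commits):
    """Return the calendar days on which commits carry more than one
    UTC offset -- the kind of slip described above.

    `commits` is an iterable of (day, utc_offset) pairs, e.g. parsed
    from `git log --author=... --format='%ad' --date=format:'%Y-%m-%d %z'`.
    """
    by_day = defaultdict(set)
    for day, offset in commits:
        by_day[day].add(offset)
    return sorted(day for day, offsets in by_day.items() if len(offsets) > 1)

# Hypothetical data shaped like the pattern described in the comment:
log = [
    ("2023-06-27", "+0300"),  # Eastern European summer time...
    ("2023-06-27", "+0800"),  # ...and East Asia, the same day
    ("2023-06-28", "+0800"),
]
print(days_with_mixed_offsets(log))  # ['2023-06-27']
```

Of course a careful attacker keeps a consistent clock, so an empty result proves nothing; a non-empty one is just a lead.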
Why don't object files and binaries count as autogenerated artifacts? Should we commit those to the repo too? Where is the line between an artifact that should be committed, and one that shouldn't be?
> 4. Libsystemd is a problem for the ecosystem.
libc will dynamically load libnss-* on a lot of platforms, some of which can link to a bunch of other helper libraries. What if the attack had come via one of those 2-or-3-dependencies-removed libraries? libc is big and complicated and most programs only use a tiny fraction of it. Is libc a problem for the ecosystem?
Absolutely yes. And also the size of the kernel.
Those two currently have a much better guaranteed quality than systemd, thus systemd is a much more pressing issue. But they don't stop being a problem just because they are not the largest one.
I'd say anything that is input to the compiler should be.
> libc will dynamically load libnss-* on a lot of platforms, some of which can link to a bunch of other helper libraries. What if the attack had come via one of those 2-or-3-dependencies-removed libraries? libc is big and complicated and most programs only use a tiny fraction of it. Is libc a problem for the ecosystem?
Yes, the libnss stuff is also a problem.
IMO yes. I definitely believe having basic common functionality (malloc, printf, memcpy etc.) provided by one library with all the crazy/obscure stuff that very few people need or want somewhere else would be an improvement.
There are so many bugs like the "added dot to disable Landlock" introduced as part of this attack (which could equally have been a typo [0]), not to mention that relying on certain tools in autoconf to set feature flags will silently disable those features if the tools are not present [1].
[0] https://twitter.com/disconnect3d_pl/status/17744965092596453...
[1] https://twitter.com/disconnect3d_pl/status/17747470223623252...
Doing haphazard feature detection by test-compiling random hand-written C programs in a giant, sometimes autogenerated, configure script is emblematic of everything wrong with Unix.
This hack shows that the haphazard mess of configure isn’t just ugly. It’s also a pathway for malicious people to sneak backdoors into our projects and our computers. It’s time to move on.
Or… just have downstream users run autotools as part of the build?
See point 3:
> 3. A corollary of (1) and (2) is that autotools is bad and the autotools culture is bad.
I philosophically and fundamentally hate this suggestion, but have to agree with it. It's going to make porting harder, but is sadly a cost worth paying.
> dependencies are maintenance burden and a security risk, this needs to be weighted against the functionality they bring in
Tough call. A major library is more likely to be bug-fixed and tuned than something you write yourself (which is a good reason to use them, and also what makes them attractive as an attack vector). Getting this right requires taste and experience. The comment says "depending on large libraries for small amounts of functionality [is bad but thought to be good]". What constitutes a "small amount" vs a large one requires experience to judge. Certainly cases like this tip my bias towards re-implement vs re-use.
With regard to 1, there are some other practical steps to take. Use deterministic builds and isolate the compilation and linking steps from testing. Every build should emit the hashes of the artifacts it produces and the build system should durably sign them along with the checksum of the git commit it was built from. If there need to be more transformations of the artifacts (packaging, etc.) it should happen as a separate deterministic build. Tests should run on a different machine than the one producing the signed build artifacts. Dropping privileges with SECCOMP for tests might be enough but it's also unlikely to be practical for existing tests that expect a normal environment.
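The "emit and durably sign the hashes" step might look like the following minimal sketch (the manifest layout is invented for illustration; the actual signing, e.g. a detached signature over the JSON, is omitted):

```python
import hashlib
import json
import os

def artifact_manifest(paths, commit):
    """Produce a JSON manifest tying each build artifact's SHA-256
    to the source commit it was built from.  The build system would
    then sign this blob; signing itself is left out of this sketch.
    """
    digests = {}
    for path in sorted(paths):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Hash in chunks so large artifacts don't load into memory at once.
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        digests[os.path.basename(path)] = h.hexdigest()
    return json.dumps({"commit": commit, "artifacts": digests}, sort_keys=True)
```

Keeping the manifest deterministic (sorted keys, sorted paths) matters so that two independent rebuilds of the same commit produce byte-identical manifests.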
I don't think libsystemd is a particular problem. Or at least it being linked in only made the job of writing the exploit slightly easier: there's enough services running as root that will pull in a dependency like this that the compromise still exists, it just requires a few more hoops to jump through. And systemd has in fact deliberately made the notification process simple specifically so people can avoid the dependency (if not for security, then simply for ease of building in a way which supports systemd notification but doesn't need anything else).
Dependencies are a liability, for sure, but I think a lot of the reaction there is not entirely helpful. At least, the size of the dependency tree in a package manager is only about as good a proxy for the risk as number of lines of code is for software project progress. Dependencies need to be considered, but not just minimised out of hand. There are plenty of risks on the reimplement-it-yourself side. The main thing to consider is how many people and who you are depending on, and who's keeping an eye on them. The latter part is something which is really lacking: the most obvious thing about these OSS vulnerabilities is that basically no-one is really auditing code at all, and if people are, they are not sharing the results. It should in principle be possible to apply the advantages of open-source to that as well, but it's real hard to set up the incentives to do it (anyone starting needs to do a lot to make it worthwhile).
There are practical and philosophical problems with this. From the practical point of view you generally want to make contributing (or even just building) your stuff as low friction as possible and having extra manual build steps (install tools X at version X1.X2.X3, Y at version Y1.Y2 and Z at version Z1.Z2rc2) isn't low friction.
Philosophically, you are just shifting the attack vector around, you now need to compromise one of tools X, Y and Z, which are probably less under your control than the artifacts they produce.
> And systemd has in fact deliberately made the notification process simple specifically so people can avoid the dependency
People say this but I'm skeptical, this is the actual documentation of the protocol:
"These functions send a single datagram with the state string as payload to the socket referenced in the $NOTIFY_SOCKET environment variable. If the first character of $NOTIFY_SOCKET is "/" or "@", the string is understood as an AF_UNIX or Linux abstract namespace socket (respectively), and in both cases the datagram is accompanied by the process credentials of the sending service, using SCM_CREDENTIALS. If the string starts with "vsock:" then the string is understood as an AF_VSOCK address, which is useful for hypervisors/VMMs or other processes on the host to receive a notification when a virtual machine has finished booting. Note that in case the hypervisor does not support SOCK_DGRAM over AF_VSOCK, SOCK_SEQPACKET will be used instead. The address should be in the form: "vsock:CID:PORT". Note that unlike other uses of vsock, the CID is mandatory and cannot be "VMADDR_CID_ANY". Note that PID1 will send the VSOCK packets from a privileged port (i.e.: lower than 1024), as an attempt to address concerns that unprivileged processes in the guest might try to send malicious notifications to the host, driving it to make destructive decisions based on them."
So technically you have to support unix domain sockets, abstract namespace sockets, whatever SCM_CREDENTIALS is, whatever AF_VSOCK is and the SOCK_SEQPACKET note is completely obscure to me.
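For what it's worth, the common cases from that quote really are small. A sketch in Python covering only the "/" and "@" address forms (the AF_VSOCK branch and error handling are omitted; SCM_CREDENTIALS needs no work on the sender's side, since the kernel attaches credentials when the receiver enables SO_PASSCRED):

```python
import os
import socket

def sd_notify(state=b"READY=1"):
    """Send a systemd readiness notification without libsystemd.

    Handles the "/" (filesystem AF_UNIX) and "@" (Linux abstract
    namespace) address forms from the protocol quoted above; the
    AF_VSOCK form is omitted from this sketch.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False  # not supervised by systemd; nothing to do
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # abstract-namespace sockets start with a NUL byte
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.sendto(state, addr)
    return True
```

So a service that only wants Type=notify support genuinely can skip libsystemd; the vsock variant only matters for the hypervisor/VMM use case the docs describe.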
It's 2024, not 1994. If something masquerading as open-source software is not committed to, and built from, a publicly verifiable version-controlled repository, it might as well not exist.
3. The issue here has more to do with the generated tarball not matching the source. You (i.e. distro owners) should be able to generate the tarball locally and compare it with the released artifact. Autotools is just a scapegoat.
4. xz is used in a lot of places. Reducing dependencies is good, but trying to somehow say this is all systemd's fault, for depending on liblzma is not understanding the core issue here. The attacker could have found another dependency to social engineer into, or find a way to add dependencies and whatnot. It's very easy to say all these stuff in hindsight.
5. Again, I agree with you in principle that dependencies and complexity are a big issue, and I always roll my eyes when people bring in hundreds of dependencies, but xz is a pretty reputable project. I really, really doubt anyone would have raised an issue with adding liblzma, or thought that the build script would introduce a vulnerability like that. Again, a lot of hindsight talking here, instead of actually looking forward to how something like this could realistically be prevented. Too many dependencies are a problem, but it's not as if everyone will suddenly write their own compression libs.
6. Again, I mean, I don't disagree with you on principle but that is not the lesson from this particular incident. This may be your pet peeve but it wasn't like the integration with libsystemd would have raised anyone's alarm.
8. This is just a thinly veiled way of saying "don't work with anyone of Chinese descent". I don't want to use the R word but you know exactly what I mean. There's no evidence Jia Tan is Chinese anyway, or that this was done by China. We simply don't know right now, and for all we know they could have used any western-sounding name. The core issue here is that the trust was misplaced, and the overworked maintainer didn't try to make sure the other person was a real one (e.g. basic Googling). So what if you don't work with any Chinese contributors; if someone is called "Ryan Gosling", do you automatically trust them?
---
I do agree with point 7.
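Point 3's "generate the tarball locally and compare" is mechanically simple once a rebuild is possible. A sketch of the comparison step (directory names and the m4 file used in the demo are illustrative):

```python
import hashlib
import os

def tree_digests(root):
    """Map each file's path relative to `root` to its SHA-256."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digests[os.path.relpath(full, root)] = hashlib.sha256(f.read()).hexdigest()
    return digests

def mismatched_files(tarball_dir, checkout_dir, ignore=()):
    """Files that differ between an unpacked release tarball and a git
    checkout, or that exist on only one side.  `ignore` holds paths
    legitimately absent from git (e.g. a generated ChangeLog), which
    makes every exemption explicit and reviewable.
    """
    a, b = tree_digests(tarball_dir), tree_digests(checkout_dir)
    keys = (set(a) | set(b)) - set(ignore)
    return sorted(k for k in keys if a.get(k) != b.get(k))
```

The interesting output is the `ignore` list itself: every entry is a file a human has agreed should differ from the repository, which is exactly where the xz payload hid.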
The only way to armor yourself is to have consistent policies. Would this have happened if there were code reviews and testing?
Consistency is key. At my workplace we routinely bypass branch protections, but we're only responsible for a few customers.
A test resource getting linked into a final build is, itself, a problem - the tooling should absolutely make this difficult, and transparent/obvious when it happens.
But that’s difficult because C never shed the “pile of bash scripts” approach to build engineering… and fundamentally it’s an uphill battle to engineering a reliable system out of a pile of bash scripts.
The oft-discussed problems with undefined behavior, obscure memory/aliasing rules, etc are just the obvious smoke. C is termites all the way down and really shouldn’t be used anymore, it’s just also Too Big To Fail. Like if the world’s most critical infrastructure had been built in PHP.
Perhaps any reuse of libraries in sensitive areas like libsystemd should require a separate copy and more rigorous review? This would allow things like libxv to be 'reused', but the 'safe' versions would require a separate codebase that gets audited updates from the mainline.
But seriously, yes, I think I've seen people dismissing every one of those points. And now we have concrete proof they are real. The fact that a scandal like this somehow didn't happen before due to #1, 2, or 3 is almost incredible... in the sense that a viable explanation is that somebody is suppressing knowledge somewhere.
Point 8 simply isn't going to happen. And that means that if you want secure OSS, you must pay somebody to look around and verify those things. And the problem with that is this means you are now into the software vendor political dump - anybody that gets big doing that is instantaneously untrustworthy.
Overall, my point is that we need some actual democratic governance on software. Because it's political by nature, and pushing for anarchy works just as well as with any other political body.
It also seems like we need some careful cultural management around trust: enshrine trust-but-verify pervasively, to avoid focusing only on, say, Chinese H1-Bs or recent immigrants (whoops, you spent all of your time on them and it turns out you missed the Mossad and the Bulgarian hackers), and really double down on tamper-evidence, which also has the pleasant property of reducing the degree to which targeting OSS developers makes sense.
Combining your 7th point with that one, I've been wondering whether you could expand what happened with OpenSSL into some kind of general OSS infrastructure program, where everyone pays to support a team that prioritizes non-marquee projects, especially work like modernizing toolchains, auditing, and sandboxing. Basically, any maintainer of something in the top n dependencies would have a trusted group to ask for help, and would know that everyone on that team has gone through background checks, etc.
This is ridiculous, nobody "encourages" every service to depend on it for initialization notifications, you can implement the logic in 10 lines of code or less.
Deleted Comment
Dead Comment
Thanks, Microsoft, I like Azure now.
https://blogs.fsfe.org/tonnerre/archives/24
Deleted Comment
- Jia Tan was initially a trustworthy actor that subsequently became malicious (maybe they were paid or compromised somehow)
- Jia Tan was always malicious, but played the long game by starting with legitimate contributions/intent for 1-2 years
How would meeting them for real have any impact?
Suppose you have a chat with them and see that they're Chinese. What are your next actions? If you exclude them then that's racist right?
I don't have answers
What are xz's safer alternatives? And how do you make sure of that?