acka · 4 months ago
I believe the XZ compromise partly stemmed from including binary files in what should have remained a source-only project. From what I remember, well-run projects such as those of the GNU project have always required that all binaries—whether executables or embedded data such as test files—be built directly from source, compiling a purpose-built DSL if necessary. This ensures transparency and reproducibility, both of which might have helped catch the issue earlier.
dijit · 4 months ago
That's not the issue: there will always be prebuilt binaries (hell, deb/rpm are prebuilt binaries).

The issue for xz was that the build system was not hermetic (and sufficiently audited).

Hermetic build environments that can’t fetch random assets are a pain to maintain in this era, but they are pretty crucial in stopping an attack of this kind. The other way is reproducible binaries, which is also very difficult.
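
A crude local version of the reproducibility half is just building twice from a clean copy and comparing hashes - a minimal sketch, assuming a hypothetical `make` build that drops `dist/artifact`:

    import hashlib, shutil, subprocess, tempfile
    from pathlib import Path

    def build_and_hash(src: Path) -> str:
        """Copy the source tree, build it, and hash the resulting artifact."""
        with tempfile.TemporaryDirectory() as tmp:
            tree = Path(tmp) / "src"
            shutil.copytree(src, tree)
            subprocess.run(["make"], cwd=tree, check=True)    # placeholder build command
            out = tree / "dist" / "artifact"                  # placeholder output path
            return hashlib.sha256(out.read_bytes()).hexdigest()

    if __name__ == "__main__":
        a = build_and_hash(Path("xz-src"))
        b = build_and_hash(Path("xz-src"))
        print("reproducible" if a == b else "builds differ - investigate")

Real reproducible-builds tooling typically also varies the machine, clock, paths and environment between the two builds; two identical builds on one box prove much less.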

EDIT: Well, either I responded to the wrong comment or this comment was entirely changed. I was replying to a comment that said, “The issue was that people used pre-built binaries”, which is materially different to what the parent now says, though they rhyme.

jacquesm · 4 months ago
This is not going to be popular: I think the whole idea that a build system just fetches resources from outside of the build environment is fundamentally broken. It invites all kinds of trouble and makes it next to impossible to really achieve stability and to ensure that all code in the build has been verified. Because after you've done it four times, the fifth time you won't be looking closely. But if you don't do it automatically, only when you actually need it, you will be looking a lot more sharpish at what has changed since you last pulled in the code. Especially for older and stable libraries, the consumers should dictate when they upgrade, not some automatic build process. But because we're all conditioned to download stuff because it may have solved some security issue, we've stopped thinking about the security issues associated with just downloading stuff and dumping it into the build process.
mananaysiempre · 4 months ago
The XZ project’s build system is and was hermetic. The exploit was right there in the source tarball. It was just hidden away inside a checked-in binary file that masqueraded as a test for handling of invalid compressed files.

(The ostensibly autotools-built files in the tarball did not correspond to the source repository, admittedly, but that’s another question, and I’m of two minds about that one. I know that’s not a popular take, but I believe Autotools has a point with its approach to source distributions.)

acka · 4 months ago
My apologies: yes, I edited my comment to try and clarify that I did not mean executable binaries, but rather binary data, such as the test files in the case of XZ.
1oooqooq · 4 months ago
how do you test that your software can decompress files created with old/different implementations?

the exploit used the only solution to this problem: a binary test payload. there's no other way to do it.

maybe including the source to those versions and all the build stuff to then create them programmatically... or maybe even a second repo that generates signed payloads etc... but it's all overkill and would still have escaped human attention, as the attack itself proved.
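
for what it's worth, the "create them programmatically" half could look roughly like this - pure sketch, the file names, sizes and fixed seed are invented, and it only covers payloads your own toolchain can produce:

    import lzma, random
    from pathlib import Path

    def make_fixtures(out_dir: Path) -> None:
        """Generate compression test fixtures at test time instead of checking in opaque blobs."""
        out_dir.mkdir(parents=True, exist_ok=True)
        rng = random.Random(42)                          # fixed seed -> byte-for-byte reproducible input
        plain = bytes(rng.randrange(256) for _ in range(4096))

        good = lzma.compress(plain, preset=9)
        (out_dir / "good-small.xz").write_bytes(good)    # valid stream for the happy path

        bad = bytearray(good)
        bad[20] ^= 0xFF                                  # flip one byte -> deliberately corrupt stream
        (out_dir / "bad-corrupt.xz").write_bytes(bytes(bad))

    if __name__ == "__main__":
        make_fixtures(Path("tests/generated"))

streams produced by genuinely old or foreign encoders still can't be regenerated this way - you'd have to vendor the old sources or keep a checked-in blob, which is the point.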

huflungdung · 4 months ago
This was a devops exploit because they used the same env for building the app as they did for the test code. Many miss this entirely and think it is because a binary was shipped.

Ideally a test env and a build env should be entirely isolated, in case the test code somehow modifies the source. Which in this case it did.

1970-01-01 · 4 months ago
>Can we trust open source software? Yes — and I would argue that we can only trust open source software.

But should we trust it? No!! That's why we're here!

I'm not satisfied with the author's double-standard conclusion. "Trust, but verify" does not have some kind of hall pass for OSS "because open source is clearly better."

"Trust, but verify" is independent of the license the coders choose.

rcxdude · 4 months ago
Yes, I would say that being able to view the source code and build it yourself is a necessary but not sufficient condition of properly trusting the software. (Which is not quite the same thing as it being open source, but unless you're a very big customer it's rare to be able to do this for non-open-source code.)
1970-01-01 · 4 months ago
What does it matter if you are able to build it all by yourself if you still don't catch the compromised code? That's what is happening here in reality. OSS is now a layer of safety that is being leveraged into a layer of compromise. Caveat emptor!
normie3000 · 4 months ago
> I would say that being able to view the source code and build it yourself is a necessary but not sufficient condition of properly trusting the software.

And certainly a condition of the "verify" step?

With closed-source software, you can (almost) _only_ trust.

1718627440 · 4 months ago
When you get the source code as a big customer, that is open source. It might even be free software.
octoberfranklin · 4 months ago
Yes of course, and nixpkgs (nixos) already does, although unfortunately not for this particular package.

The XZ backdoor was possible because people stick generated code (autoconf's output), which is totally impractical to audit, into the source tarballs.

In nixpkgs, all you have to do is add `autoreconfHook` to the `nativeBuildInputs` and all that stuff gets regenerated at build time. Sadly this is not the default behavior yet.

ape4 · 4 months ago
Wouldn't the next malware use a different way to embed itself?
xmodem · 4 months ago
Why would they bother if we don't act on any of the learnings from this one?
citbl · 4 months ago
The next one probably won't be caught by running noticeably slower than usual.

It was a pure fluke that it got discovered _this early_.

ApolloFortyNine · 4 months ago
If I remember correctly, its days were numbered as soon as that Red Hat bug report about the Valgrind errors piling up was filed.

They weren't the ones to find the cause first (that's the person who took a deeper look due to the slowness), but the red flags had been raised.

bluGill · 4 months ago
Maybe - but original ideas are hard and ideas without flaws are rare: there are reasonable odds someone will try this again.
flerchin · 4 months ago
> As of today only 93% of all Debian source packages are tracked in git on Debian’s GitLab instance at salsa.debian.org. Some key packages such as Coreutils and Bash are not using version control at all

This bends my brain a little. I get that they were written before git, but not before the advent of version control.

NewJazz · 4 months ago
Specifically, the packaging is not in version control. The actual software is, but the Debian maintainer, for whatever reason, doesn't use source control for their packaging.
goodpoint · 4 months ago
The author is incorrect. Keeping the packaging files under git is done out of convenience, but it does not help with security or reproducibility.

The packages uploaded to Debian are what matter, and they are versioned.

crote · 4 months ago
And how are you supposed to verify that the right packages have been uploaded?

The easiest way to verify that is by using a reproducible automated pipeline, as that moves the problem to "were the packaging files tampered with?"

How do you verify the packaging files? By making them auditable: put them in a git repository and, for example, have the packager sign each commit. If a suspicious commit slips in, it'll be immediately obvious to anyone looking at the logs.
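
As a sketch, that audit can be as simple as walking the history and asking git to verify every commit signature (the revision range is just an example, and the packagers' public keys would have to be in your keyring):

    import subprocess

    def unsigned_commits(repo: str, rev_range: str = "origin/main..HEAD") -> list[str]:
        """Return commits in the range whose signature git cannot verify."""
        revs = subprocess.run(
            ["git", "-C", repo, "rev-list", rev_range],
            capture_output=True, text=True, check=True,
        ).stdout.split()
        return [
            rev for rev in revs
            if subprocess.run(["git", "-C", repo, "verify-commit", rev],
                              capture_output=True).returncode != 0
        ]

    if __name__ == "__main__":
        for rev in unsigned_commits("."):
            print("unverified commit:", rev)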

simoncion · 4 months ago
> I get that they were written before git, but not before the advent of version control.

  git clone https://git.savannah.gnu.org/git/bash.git
  git clone https://git.savannah.gnu.org/git/coreutils.git
Plug the repo name into https://savannah.gnu.org/git/?group=<REPO_NAME> to get a link to browse the repo.

ottoke · 4 months ago
This is the upstream source control. The article talks about the Debian packaging source not being in git (on e.g. salsa.debian.org).
kryptiskt · 4 months ago
Look at the commit log in the bash repo. What good does it do that it's notionally version controlled if the commits look like this:

    2025-07-03 Bash-5.3 distribution sources and documentation bash-5.3 Chet Ramey 896 -103357/+174007

typpilol · 4 months ago
Also why couldn't they start using it now?
ottoke · 4 months ago
How did the changes to the binary test files tests/files/bad-3-corrupt_lzma2.xz and tests/files/good-large_compressed.lzma, and the change to the m4/build-to-host.m4 build script, manifest to the Debian maintainer? Was there a chance of noticing something odd?
Groxx · 4 months ago
mostly no, from my reading - it was a multi-stage chain of relatively normal-looking things that added up to an exploit, helped by the tests involved using compressed data that wasn't human-readable.

you can of course come up with ways it could have been caught, but the code doesn't stand out as abnormal in context. that's all that really matters, unless your build system is already rigid enough to prevent it, and has no exploitable flaws you don't know about.

finding a technical overview is annoyingly tricky, given all the non-technical blogspam after it, but e.g. https://securelist.com/xz-backdoor-story-part-1/112354/ looks pretty good from a skim.

sanjams · 4 months ago
The article references a technical write-up: https://research.swtch.com/xz-script
XorNot · 4 months ago
Compression algorithms are deterministic over fixed data though (possibly with some effort).

There's no good reason to have opaque, non-generated data in the repository, and it should certainly be a red flag going forward.

dhx · 4 months ago
There are a few obvious gaps, seemingly still unsolved today:

1. Build environments may not be adequately sandboxed. Some distributions are better than others (Gentoo being an example of a better approach). The idea is that the package specification lists up front every file to be downloaded into a sandboxed build environment, and the scripts executed in that environment cannot then access any network interfaces, filesystem locations outside the build environment, etc. Even within the build of a particular software package, more advanced sandboxing may segregate test-suite resources from the code being built, so that a compromise of the test suite can't affect the built executables and compromised documentation resources can't be accessed during the build or the eventual execution of the software.

2. The open source community as a whole (and ultimately distribution package maintainers) is not being alerted to, and taught to treat with caution, unverified high entropy in source repositories. Similar in concept to nothing-up-my-sleeve numbers.[1] Typical examples of unverified high entropy where a supply chain attack can hide its payload: images, videos, archives, PDF documents, etc. in test suites, or bundled with software as documentation and/or general resources (such as splash screens). It may also include IVs/example keys in code or code comments, and s-boxes or similar matrices or arrays of high-entropy-looking data where it is not obvious to a human reviewer that the content is actually well known (such as a standard AES s-box) rather than arbitrary and potentially indistinguishable from attacker shellcode. Ideally, when a package maintainer goes to commit a new package or package update, they would be alerted to unexplained high-entropy information that ends up in the build environment sandbox and required to justify why it is OK (a toy sketch of such a check follows below).

[1] https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number
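
As a toy sketch of the alert in point 2: compute the Shannon entropy of every file that lands in the build sandbox and make the maintainer justify anything close to 8 bits/byte. The threshold here is arbitrary, and real tooling would also allowlist formats that are expected to be high entropy:

    import math
    from collections import Counter
    from pathlib import Path

    ENTROPY_THRESHOLD = 7.5   # bits per byte; arbitrary cut-off for "suspiciously random"

    def shannon_entropy(data: bytes) -> float:
        """Shannon entropy of a byte string, in bits per byte."""
        if not data:
            return 0.0
        n = len(data)
        return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

    def flag_high_entropy(root: Path) -> None:
        """Walk a source tree and report files a packager should be asked to justify."""
        for path in root.rglob("*"):
            if path.is_file():
                e = shannon_entropy(path.read_bytes())
                if e >= ENTROPY_THRESHOLD:
                    print(f"{e:5.2f} bits/byte  {path}")

    if __name__ == "__main__":
        flag_high_entropy(Path("."))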

aborsy · 4 months ago
Couldn’t submissions to Debian be allowed only under real identities, so that people take responsibility for what they submit?

A random person or group that nobody has ever seen or knows submitted a backdoor.

0rdinal · 4 months ago
1. How could Debian effectively verify an identity?

2. Some people may want to remain pseudonymous for legitimate reasons.

aborsy · 4 months ago
It’s not straightforward.

The developers (at least the important ones) could register with the Debian project, just as they would with a company: submit identity and government documents, proof of physical address, bank account, credit card information, IdP account, etc. It would operate like an organization.

The lead developers could meet and get to know each other through regular meetings: a kind of web of trust with in-person verification. There are already online meetings in some projects.

sega_sai · 4 months ago
From reading this, it seems that one thing one can do is to enforce separation of the build from testing, so the build never has access to binary code that can be injected.