Its always an interesting dynamic: assuming a high trust society pays dividends - Python would be nowhere close the success it has been without PyPI.
But then success attracts trust abusers and forces raising the fences (which comes with higher costs, both direct and indirect).
Direct costs in the people and infrastructure that must be dedicated to the task. Indirect costs in the frictions generated by complicating workflows.
It all points to the need for open source ecosystems to be taken more seriously by the economically able users who most benefit from this amazing development.
> The one project cleared was a project containing obfuscated code, in violation of the PyPI Acceptable Use Policy.
Interesting, I didn’t know that. While I haven’t released anything obfuscated on PyPI, I’ve certainly written Python projects that include obfuscated code by necessity, namely scrapers packing duktape (embedded JS interpreter) and third party obfuscated JS blobs to generate signatures and stuff. I know for a fact there are projects like that on PyPI. I wonder if those are allowed.
(Come to think of it, those probably can be DMCAed if the targeted service provider is sufficiently motivated.)
The still don't even have a way to avoid dependency confusion attacks when using private package repos (other than also registering every single private package name you use on pypi.org). Blows my mind.
Who is "they"? PyPI is an index; it doesn't control your installing client.
(This is a larger issue - or feature, depending on your perspective - with Python packaging. But it's important to understand that PyPI itself can't force `pip` or any other client to pick any particular resolution order between indices.)
If you're concerned about dependency confusion attacks you should host your own index and vet what goes on to it.
But there is a better solution coming, PEP 708 was developed for this and is in prototype on pypi.org, so it's an overstatement to say "don't even have a way to avoid dependency confusion attacks ".
It is, however, a non-trivial problem, and more solutions will likely come over the years, many Python packaging tools like uv and poetry (and likely others) have way to name indexes and pin specific packages to indexes, which appears to be a promising UX.
Your comment also has me dreaming about a Dependabot-esque utility that opens Github issues on repositories that have quarantined projects in their requirements.txt.
Quarantining would prevent anyone from building / installing new copies of the compromised software, so this utility would only help people who were a) monitoring the project, and b) had a local version installed pre-quarantine. That's a pretty narrow scope of users, so now that I type all this out, I'm realizing that the juice is likely not worth the squeeze.
One of my responsibilities is software supply chain security in a financial services org, so this signal would be valuable for vulnerability management of dependencies. I wouldn't call it "threat hunting" per se, but ground truth around threat actor patterns helps us build better defensive systems in this regard. Keeping the bad bits out is way easier than remediating once they've been ingested into systems.
> Your comment also has me dreaming about a Dependabot-esque utility that opens Github issues on repositories that have quarantined projects in their requirements.txt.
It's not a bad idea, let Github know! Their security team is very good from my interactions with them.
Given how widespread PyPI usage is, I'm surprised they only have one full time security staff. I mean I guess it makes sense, usage doesn't always mean they get more donations/money, but damn.
the fact that `pip install` just runs whatever is in `setup.py` is still mind baffling, even if the author weren't mallicious the `setup.py` can still do harm (say delete a file by mistake), there really needs to be an official way of sandbox its running.
Note that it's possible to disable that behavior with `pip install --only-binary :all:`.
This way, pip will fail if a dependency does not provide a `.whl` package, instead of automatically falling back to the "build from source" mode that can lead to arbitrary code execution at install time (via setuptools' `setup.py` or any other build backend mechanism).
However, installing from wheels just protects from arbitrary code execution at install time. If you do not trust the source and integrity of the package you install, you would still be subject to arbitrary code execution at import time.
Therefore, tools and processes to improve package provenance tracing and integrity checking are useful for both kinds of installations.
I think sometimes the problem is coming from accidental typos instead of not trusting, say if one accidentally typed `pip install requests` into `pip install requestss` and if `requestss` is malacious then by the time one noticed the typo the setup.py could have already run to do the harm
It's not good, but it should also not be baffling: it's the exact same thing other ecosystems do (npm with install hooks/scripts, Rust with build.rs, Ruby with gemspecs, etc).
Quarantining projects is just a band-aid. If you’re worried about malware, maybe stop letting random people upload code to the official package index. Or just write better docs so people stop using random packages in the first place.
But then success attracts trust abusers and forces raising the fences (which comes with higher costs, both direct and indirect).
Direct costs in the people and infrastructure that must be dedicated to the task. Indirect costs in the frictions generated by complicating workflows.
It all points to the need for open source ecosystems to be taken more seriously by the economically able users who most benefit from this amazing development.
Interesting, I didn’t know that. While I haven’t released anything obfuscated on PyPI, I’ve certainly written Python projects that include obfuscated code by necessity, namely scrapers packing duktape (embedded JS interpreter) and third party obfuscated JS blobs to generate signatures and stuff. I know for a fact there are projects like that on PyPI. I wonder if those are allowed.
(Come to think of it, those probably can be DMCAed if the targeted service provider is sufficiently motivated.)
Dead Comment
(This is a larger issue - or feature, depending on your perspective - with Python packaging. But it's important to understand that PyPI itself can't force `pip` or any other client to pick any particular resolution order between indices.)
The PyPI and Pip developers of course.
But there is a better solution coming, PEP 708 was developed for this and is in prototype on pypi.org, so it's an overstatement to say "don't even have a way to avoid dependency confusion attacks ".
It is, however, a non-trivial problem, and more solutions will likely come over the years, many Python packaging tools like uv and poetry (and likely others) have way to name indexes and pin specific packages to indexes, which appears to be a promising UX.
Quarantining would prevent anyone from building / installing new copies of the compromised software, so this utility would only help people who were a) monitoring the project, and b) had a local version installed pre-quarantine. That's a pretty narrow scope of users, so now that I type all this out, I'm realizing that the juice is likely not worth the squeeze.
> Your comment also has me dreaming about a Dependabot-esque utility that opens Github issues on repositories that have quarantined projects in their requirements.txt.
It's not a bad idea, let Github know! Their security team is very good from my interactions with them.
This way, pip will fail if a dependency does not provide a `.whl` package, instead of automatically falling back to the "build from source" mode that can lead to arbitrary code execution at install time (via setuptools' `setup.py` or any other build backend mechanism).
However, installing from wheels just protects from arbitrary code execution at install time. If you do not trust the source and integrity of the package you install, you would still be subject to arbitrary code execution at import time.
Therefore, tools and processes to improve package provenance tracing and integrity checking are useful for both kinds of installations.
In their FAQ[1], they mention that they have plans to expand to PyPI as well.
[1]: https://docs.socket.dev/docs/faq