PyPI Blog: Project Quarantine

They still don't even have a way to avoid dependency confusion attacks when using private package repos (other than also registering every single private package name you use on pypi.org). Blows my mind.

woodruffw · 8 months ago

Who is "they"? PyPI is an index; it doesn't control your installing client.

(This is a larger issue - or feature, depending on your perspective - with Python packaging. But it's important to understand that PyPI itself can't force `pip` or any other client to pick any particular resolution order between indices.)
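A toy model of the resolution-order point: when an installer pools several indexes and simply takes the highest version it finds anywhere (pip, for instance, does not prioritize `--index-url` over `--extra-index-url`), a public upload with an inflated version number beats the private one. The index names, the package `acme-utils` and all versions below are hypothetical, and this is a sketch of the idea, not pip's actual resolver.

```python
# Hypothetical indexes: each maps a project name to the versions it serves.
INDEXES = {
    "internal": {"acme-utils": ["1.2.0", "1.3.0"]},
    "pypi": {"acme-utils": ["99.0.0"], "requests": ["2.32.3"]},  # squatted internal name
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically (plain X.Y.Z only)."""
    return tuple(int(part) for part in version.split("."))


def resolve_pooled(name: str) -> tuple[str, str]:
    """Pick the highest version of `name` across ALL indexes, with no index priority."""
    candidates = [
        (parse_version(version), index, version)
        for index, projects in INDEXES.items()
        for version in projects.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found on any index")
    _, index, version = max(candidates)
    return index, version


print(resolve_pooled("acme-utils"))  # ('pypi', '99.0.0'): the public upload wins
```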

LtWorf · 8 months ago

For all intents and purposes, "pip" is the official client. It is referenced in the official documentation: https://docs.python.org/3/installing/index.html

IshKebab · 8 months ago

> Who is "they"?

The PyPI and Pip developers of course.

oblvious-earth · 8 months ago

If you're concerned about dependency confusion attacks you should host your own index and vet what goes on to it.

But there is a better solution coming: PEP 708 was developed for this and is being prototyped on pypi.org, so it's an overstatement to say they "don't even have a way to avoid dependency confusion attacks".

It is, however, a non-trivial problem, and more solutions will likely come over the years. Many Python packaging tools, like uv and Poetry (and likely others), have a way to name indexes and pin specific packages to them, which looks like a promising UX.
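A sketch of the per-package pinning idea: a name listed in a pin table may only be resolved from its assigned index, so a same-named upload on the public index is never considered, and a pinned package missing from its index is an error rather than a silent fallback. The table layout, index names and packages are hypothetical; this illustrates the concept, not uv's or Poetry's actual configuration format.

```python
# Hypothetical indexes, including squatted copies of internal names on the public index.
INDEXES = {
    "internal": {"acme-utils": ["1.2.0", "1.3.0"]},
    "pypi": {"acme-utils": ["99.0.0"], "acme-auth": ["0.0.1"], "requests": ["2.32.3"]},
}

# Hypothetical pin table: a name listed here may ONLY come from the named index.
PINS = {"acme-utils": "internal", "acme-auth": "internal"}
DEFAULT_INDEX = "pypi"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically (plain X.Y.Z only)."""
    return tuple(int(part) for part in version.split("."))


def resolve_pinned(name: str) -> tuple[str, str]:
    """Resolve `name` from its pinned index only; unpinned names use the default index."""
    index = PINS.get(name, DEFAULT_INDEX)
    versions = INDEXES[index].get(name, [])
    if not versions:
        # No fallback to other indexes: a missing pinned package is an error.
        raise LookupError(f"{name} not found on index {index!r}")
    return index, max(versions, key=parse_version)


print(resolve_pinned("acme-utils"))  # ('internal', '1.3.0')
print(resolve_pinned("requests"))    # ('pypi', '2.32.3')
```

Note that `acme-auth` fails to resolve instead of quietly picking up the public `0.0.1`, which is exactly the behavior you want when an internal package is missing.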

The fact that `pip install` just runs whatever is in `setup.py` is still baffling to me. Even if the author weren't malicious, `setup.py` could still do harm (say, delete a file by mistake). There really needs to be an official way to sandbox its execution.

ogrisel · 8 months ago

Note that it's possible to disable that behavior with `pip install --only-binary :all:`.

This way, pip will fail if a dependency does not provide a `.whl` package, instead of automatically falling back to the "build from source" mode that can lead to arbitrary code execution at install time (via setuptools' `setup.py` or any other build backend mechanism).
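The policy `--only-binary :all:` enforces can be sketched as a filter over a project's available distribution files: keep wheels, and fail instead of falling back to an sdist that would need a build step. This is a hypothetical illustration of the rule, not pip's implementation, and the filenames are made up.

```python
def select_binary_only(project: str, filenames: list[str]) -> list[str]:
    """Keep only wheel files; never fall back to an sdist, which would require a build."""
    wheels = [name for name in filenames if name.endswith(".whl")]
    if not wheels:
        raise RuntimeError(f"{project}: no wheel available and source builds are disabled")
    return wheels


files = ["demo-1.0.tar.gz", "demo-1.0-py3-none-any.whl"]
print(select_binary_only("demo", files))  # ['demo-1.0-py3-none-any.whl']
```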

However, installing from wheels only protects against arbitrary code execution at install time. If you do not trust the source and integrity of the packages you install, you are still exposed to arbitrary code execution at import time.
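To make the import-time point concrete, here is a small self-contained demonstration (the module name and the environment-variable "payload" are hypothetical stand-ins): merely importing a module runs its top-level statements, even if none of its functions is ever called. No build step is involved, which is why a wheel-only install does not help here.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical module source; the os.environ write stands in for a malicious payload.
MODULE_SOURCE = (
    "import os\n"
    "os.environ['IMPORT_SIDE_EFFECT'] = 'ran'\n"
    "def helper():\n"
    "    return 42\n"
)


def import_and_check() -> str:
    """Import a freshly written module and report whether its top-level code ran."""
    os.environ.pop("IMPORT_SIDE_EFFECT", None)
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "innocent_looking.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module("innocent_looking")  # top-level statements execute here
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("innocent_looking", None)
    return os.environ.get("IMPORT_SIDE_EFFECT", "not run")


print(import_and_check())  # ran  (helper() was never called)
```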

Therefore, tools and processes to improve package provenance tracing and integrity checking are useful for both kinds of installations.
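One common integrity check is comparing an artifact's SHA-256 digest against a value recorded ahead of time, which is the idea behind pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file). A minimal sketch, assuming you already have the expected digest; it says nothing about provenance, only that the bytes match what was pinned. The artifact name is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Raise if the file's digest differs from the one recorded ahead of time."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"hash mismatch for {path.name}: got {actual}")


with tempfile.TemporaryDirectory() as tmp:
    wheel = Path(tmp, "demo-1.0-py3-none-any.whl")  # hypothetical artifact
    wheel.write_bytes(b"original contents")
    pinned = hashlib.sha256(b"original contents").hexdigest()
    verify_artifact(wheel, pinned)  # passes silently
    wheel.write_bytes(b"tampered contents")
    try:
        verify_artifact(wheel, pinned)
    except ValueError as err:
        print(err)  # the tampered file is rejected
```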

xgstation · 8 months ago

I think sometimes the problem comes from accidental typos rather than a lack of trust. Say one accidentally types `pip install requestss` instead of `pip install requests`, and `requestss` is malicious: by the time one notices the typo, the `setup.py` could already have run and done its harm.
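A sketch of a pre-install guard against this kind of typo, using the standard library's `difflib` to flag names that are suspiciously close to a known package. The allowlist `KNOWN_PACKAGES` and the `0.8` similarity cutoff are assumptions to tune, and a check like this only catches near-misses of names you already list.

```python
import difflib
from typing import Optional

# Hypothetical allowlist: the packages your project actually expects to install.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "urllib3", "django"]


def likely_typo_of(requested: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the known name `requested` probably meant, or None if it is not a near-miss."""
    name = requested.lower()
    if name in KNOWN_PACKAGES:
        return None  # exact match: nothing to warn about
    matches = difflib.get_close_matches(name, KNOWN_PACKAGES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


suspect = likely_typo_of("requestss")
if suspect:
    print(f"'requestss' looks like a typo of '{suspect}'; refusing to install")
```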

woodruffw · 8 months ago

It's not good, but it should also not be baffling: it's the exact same thing other ecosystems do (npm with install hooks/scripts, Rust with build.rs, Ruby with gemspecs, etc.).

xgstation · 8 months ago

I know other ecosystems do the same, and those are baffling too, especially for newer languages like Rust, which is why https://internals.rust-lang.org/t/pre-rfc-sandboxed-determin... exists.

f1shy · 8 months ago

Notably also Common Lisp (Quicklisp).

pjc50 · 8 months ago

I don't think that makes much of a difference from the risk of bugs in the rest of the package when it's run.

openrisk · 8 months ago

It's always an interesting dynamic: assuming a high-trust society pays dividends. Python would be nowhere near as successful as it has been without PyPI.

But then success attracts trust abusers and forces the fences to be raised (which comes with higher costs, both direct and indirect).

Direct costs in the people and infrastructure that must be dedicated to the task. Indirect costs in the frictions generated by complicating workflows.

It all points to the need for open source ecosystems to be taken more seriously by the economically able users who most benefit from this amazing development.

They won't pay anything unless they are forced to do so. Basic capitalism leads to externalising costs onto society.

NeutralCrane · 8 months ago

Perhaps, but can you explain how an alternative to capitalism wouldn’t result in people not paying for a service they don’t have to pay for?

oefrha · 8 months ago

> The one project cleared was a project containing obfuscated code, in violation of the PyPI Acceptable Use Policy.

Interesting, I didn’t know that. While I haven’t released anything obfuscated on PyPI, I’ve certainly written Python projects that include obfuscated code by necessity, namely scrapers packing duktape (an embedded JS interpreter) and third-party obfuscated JS blobs to generate signatures and stuff. I know for a fact there are projects like that on PyPI. I wonder if those are allowed.

(Come to think of it, those probably can be DMCAed if the targeted service provider is sufficiently motivated.)

They also allow binary packages if you want an easy way of hiding malware.


toomuchtodo · 8 months ago

Awesome work, kudos to the PyPI team. Will it be possible to receive notifications of project quarantines as a member of the public?

HanClinto · 8 months ago

Your comment also has me dreaming about a Dependabot-esque utility that opens Github issues on repositories that have quarantined projects in their requirements.txt.

Quarantining would prevent anyone from building / installing new copies of the compromised software, so this utility would only help people who were a) monitoring the project, and b) had a local version installed pre-quarantine. That's a pretty narrow scope of users, so now that I type all this out, I'm realizing that the juice is likely not worth the squeeze.
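The core of such a utility is small, assuming a list of quarantined project names is available from somewhere (the thread does not establish that a public feed exists, so `quarantined` is a hypothetical input here). Names are compared after PEP 503 normalization, so `Evil_Pkg` and `evil.pkg` count as the same project. The requirements parsing is deliberately naive: it skips comments, option lines such as `-r` or `--index-url`, and URL-based requirements.

```python
import re
from typing import Iterable

# Naive name matcher: the leading project name of a requirement line.
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_', '.' become one '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines: Iterable[str]) -> list[str]:
    """Extract normalized project names from requirements.txt-style lines."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-") or "://" in line:
            continue  # blank, pip option, or URL requirement: not handled here
        match = NAME_RE.match(line)
        if match:
            names.append(normalize(match.group(1)))
    return names


def flag_quarantined(lines: Iterable[str], quarantined: Iterable[str]) -> list[str]:
    """Return the requirement names that appear in the (hypothetical) quarantine list."""
    bad = {normalize(name) for name in quarantined}
    return [name for name in requirement_names(lines) if name in bad]


reqs = ["requests>=2.31  # HTTP", "Evil_Pkg==0.1", "-r other.txt", "", "numpy"]
print(flag_quarantined(reqs, {"evil.pkg"}))  # ['evil-pkg']
```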

One of my responsibilities is software supply chain security in a financial services org, so this signal would be valuable for vulnerability management of dependencies. I wouldn't call it "threat hunting" per se, but ground truth around threat actor patterns helps us build better defensive systems in this regard. Keeping the bad bits out is way easier than remediating once they've been ingested into systems.

> Your comment also has me dreaming about a Dependabot-esque utility that opens Github issues on repositories that have quarantined projects in their requirements.txt.

It's not a bad idea; let GitHub know! Their security team is very good from my interactions with them.

alsodumb · 8 months ago

Given how widespread PyPI usage is, I'm surprised they only have one full-time security staff member. I mean, I guess it makes sense: usage doesn't always mean they get more donations/money, but damn.

spencerchubb · 8 months ago

Companies that actually care about security have a more secure solution and don't allow devs to use PyPI.

cjalmeida · 8 months ago

You’d be surprised by the number of companies handling critical infrastructure that are OK with using PyPI directly.

For example, we have it behind a kind of transparent proxy, where you only get packages that were tested and scanned by a team of experts.
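A minimal sketch of the filtering such a proxy might apply when serving an index page: only versions that appear on a reviewed allowlist are passed through, and unknown names get nothing. The `APPROVED` set and the `filter_listing` helper are hypothetical; a real proxy would also need to handle files, hashes and metadata.

```python
import re

# Hypothetical review results: (normalized name, version) pairs that passed vetting.
APPROVED = {("requests", "2.32.3"), ("numpy", "2.0.1")}


def normalize(name: str) -> str:
    """PEP 503 normalization so 'Requests' and 'requests' map to the same key."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_listing(name: str, upstream_versions: list[str]) -> list[str]:
    """Return only the upstream versions of `name` that have been vetted."""
    key = normalize(name)
    return [v for v in upstream_versions if (key, v) in APPROVED]


print(filter_listing("Requests", ["2.31.0", "2.32.3"]))  # ['2.32.3']
print(filter_listing("requestss", ["0.0.1"]))            # []: unvetted names get nothing
```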

davidshepherd7 · 8 months ago

Could you give some examples of more secure solutions?

me_vinayakakv · 8 months ago

https://socket.dev/ does a good job in detecting malicious packages in npm.

In their FAQ[1], they mention that they have plans to expand to PyPI as well.

[1]: https://docs.socket.dev/docs/faq

nathanmills · 8 months ago

Quarantining projects is just a band-aid. If you’re worried about malware, maybe stop letting random people upload code to the official package index. Or just write better docs so people stop using random packages in the first place.