The metadata problem is related to the fact that pip had an unsound resolution algorithm that amounted to "resolve something optimistically, hope it works, and backtrack when you get stuck."
I did a lot of research along the lines that led to uv 5 years ago and came to the conclusion that, installing from wheels, you can set the resolution up as an SMT problem the same way Maven does and solve it right the first time. There was a PEP to publish metadata files for wheels on PyPI, but I'd already built something that could pull the metadata out of a wheel with just three HTTP range requests. I believed that any given project might depend on a legacy egg, and in those cases you can build that egg into a wheel via a special process and store it in a private repo (a must for the perfect Python build system).
Back in the days of eggs you couldn't count on having the metadata until you ran setup.py, which forced pip to be unreliable because so much stuff got installed and uninstalled in the process of a build.
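To make the range-request trick mentioned above concrete, here is a rough sketch of how a client can read a wheel's METADATA without downloading the whole file, by handing zipfile a seekable view backed by HTTP Range requests. HttpRangeFile and remote_wheel_metadata are made-up names, the code assumes the server reports Content-Length and honors Range headers (PyPI's file host does), and a real implementation batches and caches reads so the whole operation takes only a few requests; this is an illustration, not how uv or pip actually implement it.

    import io
    import urllib.request
    import zipfile

    class HttpRangeFile:
        """Minimal read-only, seekable 'file' backed by HTTP Range requests (sketch)."""

        def __init__(self, url):
            self.url = url
            self.pos = 0
            # One request up front to learn the total size of the wheel.
            head = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(head) as resp:
                self.size = int(resp.headers["Content-Length"])

        def seekable(self):
            return True

        def seek(self, offset, whence=io.SEEK_SET):
            if whence == io.SEEK_SET:
                self.pos = offset
            elif whence == io.SEEK_CUR:
                self.pos += offset
            else:  # io.SEEK_END
                self.pos = self.size + offset
            return self.pos

        def tell(self):
            return self.pos

        def read(self, n=-1):
            if n is None or n < 0:
                n = self.size - self.pos
            if n <= 0 or self.pos >= self.size:
                return b""
            end = min(self.pos + n, self.size) - 1
            req = urllib.request.Request(
                self.url, headers={"Range": f"bytes={self.pos}-{end}"}
            )
            with urllib.request.urlopen(req) as resp:
                data = resp.read()
            self.pos += len(data)
            return data

    def remote_wheel_metadata(wheel_url):
        # zipfile only needs the end-of-central-directory record, the central
        # directory, and the one member we ask for -- a handful of small reads.
        with zipfile.ZipFile(HttpRangeFile(wheel_url)) as zf:
            name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
            return zf.read(name).decode("utf-8")

Calling remote_wheel_metadata() on a wheel URL returns the METADATA text (including the Requires-Dist lines a resolver cares about) for a few kilobytes of transfer instead of the whole wheel.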
There is a need for a complete answer for dev and private builds, I'll grant that. Private repos like the ones we're used to in the Maven world would help.
Maybe the solution will be for tools like uv or poetry to warn if dynamic metadata is used and strongly discourage it. Then over time the users of packages that use dynamic metadata will start to urge the package authors to stop using it.
I wouldn't bet on this one. I know a lot of Python package maintainers who would sooner kill their project than adapt to a standard they don't like. For example, see flake8's stance on even supporting pyproject.toml files, which have been the standard for years: https://github.com/PyCQA/flake8/issues/234#issuecomment-8128...
I know because I'm the one who added pyproject.toml support to mypy, 3.5 years ago. Python package developers can rival Linux kernel maintainers in their resistance to change.
> The challenge with dynamic metadata in Python is vast, but unless you are writing a resolver or packaging tool, you're not going to experience the pain as much.
But that is by choice. I, as a user, am forced to debug this pile of garbage whenever things go wrong, so in a way it's even worse for users. It's a running joke in the machine learning community that the hard part of machine learning is dealing with Python packages.
A lot of the problem seems to be driven by a desire to have editable installs. I personally have never understood why having editable installs is such an important need. When I'm working on a Python package and need to test something, I just run
python -m pip install --user <package_name>
and I now have a local installation that I can use for testing.
That would require you to re-install the local package you're developing against after every code change. Very few people will want to do that, and it's potentially very slow.
It’s also a step not needed by most other ecosystems.
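For context, the step that editable installs remove looks like this: you install the working copy once in editable mode and Python then imports it straight from the source tree, so code edits are picked up on the next import without re-running the install (changes to metadata such as dependencies or entry points still need a reinstall). With standard pip that is

    python -m pip install -e .

run from the project directory.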
Go (a.k.a. Golang), with its network-first import system (i.e. import "example.org/foo/bar"), has solved the problem in a surprisingly simple way. You just add a "replace" directive in a go.mod file and you can point your import (and all child imports) to any directory on the filesystem.
Potentially, perhaps. But it's certainly not slow for the cases where I use it: a pure Python package whose dependencies are already installed and are not changing (only the package itself is). Under those conditions, the command line I gave takes a couple of seconds to run.
https://about.scarf.sh/post/python-wheels-vs-eggs
> I'd already built something that could pull the metadata out of a wheel with just three HTTP range requests.
Range requests are used by both uv and pip if the index supports it, but they have to make educated guesses about how reliable that metadata is. The main problems are local packages during development and source distributions.
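A sketch of the index-side path this refers to: with the JSON Simple API from PEP 691 plus the sidecar metadata files from PEP 658/714, a client can ask the index whether a wheel's METADATA is published separately and fetch just that file. The function name and the fallback behavior here are made up for illustration, and a real resolver would also verify the metadata hashes the index reports.

    import json
    import urllib.request

    # Hypothetical helper: ask the index (PEP 691 JSON Simple API) whether the
    # separately published METADATA from PEP 658/714 exists for a given wheel,
    # and fetch just that file instead of touching the wheel at all.
    def wheel_metadata_via_index(project, wheel_filename):
        url = f"https://pypi.org/simple/{project}/"
        req = urllib.request.Request(
            url, headers={"Accept": "application/vnd.pypi.simple.v1+json"}
        )
        with urllib.request.urlopen(req) as resp:
            index = json.load(resp)
        for f in index["files"]:
            if f["filename"] != wheel_filename:
                continue
            # "core-metadata" (PEP 714) or the older "dist-info-metadata" key
            # means the index serves the wheel's METADATA next to the wheel.
            if f.get("core-metadata") or f.get("dist-info-metadata"):
                with urllib.request.urlopen(f["url"] + ".metadata") as meta:
                    return meta.read().decode("utf-8")
        return None  # not advertised: fall back to range requests or a full download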
> It's also a step not needed by most other ecosystems.
From what I can gather, most other ecosystems don't even have the problem under discussion.
I get that Python is, strictly speaking, an older language. But it isn't like these are at all new considerations.