Never mind less slow, how about making it work first?
I'm in a poorly connected location, and it seems pip can only download from pythonhosted.org at 10-20 kB/s. Worse still, pip downloads time out and fail extremely quickly.
If I rerun the pip install, instead of resuming the download, it downloads the file from the beginning, then times out and fails again somewhere in the middle.
I tried going through a private VPN hosted on Linode, with similar results.
For the official Python package manager, this is simply unreliable and unacceptable behavior.
You pip install from it (e.g. a local devpi instance) instead of PyPI. It will in turn download from PyPI and hand you the result. It also keeps itself updated, and you can batch-download overnight the packages you expect to need.
As a result, our entire team always has most packages available locally. Changing machines or locations, or purging a cache, doesn't mean losing this benefit.
Besides, pip caching wheels doesn't mean it stops making requests, so this is still a better experience.
For big companies I recommend it anyway: it speeds up work for the whole team and CI, lets you publish private packages, etc.
If you can't do that, the next best thing is to use "pip download" instead of "pip install" and save the wheels to a hard drive.
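Roughly, assuming a requirements.txt and a ./wheels directory of your choosing, that looks like:

pip download -r requirements.txt -d ./wheels # fetch everything once, while you have a connection
pip install --no-index --find-links ./wheels -r requirements.txt # later installs never touch the network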
Thanks for the tips, I'll definitely look into them.
EDIT: And thanks for the tip from godmode2019. I upvoted you both.
However, they don't change the fact that pip by default assumes a decent Internet connection, with its short timeout and no resume on downloads, and thus is unusable with anything less.
Again, IMHO that's simply unacceptable for the official Python package manager.
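To be fair, pip does let you stretch the timeout and retry budget, which at least reduces the number of failed runs, even though it still won't resume a partial download (these are standard pip options; the values and package name are just examples):

pip install --timeout 120 --retries 10 some-package # longer socket timeout, more retry attempts
pip config set global.timeout 120 # or make the longer timeout permanent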
I've had ~5 Mbps download speeds for the last 5 years in a rural US location. I set up devpi on an old Raspberry Pi hoping to have this same experience, but after several months I disabled it, having found it was actually slowing things down. If memory serves, the issue was that the local package lookups were excruciatingly slow (even over localhost), and I was surprised to see that devpi would routinely soak up multiple GB of memory and start swapping (I recall a few memory-leak issues in their repo).
Seemed like a great idea, and perhaps I just needed something beefier than an RPi 3, but it didn't work out for me.
You can change the pypi index URL with `pip config set global.index-url`.
For example, being in China, I often switch to `https://pypi.tuna.tsinghua.edu.cn/simple`. You may need to look up which mirror is available and fastest in your region.
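For reference, switching is a one-liner; the URL here is just the TUNA mirror mentioned above, so substitute whichever mirror is closest to you:

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple # persist it in pip's config
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple some-package # or per invocation (some-package is a placeholder)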
But I shouldn't need to know this; pip should take care of picking the best download mirror for me.
Or it could just support resuming a previous download. I don't actually mind waiting for a slow download or having to rerun pip install a few times, as long as it makes progress each time I run it.
People should be aware that adding additional mirrors (instead of switching mirrors, as suggested here) can lead to big slowdowns: pip has no way to determine whether a wheel it has found is the best choice without comparing all the options, so it will keep searching both locally and across all configured indices until it can resolve the "best" wheel to download.
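In other words, prefer replacing the index over adding one; the package name below is a placeholder:

pip install --index-url https://pypi.tuna.tsinghua.edu.cn/simple some-package # switch: only one index to consult
pip install --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple some-package # add: pip now has to check both indices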
The most efficient workflow I've found for pip so far is something along the lines of:
pip-compile # resolve and pin the dependency tree
grep '==' requirements-pinned.txt | xargs -n1 -P "$(nproc)" pip wheel # build the wheels in parallel
pip install *.whl # install the pre-built wheels
It may not avoid all the slowness, but when nicely integrated into a build/CI system it can avoid suffering it more than once.
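One way to make that stick in CI is to build into a directory the CI system caches between runs (the directory and requirements file names are assumptions; the cache step itself depends on your CI):

pip wheel -r requirements-pinned.txt -w ./wheelhouse # first run builds; cached runs find most wheels already there
pip install --no-index --find-links ./wheelhouse -r requirements-pinned.txt # install strictly from the cached wheels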
Depends on your platform. If you're using ARM (like AWS Graviton, and possibly Apple M1) you might find that many wheels are not available. That might be different now (my information is 2 years old), but I remember 'pip install numpy' taking ages. Every time someone pushed a change we'd have a 15-minute job clogging the pipes. Luckily I had already created an artifact repo (AWS CodeArtifact) for our own internal packages, so I added a step to our build process to push locally built wheels; that way a painfully slow build would only happen once.
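A rough sketch of that "build once, push to the internal index" step using twine; the repository URL is a placeholder, and CodeArtifact needs its own authentication step on top of this:

pip wheel numpy -w ./dist # build the wheel once on the target ARM machine
twine upload --repository-url https://pypi.example.internal/ ./dist/*.whl # publish it to the private index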
I had a situation recently where I saved a bunch of time on resolution during installs by breaking the Python packages up into smaller groups installed in separate steps. We have some packages in a private devpi server and some dependencies that can come from any PyPI mirror. Forcing the private packages into their own install step drastically cut down the time to resolve and start downloading. Does anyone know of comparisons of resolver metrics? I spend several hours a week waiting on Python package installation and would love to improve the situation.
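A minimal sketch of that split, with placeholder package names and a placeholder devpi URL:

pip install --index-url https://devpi.example.internal/root/prod/+simple/ internal-pkg-a internal-pkg-b # private packages only
pip install -r requirements.txt # everything else resolves against the default index or mirror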
On a related note: our NPM install times are even worse: any tips to help there would be welcome.
I like pnpm, which is like npm but does caching by default and some other optimizations (the p is for performant).
The negative: sometimes it plays badly with some packages. Sometimes you need the (stupidly named) --shamefully-hoist flag, and sometimes even that doesn't help.
pnpm by default does NOT use the flat structure in node_modules but puts packages into separate sub-folders; sometimes that plays badly with dependencies that expect the flat structure (usually webpack-related kludges).
But if you are starting fresh, or can spend some time on debugging, pnpm is amazing.
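For reference, the basic flow and the hoisting workaround mentioned above (the flag really is spelled --shamefully-hoist):

npm install -g pnpm # or enable it via corepack
pnpm install # uses a content-addressed store and links packages instead of copying them into every project
pnpm install --shamefully-hoist # fall back to a flat node_modules for packages that expect it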
For those wondering what the benefit is over pip freeze: pip freeze dumps your entire venv, while pip-compile takes just the project's declared dependencies and infers the pinned requirements file from that.
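Concretely, with pip-compile from pip-tools (https://github.com/jazzband/pip-tools) and its default file names:

pip freeze > requirements.txt # dumps everything in the venv, transitive dependencies included
pip-compile requirements.in # reads only your declared top-level deps and writes a fully pinned requirements.txt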
Pip has had a ton of development in the last few years, and continues to do so. I'd say stick with pip unless there's a specific problem you're addressing with pipenv or Poetry.
However, like a sibling comment, I've also heard good things about Poetry. Probably worth giving it a spin somewhere and seeing if my initial thought still holds.
oh hi, sorry for the late reply. indeed you are right; i was typing from my phone and was a bit too hasty. i am not a fan of poetry after reading these:
* https://iscinumpy.dev/post/bound-version-constraints/
* https://iscinumpy.dev/post/poetry-versions/
TLDR poetry does not support PEP 621 (project metadata in pyproject.toml) either, and i feel this is the direction the python community has been moving towards recently.
also pdm is developed by frostming, who's on PyPA (Python Packaging Authority) and more libraries are extending support for pdm (streamlit just today).
but in general i am not against poetry. it's a good tool and it pushed package managers a fair bit further - providing a superior alternative to conda imho
Some discussion: https://discuss.python.org/t/why-does-pip-reach-out-to-indic...
I don't recall if this issue is solved by specifying hashes (or perhaps even pinned versions are adequate) -- I would hope so.
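If hashes do help, the pip-tools + pip combination would look something like this; I haven't verified that it avoids the extra index lookups, so treat it as a sketch:

pip-compile --generate-hashes requirements.in # pin versions and record sha256 hashes
pip install --require-hashes -r requirements.txt # pip refuses anything that doesn't match the recorded hashes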
Pip install git+<repo_url>
It still happens from time to time, but if somebody else reads this comment: you'll want to check how many wheels you end up building before doing so.
Also, you may want to cache those builds once and for all.
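One way to do that caching, with a placeholder repository URL and a local ./wheelhouse directory:

pip wheel "git+https://github.com/example/project.git" -w ./wheelhouse # build the wheel from the repo once
pip install ./wheelhouse/project-*.whl # reuse it on later installs instead of rebuilding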
A lot of maintainers don't pay attention to excluding test cases and artifacts from their packages, leading to ridiculous package size growth.
Last I checked numpy was close to 100MB, a huge chunk of that being test case artifacts.
Django packages all translations for all languages to ever exist.
Pandas bundles pre-built dynamic libraries with debug symbols.
Etc.
Most of my "production" virtualenvs are close to a GB nowadays, which is insane.
I did have a look at numpy, and on my machine the tests did not bloat it as much as you led me to believe. The `core/tests/` modules are 3 MB and the `__pycache__` doubles that to 6 MB. What do you refer to as "test case artifacts"? The modules, the pycache, or both?
Also, w.r.t. your statement on pandas: is it the debug symbols that account for the bloat, or the libs themselves?
Just installed a quick "data science"-like virtualenv; here are examples of the things I'm talking about:
$ du -sh scipy
110M    scipy
$ find scipy -type f -name '*.so' -exec strip -s {} \;
$ du -sh scipy
76M     scipy
$ du -sh django/contrib/admin/locale/
5.4M    django/contrib/admin/locale/
https://hatch.pypa.io/
The USP of Hatch is that it uses the latest generation of Python standards and tech under the covers, so you have a unified tool that's less quirky than previous ones. Poetry and pipenv predate some of the improvements in Python packaging, so had to develop some things in their own way.
But it is very slow.
However, pdm and the likes are interesting because they are indeed PEP 582 compliant, which lets you use __pypackages__ the way one uses node_modules.
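As a sketch of what that looks like in practice, assuming pdm is configured to use __pypackages__ rather than a virtualenv (support for this mode has varied across pdm versions):

pdm install # drops dependencies into ./__pypackages__/3.10/lib instead of a venv (the version dir matches your interpreter)
pdm run python script.py # pdm puts __pypackages__ on the path, much like Node resolving node_modules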