I have been the release manager for PyPy, an alternative Python interpreter with a JIT [0], since 2015, and have done a lot of work to make it available via conda-forge [1] or by direct download [2]. This includes not only packaging PyPy, but improving an entire C-API emulation layer so that today we can run (albeit more slowly) almost the entire scientific Python data stack. We get very limited feedback about real people using PyPy in production or research, which is frustrating. Just keeping up with the yearly CPython release cycle is significant work. Efforts to improve the underlying technology need to be guided by user experience, but we hear too little to direct our very limited energy. If you are using PyPy, please let us know, either here or via any of the methods listed in [3].
[0] https://www.pypy.org/contact.html
[1] https://www.pypy.org/posts/2022/11/pypy-and-conda-forge.html
[2] https://www.pypy.org/download.html
[3] https://www.pypy.org/contact.html
Moving to PyPy definitely sped me up a bit. Not as much as I'd hoped; the cost is probably all in string indexing into dicts and dict management. I may recode it as a radix tree. It's hard to work out in advance how different that would be: people have optimised the core data structures pretty well.
Uplift from normal Python was trivial. Most dev time was spent fixing pip3 for PyPy on Debian, not knowing which apt packages to load, amid a lot of "stop using pip" messaging.
I’m sure it’s better if you’re deploying an appliance that you hand off and never touch again, but for evolving modern Python servers it’s not well suited.
I still haven't figured out how to beat this dragon. All suggestions welcome!
There are no docs, so obviously this might not be for you. But the software does work, and it is efficient. It has been executed many millions of times now.
Cool. Is the performance here something you would like to pursue? If so, could you open an issue [0] with some kind of reproducer?
[0] https://foss.heptapod.net/pypy/pypy/-/issues
I need to find out how to instrument the seek/add cost of threads against the shared dict under a lock.
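A minimal sketch of one way to instrument that: wrap each dict operation so it records the time spent acquiring the lock plus doing the work. The workload, key layout, and counter names here are all hypothetical, just to show the shape of the measurement.

```python
# Hypothetical sketch: accumulate time spent on "seek" (get) and "add"
# (increment) operations against a shared dict under a single lock.
import threading
import time
from collections import defaultdict

shared = {}
lock = threading.Lock()
stats = defaultdict(float)          # op name -> total seconds (incl. lock wait)
stats_lock = threading.Lock()

def timed(op, fn, *args):
    t0 = time.perf_counter()
    with lock:                      # the contended section being measured
        result = fn(*args)
    elapsed = time.perf_counter() - t0
    with stats_lock:
        stats[op] += elapsed
    return result

def incr(key):
    shared[key] = shared.get(key, 0) + 1

def worker(n):
    for i in range(n):
        key = f"k{i % 50}"          # 50 hot keys, purely illustrative
        timed("add", incr, key)
        timed("seek", shared.get, key)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
for op, total in sorted(stats.items()):
    print(f"{op}: {total * 1e3:.2f} ms total")
```

Because the timer starts before the lock acquire, the per-op totals include contention, which is usually what you want when deciding whether the lock or the data structure is the bottleneck.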
My gut feeling is that I'd probably shave off a bit more if I inlined things instead of calling out to functions. So saying "slower than expected" may be unfair, because there are limits to how much you can speed this kind of thing up. That's why I wondered whether alternate data structures were a better fit.
It's variable-length string indexes into lists/dicts of integer counts. The advantage of a radix trie would be finding the record in time roughly proportional to the length in bits of the string, and the keys do form prefix sets.
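For illustration, here is a minimal character-level counting trie. It is not a compressed radix tree (no edge merging), but it shows the lookup idea: cost scales with key length, not with the number of stored keys, and shared prefixes share nodes. All names and example keys are made up.

```python
# Minimal counting trie sketch: one node per character, counts at terminals.
class TrieNode:
    __slots__ = ("children", "count")
    def __init__(self):
        self.children = {}
        self.count = 0

class CountingTrie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, key, n=1):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.count += n

    def get(self, key):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.count

t = CountingTrie()
t.add("/api/users")
t.add("/api/users")
t.add("/api/items")
print(t.get("/api/users"))  # 2
```

A real radix (Patricia) tree would collapse single-child chains into one edge holding a substring, which cuts node count and pointer chasing for long shared prefixes.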
By definition if you lift something it is going to go up, but what does this mean?
Some engines can't build and deploy all imports.
Some engines demand syntactic sugar to do their work. PyPy doesn't.
I'm very curious about where the line is/should be.
[0]: https://numba.pydata.org/
Haven’t used it in a bit, mostly because I’ve been working on projects that haven’t had the same bottleneck, or that rely on incompatible extensions.
Thank you for your work on the project!
> that rely on incompatible extensions.
Which ones? Is using conda an option? We have more luck getting binary packages into their build pipelines than getting projects to build wheels for PyPI.
I am still working on it, but the main issue for now is psycopg support: I had to install psycopg2cffi in my test environment, and that will probably prevent me from using PyPy to run our test suite, because psycopg2cffi does not have the same features and versions as psycopg2. This means either we switch our prod to PyPy, which won't be possible because I am very new on this team and that would be seen as a big, risky change by the others, or we accept that the tests do not run on the exact same runtime as the production servers (which might let bugs go unnoticed and reach production, or cause failing tests that would otherwise pass in a live environment).
I think if I ever started a Python project right now, I'd probably try to use PyPy from the start, since (at least for web development) there does not seem to be any downside to using it.
Anyways, thank you very much for your hard work!
[1]: https://www.psycopg.org/psycopg3/docs/basic/install.html
With CPython, I was frustrated with how slow it was, and complained about it to the people I was working with. PyPy was a simple upgrade that sped up my code to the point where it was comfortable to work with.
I am still using this library that I wrote
https://paulhoule.github.io/gastrodon/
to visualize RDF data, so even if I make my RDF model in Java I am likely to load it up in Python to explore it. I don’t know if they are using PyPy, but there is at least one big bank that has people using Gastrodon for the same purpose.
https://paulhoule.github.io/gastrodon/
which makes it very easy to visualize RDF data with Jupyter by turning SPARQL results into data frames.
Here are two essays I wrote using it
https://ontology2.com/essays/LookingForMetadataInAllTheWrong...
https://ontology2.com/essays/PropertiesColorsAndThumbnails.h...
People often think RDF never caught on, but actually there are many RDF-based standards, such as RSS, XMP, and ActivityPub, that you can work with quite directly using RDF tools.
Beyond that, I’ve been on a standards committee for ISO 20022 where, after quite a few years of looking at the problem, we’ve figured out how to use RDF and OWL as a master standard for representing messages and schemas in financial messaging. In the project that needed PyPy we were converting a standard represented in EMOF into RDF. Towards the end of last year I figured out the right way to logically model the parts of those messages and the associated schema with OWL. That is on its way to becoming one of those ISO standard documents that unfortunately costs 133 Swiss francs. I also figured out that it is possible to do the same for many messages defined with XSLT, and I’m expecting to get some work applying this to a major financial standard; I think there will be some source code and a public report on that.
The techniques I use address quite a few problems with the way most people use RDF. Most notably, many RDF users don’t use the tools available to represent ordered collections. One example where this makes trouble is Dublin Core metadata for documents (say, books), where you can’t represent the order of the authors of a paper, something the authors usually care about a great deal. XMP adapts the Dublin Core standard enough to solve this problem, but with the techniques I use you can make RDF do anything any document database can, though some SPARQL extensions would make it easier.
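As a sketch of what "the tools available to represent ordered collections" means in practice: RDF has a collection vocabulary (rdf:first/rdf:rest cons cells) that preserves order, e.g. of authors. The encoding below uses plain Python tuples as triples and made-up blank-node labels; it is illustrative, not any particular library's API.

```python
# Sketch: encode an ordered author list as an RDF collection
# (rdf:first / rdf:rest linked list of blank nodes), then read it back.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

def make_rdf_list(items):
    """Return (head, triples) encoding items as an RDF cons-list."""
    triples, head = [], NIL
    for i, item in enumerate(reversed(items)):
        node = f"_:b{len(items) - 1 - i}"   # one blank node per cons cell
        triples.append((node, FIRST, item))
        triples.append((node, REST, head))
        head = node
    return head, triples

def read_rdf_list(head, triples):
    """Walk rdf:rest links from head, recovering the original order."""
    index = {(s, p): o for s, p, o in triples}
    items = []
    while head != NIL:
        items.append(index[(head, FIRST)])
        head = index[(head, REST)]
    return items

head, triples = make_rdf_list(["Alice", "Bob", "Carol"])
print(read_rdf_list(head, triples))  # order survives the round trip
```

This is exactly the structure a plain bag of Dublin Core `dc:creator` triples loses: without the cons-list, a triple store has no way to say Alice is the first author.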
Create venv and activate it and install packages:
I wanted a similar one-liner that I could use on a fresh Ubuntu machine so I can try out PyPy easily in the same way. After a bit of fiddling, I came up with this monstrosity, which should work with both bash and zsh (though I only tested it on zsh). Create venv and activate it and install packages using pyenv/pypy/pip:
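A minimal sketch of that pyenv/pypy/pip workflow, assuming pyenv is already installed and on your PATH; the PyPy version string is illustrative (check `pyenv install --list` for current ones):

```shell
# Illustrative: install a PyPy via pyenv, make a venv with it, upgrade pip.
pyenv install pypy3.10-7.3.12 \
  && pyenv shell pypy3.10-7.3.12 \
  && python -m venv .venv \
  && . .venv/bin/activate \
  && pip install -U pip
```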
Maybe others will find it useful.

So if you have PyPy already on your machine:
Was not that bad after all, when my initial thought was: do I need all of the above just to initiate the project? :D

Check it out if you haven't. I've been using it for more years than I can count, and being able to cd from a PHP project to a Ruby project to a Python project with ease really helps with context switching.
So the good: It apparently now supports Python 3.9? Might want to update your front page; it only mentions Python 3.7.
The bad: It only supports Python 3.9. We use newer features throughout our code, so it'd be painful to even try it out.
https://downloads.python.org/pypy/
Maybe the site is not up to date?
Personally I don't use PyPy for anything, though I have followed it with interest. Most of the things I need to go faster are numerical, so Numba and Cython seem more appropriate.
Edit: typo