Almost always when I start prototyping something in Python, I wish I had stopped halfway to where I am now and switched to something else.
Most recent example - converting a huge number of XML files to Parquet. I started very fast with Python + pyarrow, but when I realized that parallelizing execution would help enormously, I hit the GIL and pickling/unpickling/multiprocessing costs.
It did work in Python in the end, but I feel that writing it in Rust/C# (even though I don't know Rust beyond tutorials) would have been much more performant.
Sounds like, with this tooling and the task at hand, about the most complex things that should be passing through the pickler are partitioned lists of filenames rather than raw data. E.g. you can have each partition generate a Parquet file for combining in a final step (pyarrow.concat_tables() looks useful), or if you were working with some other format, potentially send flat arrays back to the parent process as giant bytestrings or similar.
This is not to say the limitations don't suck, just that very often there are simple approaches to avoid most of the pain
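For what it's worth, a rough sketch of that shape: only partitioned lists of filenames are pickled, each worker writes its own partition file, and the partitions are combined at the end. The xml_to_table converter and its field names are made up for illustration; the only real APIs assumed are xml.etree, concurrent.futures, and pyarrow's table/concat_tables/parquet functions.

    # Sketch only: partition the filenames, have each worker build its own
    # partition Parquet file, then combine in a final step in the parent.
    import glob
    import os
    import xml.etree.ElementTree as ET
    from concurrent.futures import ProcessPoolExecutor

    import pyarrow as pa
    import pyarrow.parquet as pq

    def xml_to_table(path):
        # Hypothetical per-file converter: flatten one XML file into a pyarrow Table.
        root = ET.parse(path).getroot()
        records = [(r.get("id"), r.findtext("value")) for r in root.iter("record")]
        return pa.table({"id": [r[0] for r in records],
                         "value": [r[1] for r in records]})

    def convert_partition(args):
        # One worker: convert a list of paths into a single partition file.
        part_id, paths = args
        out_path = f"out/part-{part_id}.parquet"
        pq.write_table(pa.concat_tables([xml_to_table(p) for p in paths]), out_path)
        return out_path  # only a short string travels back through the pickler

    if __name__ == "__main__":
        os.makedirs("out", exist_ok=True)
        files = sorted(glob.glob("data/*.xml"))
        n_parts = 8
        # Only these partitioned lists of filenames get pickled, never raw data.
        partitions = [(i, files[i::n_parts]) for i in range(n_parts)]

        with ProcessPoolExecutor(max_workers=n_parts) as pool:
            part_files = list(pool.map(convert_partition, partitions))

        # Final combine step, as described above, using pyarrow.concat_tables().
        combined = pa.concat_tables([pq.read_table(p) for p in part_files])
        pq.write_table(combined, "out/combined.parquet")

The per-partition files also act as cheap checkpoints: if one partition fails, you only redo that slice.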
I recently needed to do something similar and used Apache Arrow’s Go library for Parquet. It’s horrifically slow and somehow manages to leak memory. It’s also undocumented. If anybody knows a good Parquet library for Go please let me know.
Yeah, IME Python can be about 100x slower than a native solution. The original solution was a combination of a C library and Python wrapper code, so 4x makes sense for eliminating the Python part.
That is a lot of text for not determining why the new solution is faster. The only relevant part:
> Before our migration, the old pipeline utilized a C library accessed through a Python service, which buffered and bundled data. This was really the critical aspect that was causing our latency.
How much of a speedup would there have been if they had moved to a Rust wrapper around the same C library?
Using something other than Python is almost always going to be faster. This Reddit post does not give any insight into which aspects of Python lead to small or large performance hits. They show, with ample documentation, that it was the right solution for them, which is great, but they don't provide any generalizable information.
What I was thinking as well... I love these performance-improvement posts, but at the same time I had to wonder what kind of choice it was to reach for Python in the first place if the task was a lot of heavy concurrent task management?
> Most recent example - converting a huge number of XML files to Parquet. I started very fast with Python + pyarrow, but when I realized that parallelizing execution would help enormously, I hit the GIL and pickling/unpickling/multiprocessing costs.
> It did work in Python in the end, but I feel that writing it in Rust/C# (even though I don't know Rust beyond tutorials) would have been much more performant.
2. Do not use loops, use itertools. (Python 3.12 got a nice 'batched' function, btw.)
3. Preallocate memory, where possible.
4. multiprocessing.shared_memory may help (see the sketch after this list).
5. Something like Cython may help.
6. A combination of multiprocessing with asyncio (or multithreading) may help.
7. memmap file access might help.
You see, you have a lot of options before you need to learn another language.
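A minimal sketch of how a couple of these combine (itertools.batched from 3.12 plus multiprocessing.shared_memory over a preallocated numpy buffer). The block name, array size, and per-item work are made up for illustration:

    # Sketch: batch indices with itertools.batched (Python 3.12+) and let workers
    # write results into one preallocated numpy buffer backed by shared_memory,
    # so the result array itself never gets pickled between processes.
    from itertools import batched
    from multiprocessing import Pool, shared_memory

    import numpy as np

    N_ITEMS = 1_000_000
    SHM_NAME = "results_buf"  # hypothetical name for the shared block

    def process_batch(batch):
        # Attach to the existing block and fill this batch's slots in place.
        shm = shared_memory.SharedMemory(name=SHM_NAME)
        view = np.ndarray((N_ITEMS,), dtype=np.float64, buffer=shm.buf)
        for i in batch:
            view[i] = i * 0.5  # stand-in for the real per-item work
        del view  # drop the buffer view before closing, or close() raises BufferError
        shm.close()

    if __name__ == "__main__":
        # Preallocate the whole result buffer once, up front.
        shm = shared_memory.SharedMemory(name=SHM_NAME, create=True, size=N_ITEMS * 8)
        try:
            with Pool() as pool:
                # Only these small tuples of indices are pickled, not the results.
                pool.map(process_batch, batched(range(N_ITEMS), 10_000))
            results = np.ndarray((N_ITEMS,), dtype=np.float64, buffer=shm.buf).copy()
            print(results[:5])
        finally:
            shm.close()
            shm.unlink()

If the per-item work is heavier than a multiply, the batching keeps per-task pickling overhead small while shared_memory avoids copying results back through the parent.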
But I have the same sentiment, although I find writing quick C++ extensions (SWIG is incredible) is a good balance.
It’s worth giving it a try if you haven’t before.
There's also the fact that compiling everything together lets the optimizer get a lot more work done.
The moment you are using "pure" Python on large datasets, the whole thing starts to crumble.
It just comes down to whether you need speed of building it or speed of the program
Guess I'm never satisfied