HRT's Python fork: Leveraging PEP 690 for faster imports

roadside_picnic · 7 months ago

Interviewed with HRT a while back. While I didn't get past the final round, their Python internals interview (which I did pass) was an absolute blast to prepare for, and required a really deep dive into implementation-specific details of CPython, such as exactly how collisions are handled in dict, details of memory management, etc. I pretty much had to spend a few weeks in the CPython source to prep, and for me the interview was worth it just to really learn what's going on.

For most teams I would be pretty skeptical of an internal Python fork, but the Python devs at HRT really know their stuff.

nly · 7 months ago

I interviewed with them as well. Something like 6-8 interviews, only to be told afterwards that they were circulating my CV among teams and didn't have a fit for me...

But yes, like you, I had a great experience.

sgarland · 7 months ago

I also interviewed with them a couple of years ago, for their Database SRE (AKA DBRE) role. It was going quite well until I discovered that they required you to live within commuting distance of an office, and unfortunately I had just moved away from one. They didn’t require in-person, supposedly, but needed the option, I guess?

I was quite impressed by the interviews, mostly for their pragmatism and skill-fitting. The programming interview wasn’t LC, it was “can you use a language (preferably Python) to parse a CSV and get useful information out of it,” because that’s the skill level the team needs. On the other hand, the Linux and DB interviews were quite in-depth, because again, the team needs those skills.

10/10 would interview again if I’m ever near an office again.

htrp · 7 months ago

when milliseconds mean millions

ActorNightly · 7 months ago

Those days are all over btw.

Most trading firms are past the whole "beat the other guys to buy" thing. Established large investment firms already have all that on lockdown in terms of infrastructure and influence, to the extent that they basically just run the stock market at this point (e.g. Tesla posts horrible quarter numbers, but the stock goes up).

Most of the smaller firms basically try to figure out the patterns of the larger firms and capitalize on that. The timescales have shifted quite a bit.

PufPufPuf · 7 months ago

If that was the case, why use Python in the first place?

shepardrtc · 7 months ago

Honestly if you're a millisecond too slow you might as well not trade at all. From my own experience with trying to get Python to go fast for crypto trading, you can get it pretty fast using Cython - single digit microseconds on an average AWS instance for a simple linear regression was my proudest moment. They're probably pushing it even faster because nanoseconds are where the money's at. Many HFT firms are down in the double digit nanoseconds, I believe. Maybe lower.

ActorNightly · 7 months ago

>Python devs at HRT really know their stuff.

It's a finance firm - i.e. a scam firm. "We have a fancy trading algorithm that statistically is never going to outperform just buying VOO and holding it, but the thing is if you get lucky, it could".

Scammers are not tech people. And it's pretty clear from their post.

> In Python, imports occur at runtime. For each imported name, the interpreter must find, load, and evaluate the contents of a corresponding module. This process gets dramatically slower for large modules, modules on distributed file systems, modules with slow side-effects (code that runs during evaluation), modules with many transitive imports, and C/C++ extension modules with many library dependencies.

As they should.

The idea that you type something in the code and the interpreter just doesn't execute it is how you end up with Java-like services, where the dependency injection chains are so massive that the first time everything has to get lazily injected, the code takes a massive amount of time to run. Then you have to figure out which initialization code slows everything down and how to modify your code to make that load first, which leads to a mess.

If your Python module takes a long time to load, that is a module problem. There is a reason why you can import submodules of a module directly, and in general the __init__.py of a package shouldn't import all the submodules by default. Structure your modules so they don't run massive initialization routines, and the problem is solved.

Furthermore, because of Python's dynamic nature, you can do runtime imports, including imports inside functions. In use, whether you import something at the top and it gets lazily loaded or you import it right when you need it makes absolutely no difference other than code syntax, and the latter is actually better because you can see what is going on rather than having the lazy loading hidden away in the interpreter.
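A minimal sketch of the function-level import described here. The function name `summarize` is made up for illustration, and `statistics` only stands in for a genuinely heavy dependency.

```python
import sys


def summarize(values):
    # Deferred import: the module is loaded on the first call to summarize(),
    # not when the file that defines summarize() is imported.
    import statistics

    return statistics.mean(values)


result = summarize([1, 2, 3])
# After the first call the module is cached in sys.modules, so later calls
# only pay for a dictionary lookup.
loaded = "statistics" in sys.modules
```

The trade-off raised elsewhere in the thread still applies: the dependency is visible at the call site, but no longer in the import block at the top of the file.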

Or if you really care, you can implement lazy initialization inside the modules themselves, so that when you import them and use them the first time it works exactly like lazy imports.
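One way to do this kind of in-module laziness is a module-level `__getattr__` (PEP 562). This is only a sketch under assumptions: the package name `heavy` and the attribute `stats` are hypothetical, and the module is built in memory so the snippet runs on its own. In a real package the function would simply be `def __getattr__(name)` in `__init__.py`.

```python
import importlib
import types

# Hypothetical package namespace, created in memory for the sake of the demo.
heavy = types.ModuleType("heavy")

# Attribute name on `heavy` -> real module to import on first access.
_LAZY_SUBMODULES = {"stats": "statistics"}


def _module_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    target = _LAZY_SUBMODULES.get(name)
    if target is None:
        raise AttributeError(f"module 'heavy' has no attribute {name!r}")
    module = importlib.import_module(target)
    # Cache the result so later lookups bypass __getattr__ entirely.
    setattr(heavy, name, module)
    return module


# PEP 562 looks up __getattr__ in the module's own namespace.
heavy.__getattr__ = _module_getattr
```

With this in place, `heavy.stats.mean([1, 2, 3])` triggers the import of `statistics` on first use and reuses the cached module afterwards.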

To basically spend time building a new interpreter with lazy loading just to be able to have all your import statements up at the top just screams that those devs prefer ideology over practicality.

ladberg · 7 months ago

> It's a finance firm - i.e. a scam firm. "We have a fancy trading algorithm that statistically is never going to outperform just buying VOO and holding it, but the thing is if you get lucky, it could".

HRT trades their own money so if it didn't beat VOO then they'd just buy VOO. There are no external investors to scam.

mhh__ · 7 months ago

> "We have a fancy trading algorithm that statistically is never going to outperform just buying VOO and holding it, but the thing is if you get lucky, it could".

You wish lol. How do you think they pay for all the developers?

Firms like HRT don't even take outsider money, they don't really need to.

And besides, we don't get paid for beating stocks. A lot of funds will do worse than equities in a good year for the latter; the whole point is that you're benchmarked to the risk-free rate, because your skill is in making money while being overall market neutral. So you rarely take a drawdown anywhere near as bad as equities do.

As a service this is often a portfolio diversification tool for large allocators rather than something they put all the money into.

It is true however that some firms are basically just rubbish beta vehicles that probably should in an ideal world shut down.

idohft · 7 months ago

> It's a finance firm - i.e. a scam firm. "We have a fancy trading algorithm that statistically is never going to outperform just buying VOO and holding it, but the thing is if you get lucky, it could".

> Scammers are not tech people. And it's pretty clear from their post.

It would be great if you included any sort of evidence or argument.

Reading on to the other comments, it looks like you're throwing out a lot of accusations and claims. I don't know what you think you know, but from the looks of it, you don't really know HRT's business. I don't really know it these days, but I knew it years ago, and it's not from taking client money or arbitrage or some weird scam. It's not magic, but the world of algo trading isn't a Ponzi scheme.

82716f12 · 7 months ago

> "We have a fancy trading algorithm that statistically is never going to outperform just buying VOO and holding it, but the thing is if you get lucky, it could".

How do they make $8B/y while underperforming VOO?

Reference: https://www.businessinsider.com/hudson-river-trading-hrt-8-b...

mgaunard · 7 months ago

Prop trading firms usually have returns higher than 25%, which is way higher than holding the S&P.

You're confusing prop shops and hedge funds.

Danjoe4 · 7 months ago

Runtime imports are a maintenance nightmare and can quickly fragment a codebase. Static analysis of imports is so desirable that it is almost always worth the initialization performance hit. Tradeoffs.

cjj_swe · 7 months ago

[flagged]

tracnar · 7 months ago

While I see the usefulness of lazy imports, it always seemed a bit backward to me for the importer to ask for lazy import, especially if you make it an import keyword rather than a Python flag. Instead I'd expect the modules to declare (and maybe enforce) that they don't have side effects, that way you know they can be lazily imported, and it opens the door for more optimizations, like declaring the module immutable. That links to the performance barrier of Python due to its dynamic nature as discussed in https://news.ycombinator.com/item?id=44809387

Of course that doesn't solve the overhead of finding the modules, but that could be optimized without lazy import, for example by having a way to pre-compute the module locations at install time.

instig007 · 7 months ago

> it always seemed a bit backward to me for the importer to ask for lazy import, especially if you make it an import keyword rather than a Python flag

Exactly this. There must be zero side effects at module import time, not just for load times, but because the order of such effects 1) is undefined, 2) depends heavily on the import protocol implementation, and 3) poses safety and security nightmares that Python devs don't seem to care much about until bad things happen at the most inconvenient time possible.

> Of course that doesn't solve the overhead of finding the modules, but that could be optimized without lazy import, for example by having a way to pre-compute the module locations at install time.

1) opt for https://docs.python.org/3/reference/import.html#replacing-th...

2) pre-compute everything in CI by using a solution from (1) and doing universal toplevel import of the entire Python monorepo (safe, given no side effects).

3) This step can be used to scan all toplevel definitions too, to gather extra code meta useful for various dynamic dispatch at runtime without complex lookups. See for example: https://docs.pylonsproject.org/projects/venusian/en/latest/i...

4) put the result of (2) and (3) as a machine-readable dump, read by (1) as the alternative optimised loading branch.

5) deploy (4) together with your program.

tracnar · 7 months ago

For optimizing the module finding, using a custom import hook was indeed what I had in mind!
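A rough sketch of what such an import hook could look like: a meta path finder that resolves names from a precomputed table instead of scanning `sys.path`. The class name `PrecomputedFinder`, the module name `precomputed_demo`, and the in-memory table are all assumptions for illustration. In the scheme described above the table would be a dump produced in CI, and nothing here is taken from HRT's implementation.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path


class PrecomputedFinder(importlib.abc.MetaPathFinder):
    """Resolve module names from a precomputed name -> file path table."""

    def __init__(self, locations):
        self._locations = locations

    def find_spec(self, fullname, path=None, target=None):
        location = self._locations.get(fullname)
        if location is None:
            return None  # unknown name: fall through to the regular finders
        # Build the spec directly from the known path, with no directory scan.
        return importlib.util.spec_from_file_location(fullname, location)


# Stand-in for the table a build step would generate.
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "precomputed_demo.py"
module_file.write_text("ANSWER = 42\n")
locations = {"precomputed_demo": str(module_file)}

# Install the finder ahead of the default ones, then import as usual.
sys.meta_path.insert(0, PrecomputedFinder(locations))
demo = importlib.import_module("precomputed_demo")
```

Note that the temporary directory is never added to `sys.path`; the import succeeds only because the finder already knows where the file lives.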

Spivak · 7 months ago

> This process gets dramatically slower for … modules on distributed file systems, modules with slow side-effects

Oh no. Look, I'm not saying you're holding it wrong; it's perfectly valid to host your modules on what is presumably NFS and to have modules with side effects, but what if you didn't?

I've been down this road with NFS (and SMB, if it matters) and pain is the only thing that awaits you. It seems like they're feeling it. Storing what is spiritually executable code on shared storage was a never-ending source of bugs and mysterious performance issues.

zzzeek · 7 months ago

Gonna call this an antipattern. Do you need all those modules imported in every script? Well then you save nothing on load time; the time will be spent regardless. Does every script not need those imports? Well, then they shouldn't be importing those things, and this small set of top-level imports should be curated into a better, more fine-grained list (and if you want to write tools, you can certainly identify these patterns using tooling similar to what you wrote for LazyImports).

its-summertime · 7 months ago

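    import argparse
    parser = argparse.ArgumentParser()
    # Parse first, so --help and bad arguments exit before the slow import runs.
    parser.parse_args()
    # Heavy third-party import, deliberately deferred until after argument parsing.
    import requests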

Is an annoying bodge that a programmer should not have to think about, as a random example

sunshowers · 7 months ago

There are often large programs where not every invocation imports every module.

The lazy import approach was pioneered in Mercurial I believe, where it cut down startup times by 3x.
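For reference, the standard library already offers an opt-in form of this through `importlib.util.LazyLoader`. The sketch below follows the recipe in the `importlib` documentation. It is not Mercurial's implementation and not HRT's fork, and `colorsys` is just an arbitrary small module chosen for the demo.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs only on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only sets up the deferred load; the module body
    # has not been executed yet.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")
# The first attribute access is what actually executes the module.
converter = colorsys.rgb_to_hls
```

Invocations that never touch `colorsys` never pay for executing it, which is the effect the startup-time numbers above depend on.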

mdaniel · 7 months ago

Or, here's an idea: don't write a CLI on the hot path of a developer's flow in a scripting language. No wonder it lost out

spicybright · 7 months ago

For personal one-file utility scripts, I'll sometimes only import a module on a code path that needs it, and make it global if the scope gets in the way.

It's dirty, but speeds things up vs putting all imports at the top.
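A small sketch of that "make it global" trick, with made-up function names and `json` standing in for a slow module. The `global` statement makes the import bind the name at module scope, so other functions can use it after the first call.

```python
def render(data):
    # Bind the imported name at module scope instead of function scope.
    global json
    import json

    return json.dumps(data)


def parse(text):
    # Relies on render() having run first; otherwise this raises NameError.
    return json.loads(text)


rendered = render({"a": 1})
parsed = parse(rendered)
```

The hidden ordering dependency between `render` and `parse` is exactly what makes this "dirty".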

fabioz · 7 months ago

It'd have been really nice to have that PEP in as it'd have helped me not have to write local imports everywhere.

As it is, top-level imports IMHO should only be used for modules required at startup; everything else should be a local import. Getting everyone convinced of that is the main issue, though, as it really goes against the way most Python modules are written (but the startup time saved in the apps I work on definitely makes it worth it).

theLiminator · 7 months ago

Yeah, imo that's the way that python should've worked in the first place.

Import-time side effects are definitely nasty though and I wonder what the implications on all downstream code would be. Perhaps a lazy import keyword is a better way forward.

nasretdinov · 7 months ago

I wonder how much can be saved by using a local file system for imports though. In my testing, the mere presence of a home directory on NFS already dramatically slows down imports (by ~10x), because Python searches for modules in the home directory too by default.

gjvc · 7 months ago

To prevent this, set PYTHONNOUSERSITE=1, which stops Python from searching for modules in ~/.local/. (For convenience, try calling Python through a wrapper in your project, say bin/run-python, where you can set all the Python-specific environment variables you need at execution time without having to worry about setting them in the user's shell, etc.)

nasretdinov · 7 months ago

Thanks, yeah I know that it works, what I meant is that it may be quite easy to compare the module import times with and without that env variable to see how much impact an NFS home directory has (and it's a lot), and possibly draw similar conclusions about the distributed file system behaviour in general too
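The comparison could be sketched roughly like this: start a fresh interpreter with and without `PYTHONNOUSERSITE` and time a single import. The helper name `time_import` is made up, and one wall-clock sample is noisy, so treat this as a starting point rather than a benchmark.

```python
import os
import subprocess
import sys
import time


def time_import(module, no_user_site):
    """Wall-clock seconds for a fresh interpreter to start and import `module`."""
    env = dict(os.environ)
    env.pop("PYTHONNOUSERSITE", None)
    if no_user_site:
        # Tells the interpreter not to add the user site-packages directory.
        env["PYTHONNOUSERSITE"] = "1"
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], env=env, check=True)
    return time.perf_counter() - start


with_user_site = time_import("json", no_user_site=False)
without_user_site = time_import("json", no_user_site=True)
print(f"user site on: {with_user_site:.3f}s, off: {without_user_site:.3f}s")
```

On a machine with a local home directory the two numbers will likely be close; the difference described above should only show up when the home directory sits on a slow or remote file system. For a per-module breakdown, `python -X importtime` is the more precise tool.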

davidteather · 7 months ago

The author interviewed me and talked about this project, so it was cool seeing a blog post posted about it

rsyring · 7 months ago

> we support the Steering Council in their rejection of PEP 690—the implicit lazy imports are not a good fit for upstream due to the same, subtle bugs we encountered during our migration. However, as time permits, we hope to propose a revised lazy imports PEP that introduces an explicit lazy keyword, e.g. lazy import foo or lazy from foo import bar. This approach will satisfy migration and compatibility concerns, allow users to opt-in gradually, and enable all Python users to reap the speed benefits of lazy imports in a safe way.