Nuitka is a wonderful project which in my opinion doesn't get enough attention.
I first found it back in 2015 when I worked at a company where we built a Python-based desktop application as part of our industrial control system. Nuitka provided better performance than pyinstaller/cx_freeze while still being simple to work with. Back then there were still a few incompatibilities, but I've followed the project over the years and it has matured like a fine wine since.
Nuitka looks like a traditional ahead-of-time compiler using SSA form[0] and is written in Python. I’d be interested in seeing performance comparisons with PyPy, which uses the second Futamura projection and is written in a dialect of Python.
This project is super impressive. It makes me wonder why there's not a similarly mature JavaScript AOT compiler implementation for use in game development or other places where you want performance better than a VM and you're not allowed to JIT.
A small recommendation for these release pages would be to link to the tag in whatever source host you're using! I just want to see the code.
The main slowdown in JavaScript is because it is inherently so dynamic — all function calls are "virtual," properties are string lookups, and you need a lot of indirection (boxing) to store objects. Without actually running the program on some input, it is infeasible to determine what those indirections will resolve to. So static compilation is useless (or would produce instructions very similar to what an interpreter would execute). That's why we usually interpret the code (and JIT it to optimize the places where the program is not actually dynamic).
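Python compilers like Nuitka face exactly the same obstacle. As a minimal sketch (class and attribute names are just illustrative), here is why attribute access cannot be resolved statically — the name being looked up can itself be runtime data:

```python
# Attribute access is a string lookup into a per-object dict,
# and the name being looked up can be computed at runtime,
# so an AOT compiler cannot know which "field" is accessed.
class Point:
    pass

p = Point()
p.x = 1
p.y = 2

name = "x"                 # the attribute name is data, not syntax
print(getattr(p, name))    # 1 -- equivalent to p.x, decided at runtime
print(p.__dict__)          # {'x': 1, 'y': 2} -- attributes live in a dict
```

This is the indirection a JIT removes by observing which names actually occur, something a static compiler can only do for the non-dynamic subset of the program.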
Theoretically you could "precompile" JavaScript by running your code on some known input and caching the VM's state. Then you can "hydrate" your VM with that state. But this would only speed up the VM warmup (which is not that long) and requires significant browser/VM buy-in, so it is probably infeasible.
The only way to produce better performance in JavaScript is to restrict it to a less-dynamic subset (e.g. objects must have static keys) and add at least some static typing so that a compiler can resolve some references ahead of time. That's why asm.js was a thing. But asm.js has been superseded by WASM, which is better in most regards.
Have you seen the work Fastly discussed about their AOT compiler? Seems like an interesting approach that could perhaps be used to preprocess code into a more efficient form before running it in the cloud. The challenge is that JS engines are primarily focused on the browser, and such an optimization opportunity isn’t interesting there — the majority of engine development is done by browser vendors.
> In case of microcontrollers, PXT programs are compiled in the browser to ARM Thumb assembly, and then to machine code, resulting in a file which is then deployed to the microcontroller, usually via USB mass-storage interface.
Looks like the only compilation strategy is directly to ARM Thumb assembly or did I miss something?
Because it would require tons of resources to make it more performant than the current JS JITs, and it would probably fail. So what you see is some games just embedding V8.
If you're already using Python's type hints, I'd suggest checking out mypyc instead. If your code is already type-checked by mypy, you don't need to do much more than run `mypyc` to get a performance boost.
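As a sketch of how little extra work that is (the module name `fib.py` is just an example), mypyc compiles ordinary type-hinted code as-is:

```python
# fib.py -- ordinary type-hinted Python; mypyc compiles it unchanged
def fib(n: int) -> int:
    """Iterative Fibonacci; the int annotations let mypyc use unboxed arithmetic."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    print(fib(30))  # 832040
```

Running `mypyc fib.py` builds a C extension module; a subsequent `import fib` picks up the compiled version automatically, with no source changes.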
Tangentially related question: could Nuitka target WebAssembly, given that according to their overview page, they translate Python into C? Usually when it comes to Python -> WebAssembly, the biggest problem is the lack of GC in the WebAssembly spec (as far as I understand), and I'm wondering if this would be an issue for Nuitka as well...
Of course you can build a GC in WebAssembly. You might have to avoid the native stack and lose some performance that way, but that shouldn’t be too bad I think.
One problem is that this GC doesn’t interact with the browser’s GC in any way. So you have painful memory management interactions when (for example) a DOM event handler references a WebAssembly object which in turn holds a reference to a browser object, possibly with cycles (so simple reference counting isn’t enough).
You can fall back to manual memory management here, but that’s painful to use. To make this work seamlessly, you need a way to trace references across both heaps in one swoop. Last I checked, there was standardization work underway to enable that.
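For what it's worth, CPython hits the same wall internally: plain reference counting never frees a cycle, which is why it ships a tracing cycle collector on top. A small illustration:

```python
import gc
import weakref

class Node:
    def __init__(self) -> None:
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a    # reference cycle: a -> b -> a
probe = weakref.ref(a)     # lets us observe whether `a` was freed

del a, b                   # refcounts stay at 1 because of the cycle
gc.collect()               # the tracing collector finds and reclaims it
print(probe() is None)     # True: only tracing, not refcounting, got it
```

The cross-heap WebAssembly/browser case is this same problem, except the two halves of the cycle live in heaps managed by different collectors that can't trace into each other.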
It would be neat if the big companies like Dropbox, Instagram, Google, Oracle, Shopify, Stripe, whatnot building better Python/JavaScript/Ruby implementations would start building program analysis libraries for Python/JavaScript/Ruby so that more implementations could get this for free.
For example, the homepage of Nuitka says they only just added support for constant folding and propagation. That's such low-hanging fruit it's crazy they have to build it themselves.
You can imagine generic libraries for Python/JavaScript/Ruby that will turn the AST alone into its most optimal form and then let some other backend worry about code generation or VM implementation.
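As a sketch of how small such a pass can be, here is a toy constant folder written against Python's own `ast` module; a shared library would implement this (and much more) once, for every backend to reuse:

```python
import ast

class ConstantFolder(ast.NodeTransformer):
    """Fold binary operations whose operands are literal constants."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold children first (bottom-up)
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                # evaluate the constant subexpression at compile time
                value = eval(compile(ast.Expression(node), "<fold>", "eval"))
            except Exception:
                return node       # e.g. division by zero: leave it alone
            return ast.copy_location(ast.Constant(value), node)
        return node

tree = ast.parse("x = 2 * 3 + 4")
folded = ConstantFolder().visit(tree)
print(ast.unparse(folded))        # x = 10
```

A real pass would also handle unary ops, string concatenation, and propagation through assignments, but the shape — a bottom-up tree rewrite — stays the same.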
An issue, though not an unsolvable one, is that you probably need a standardized representation of the AST to write these optimization passes against; it is not uncommon for compiler writers to disagree on what representation is the best for what purpose.
Then there is the question of optimizations that are (typically) easier/possible only at the code-generation phase.
All of this is solvable of course, but there needs to be some will to do so (or in the case of companies, enough commercial benefit).
What sort of web applications do you work on where Python interpreter speed is the limiting factor? Usually, those applications are constrained by network throughput and context switches and page faults.
This argument is so tiring. That's part of why websites are so slow despite having insane hardware at their disposal.
At my company, for a web app we run in production, we strive to get every response out of our infrastructure in under a millisecond; anything above that, except for a few select endpoints, is considered a bug. With sensible technology choices it's not even that hard to do — an RDBMS like Postgres can answer queries in microseconds.
Nice bonus, we can provide real time computation features our competitors could only dream of, just by not using dog-slow technology.
Our customers are not techies and you know what? When they use our product, the first comment is usually "wow, it's so fast".
If you are interested in speed you might get more from Cython (not to be confused with CPython) or mypyc. The catch is that you need to specify types (otherwise Cython won't give you a speed gain, and mypyc will refuse to compile).
I remember reading an article where somebody described how, instead of adapting an existing project to work with Cython, they started their project in Cython outright and found it beneficial.
* https://news.ycombinator.com/item?id=8771925 (2014, 135 comments)
* https://news.ycombinator.com/item?id=10994267 (2016, 52 comments)
* https://news.ycombinator.com/item?id=15354613 (2017, 60 comments)
[0]: https://nuitka.net/doc/developer-manual.html#ssa-form-for-nu...
"Python Interpreters Benchmarks"
https://github.com/bellard/quickjs/blob/master/quickjs-opcod...
The point isn't just not having a JIT. You can run v8 without a JIT. The point is good performance without a JIT.
https://arcade.makecode.com/
https://makecode.com/language
See:
https://github.com/pyodide/pyodide
And an example featuring a pure client-side Jupyter instance:
https://news.ycombinator.com/item?id=28377550
The main goal is getting a standalone executable that works with C extensions.
The 2–3x improvements are on specific micro-benchmarks.