Nuitka is a wonderful project which in my opinion doesn't get enough attention.
I first found it back in 2015 when I worked at a company where we built a Python-based desktop application as part of our industrial control system. Nuitka provided better performance than pyinstaller/cx_freeze while still being simple to work with. Back then there were still a few incompatibilities, but I've followed the project over the years and it has matured like a fine wine since.
Nuitka looks like a traditional ahead-of-time compiler using SSA form[0] and is written in Python. I’d be interested in seeing performance comparisons with PyPy, which uses the second Futamura projection and is written in a dialect of Python.
This project is super impressive. It makes me wonder why there's not a similarly mature JavaScript AOT compiler implementation for use in game development or other places where you want performance better than a VM and you're not allowed to JIT.
A small recommendation for these release pages would be to link to the tag in whatever source host you're using! I just want to see the code.
The main slowdown in JavaScript is because it is inherently so dynamic — all function calls are "virtual," properties are string lookups, and you need a lot of indirection (boxing) to store objects. Without actually running the program on some input, it is infeasible to determine what those indirections will resolve to. So static compilation is useless (or would produce instructions very similar to what an interpreter would execute). That's why we usually interpret the code (and JIT it to optimize the places where the program is not actually dynamic).
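Python compilers like Nuitka face exactly the same obstacle. As a minimal sketch (class and attribute names are just illustrative), here is why attribute access cannot be resolved statically — the name being looked up can itself be runtime data:

```python
# Attribute access is a string lookup into a per-object dict,
# and the name being looked up can be computed at runtime,
# so an AOT compiler cannot know which "field" is accessed.
class Point:
    pass

p = Point()
p.x = 1
p.y = 2

name = "x"                 # the attribute name is data, not syntax
print(getattr(p, name))    # 1 -- equivalent to p.x, decided at runtime
print(p.__dict__)          # {'x': 1, 'y': 2} -- attributes live in a dict
```

This is the indirection a JIT removes by observing which names actually occur, something a static compiler can only do for the non-dynamic subset of the program.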
Theoretically you could "precompile" JavaScript by running your code on some known input and caching the VM's state. Then you can "hydrate" your VM with that state. But this would only speed up the VM warmup (which is not that long) and requires significant browser/VM buy-in, so it is probably infeasible.
The only way to produce better performance in JavaScript is to restrict it to a less-dynamic subset (e.g. objects must have static keys) and add at least some static typing so that a compiler can resolve some references ahead of time. That's why asm.js was a thing. But asm.js has been superseded by WASM, which is better in most regards.
Have you seen the work Fastly discussed about their AOT compiler? Seems like an interesting approach that could perhaps be used to preprocess code into a more efficient form before running it in the cloud. The challenge is that JS engines are primarily focused on the browser, and such an optimization opportunity isn’t interesting there — the majority of engine development is done by browser vendors.
> In case of microcontrollers, PXT programs are compiled in the browser to ARM Thumb assembly, and then to machine code, resulting in a file which is then deployed to the microcontroller, usually via USB mass-storage interface.
Looks like the only compilation strategy is directly to ARM Thumb assembly or did I miss something?
Because it would require tons of resources to make it more performant than the current JS JITs, and it would probably fail. So what you see is some games just embedding V8.
If you're already using Python's type hints, I'd suggest checking out mypyc instead. If your code is already type-checked by mypy, you don't need to do much more than run `mypyc` to get a performance boost.
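As a sketch of how little extra work that is (the module name `fib.py` is just an example), mypyc compiles ordinary type-hinted code as-is:

```python
# fib.py -- ordinary type-hinted Python; mypyc compiles it unchanged
def fib(n: int) -> int:
    """Iterative Fibonacci; the int annotations let mypyc use unboxed arithmetic."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    print(fib(30))  # 832040
```

Running `mypyc fib.py` builds a C extension module; a subsequent `import fib` picks up the compiled version automatically, with no source changes.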
Tangentially related question: could Nuitka target WebAssembly, given that according to their overview page, they translate Python into C? Usually when it comes to Python -> WebAssembly, the biggest problem is the lack of GC in the WebAssembly spec (as far as I understand), and I'm wondering if this would be an issue for Nuitka as well...
Of course you can build a GC in WebAssembly. You might have to avoid the native stack and lose some performance that way, but that shouldn’t be too bad I think.
One problem is that this GC doesn’t interact with the browser’s GC in any way. So you have painful memory management interactions when (for example) a DOM event handler references a WebAssembly object which in turn holds a reference to a browser object, possibly with cycles (so simple reference counting isn’t enough).
You can fall back to manual memory management here, but that’s painful to use. To make this work seamlessly, you need a way to trace references across both heaps in one swoop. Last I checked, there was standardization work underway to enable that.
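For what it's worth, CPython hits the same wall internally: plain reference counting never frees a cycle, which is why it ships a tracing cycle collector on top. A small illustration:

```python
import gc
import weakref

class Node:
    def __init__(self) -> None:
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a    # reference cycle: a -> b -> a
probe = weakref.ref(a)     # lets us observe whether `a` was freed

del a, b                   # refcounts stay at 1 because of the cycle
gc.collect()               # the tracing collector finds and reclaims it
print(probe() is None)     # True: only tracing, not refcounting, got it
```

The cross-heap WebAssembly/browser case is this same problem, except the two halves of the cycle live in heaps managed by different collectors that can't trace into each other.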
It would be neat if the big companies like Dropbox, Instagram, Google, Oracle, Shopify, Stripe, whatnot building better Python/JavaScript/Ruby implementations would start building program analysis libraries for Python/JavaScript/Ruby so that more implementations could get this for free.
For example, the homepage of Nuitka says they only just added support for constant folding and propagation. That's such low-hanging fruit it's crazy they have to build it themselves.
You can imagine generic libraries for Python/JavaScript/Ruby that will turn the AST alone into its most optimal form and then let some other backend worry about code generation or VM implementation.
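As a sketch of how small such a pass can be, here is a toy constant folder written against Python's own `ast` module; a shared library would implement this (and much more) once, for every backend to reuse:

```python
import ast

class ConstantFolder(ast.NodeTransformer):
    """Fold binary operations whose operands are literal constants."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold children first (bottom-up)
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                # evaluate the constant subexpression at compile time
                value = eval(compile(ast.Expression(node), "<fold>", "eval"))
            except Exception:
                return node       # e.g. division by zero: leave it alone
            return ast.copy_location(ast.Constant(value), node)
        return node

tree = ast.parse("x = 2 * 3 + 4")
folded = ConstantFolder().visit(tree)
print(ast.unparse(folded))        # x = 10
```

A real pass would also handle unary ops, string concatenation, and propagation through assignments, but the shape — a bottom-up tree rewrite — stays the same.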
An issue, though not an unsolvable one, is that you probably need a standardized representation of the AST to write these optimization passes against; it is not uncommon for compiler writers to disagree on what representation is the best for what purpose.
Then there is the question of optimizations that are (typically) easier/possible only at the code-generation phase.
All of this is solvable of course, but there needs to be some will to do so (or in the case of companies, enough commercial benefit).
What sort of web applications do you work on where Python interpreter speed is the limiting factor? Usually, those applications are constrained by network throughput and context switches and page faults.
This argument is so tiring. That's part of why websites are so slow despite having insane hardware at their disposal.
At my company, for a web app we run in production, we strive to get every response out of our infrastructure in under a millisecond; anything above that, except for a few select endpoints, is considered a bug. With sensible technology choices it's not even that hard to do — an RDBMS like Postgres can answer queries in microseconds.
Nice bonus, we can provide real time computation features our competitors could only dream of, just by not using dog-slow technology.
Our customers are not techies and you know what? When they use our product, the first comment is usually "wow, it's so fast".
If you are interested in speed you might get more from Cython (not to be confused with CPython) or mypyc. The catch is that you need to specify types (otherwise Cython won't give you a speed gain, and mypyc will refuse to compile).
I remember reading an article where somebody described how, instead of adapting an existing project to work with Cython, they started their project in Cython outright and found it beneficial.
* https://news.ycombinator.com/item?id=8771925 (2014, 135 comments)
* https://news.ycombinator.com/item?id=10994267 (2016, 52 comments)
* https://news.ycombinator.com/item?id=15354613 (2017, 60 comments)
[0]: https://nuitka.net/doc/developer-manual.html#ssa-form-for-nu...
"Python Interpreters Benchmarks"
https://github.com/bellard/quickjs/blob/master/quickjs-opcod...
The point isn't just not having a JIT. You can run v8 without a JIT. The point is good performance without a JIT.
https://arcade.makecode.com/
https://makecode.com/language
See:
https://github.com/pyodide/pyodide
And an example featuring a pure client-side Jupyter instance:
https://news.ycombinator.com/item?id=28377550
The main goal is getting a standalone executable that works with C extensions.
The 2–3x improvements are on specific micro-benchmarks.