An anecdote from a non-JS JIT, but similar: I once spent a summer working on a game engine with a couple others where the host language was LuaJIT.
It started out great. The iteration cycles were incredibly short, performance was competitive, the code was portable, and we could do wild metaprogramming nonsense that was still fast. If you haven't worked with LuaJIT, its C FFI is also incredible!
As we started scaling though, the wheels fell off the wagon. We'd add a new feature and suddenly the game wouldn't run at interactive framerates. One time, it was the 'unpack' function, which would trigger a JIT trace abort. We would drop from 12ms frames to 100ms frames. I wrote a length-specialized version that didn't abort and moved on.
Another time, it was calling Lua's 'pairs' function (the iterator over a table). Okay, so we can't do that, or a few other things that made Lua productive before.
The other problem we hit was GC predictability being impossible. We tried to mitigate it by using native data structures through the C FFI, taking control over the GC cycle to run it once or twice per frame, etc. In the end, as with the JIT problem, we weren't writing Lua anymore, we were writing... something else. It wasn't maintainable.
That summer ruined dynamic languages for me. I didn't really want to be writing C or C++ at the time. I ended up picking up Rust, which was predictable and still felt high-level, and the Lua experience ended up getting me my current job.
Thanks for the story details, quite interesting. In the end this was unfortunately a case of picking the wrong tool for the job. Don't use JITed / GC languages when you've got hard realtime requirements.
Don't build a datastore on the JVM if you care about tail latencies, you'll be fighting the GC forever (see Cassandra). Don't rely on auto-vectorisation in your inner loop if possible, one tiny change could bring that house of cards crashing down.
I'd be interested in how your team ended up picking that tech stack. Was it a "rational" weighing of options with pros and cons? Was it "eh it'll be alright"? Was it personal preference and/or prior experience?
I'm working with Lua right now (gopher-lua) as a scripting option for real-time gaming. I've done similar things to your story in the past, trying to make Lua the host for everything, and I'm well aware of the downsides. But I have a requirement of maintaining readable, compatible source (as in PICO-8's model), and Lua is excellent at that, as are other dynamic languages, to the point where it's hard to consider anything else unless I build and maintain the entire implementation. So my mitigation strategy is to do everything possible to keep the Lua code in the glue-code space, which means I have to add a lot of libraries.
I'm also planning to add support for tl, which should make things easier on the in-the-large engineering side of things - something dynamic languages are also pretty awful at.
You might still run into GC problems, but none of the Go-based Luas (built on Go, rather than binding to another Lua library) I am aware of have a JIT built in.
GP is talking about LuaJIT, you're talking about Lua. Lua has lower performance but should be completely predictable, GC aside (not sure what its GC scheme is), so it's a very different situation.
The OpenJDK JVM (aka Hotspot) addresses both issues: control [1] and monitoring [2] (there are built-in compilation and deoptimization events emitted to the event stream). You can also compile methods in advance [3], and inspect the generated machine code when benchmarking [4]. You can even compile an entire application ahead-of-time [5] to produce a native binary.
The article [incorrectly] equates all JITs with the author's experience with V8. The article even states "faster than python, but slower than Java" which makes no sense because Java is a JITted language.
True, but few people care with ZGC [1] well on its way to having worst-case latencies of under 1ms (as soon as this year) on heaps up to 16TB in size. We're getting to the point where jitter due to GC is no larger than jitter introduced by the OS, so the only real cost is footprint.
This article has a number of issues. JS with JIT is waaay faster than Python. Not “between python and java” as purported. Second, generalizing jits as “un-ergonomic” seems silly given that what’s being specifically looked at is benchmarking. But what makes this claim ridiculous is that nothing is easy to benchmark. Even native code is hard to profile and this is literally my day job. If the JIT makes your code that much faster, this strikes me as a pretty suspect complaint
I think that by “between python and java” they meant “faster than python and slower than java”. I think Java still beats JS unless you get lucky.
You’re totally right that benchmarking and profiling is hard even for native code. I think this post fetishizes whether or not a piece of code got JITed a little too much. Maybe the author had a bad time with microbenchmarks. There’s this anti pattern in the JS world to extract a small code sample into a loop and see how fast it goes - something that C perf hackers usually know not to do. That tactic proves especially misleading in a JIT since JITs salivate at the sight of loops.
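To make the anti-pattern concrete, here's a hypothetical microbenchmark of the kind described above (the `add` function and iteration counts are invented for illustration): the loop feeds the function a single input type, so the JIT can fully specialize it, and the measured speed says little about production code that sees mixed inputs.

```javascript
// A made-up microbenchmark: timing a small function in a tight loop.
// Because `add` only ever sees numbers here, the JIT can specialize
// it for numeric addition; production call sites with strings or
// mixed types won't behave like this measurement suggests.
function add(a, b) { return a + b; }

function microbench(fn, iters) {
  const start = process.hrtime.bigint();
  let sink;
  for (let i = 0; i < iters; i++) sink = fn(i, i + 1); // monomorphic input
  const elapsedNs = Number(process.hrtime.bigint() - start);
  return { elapsedNs, sink }; // return sink so the loop isn't dead code
}

console.log(microbench(add, 1_000_000).elapsedNs, "ns");
```

The `sink` variable matters too: without it, an optimizer is free to delete the loop entirely, which is another way these extracted benchmarks mislead.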
> I think that by “between python and java” they meant “faster than python and slower than java”. I think Java still beats JS unless you get lucky.
Yeah, that is what they meant, but it is a little misleading. Javascript's performance is usually within a single digit multiple of java's, whereas python is often significantly slower. Javascript is somewhere between Java and an abacus too.
That was not a general statement about Javascript performance. The entire article is about the unpredictability of the JIT. When the JIT hits a bad codepath then it really does perform like python or when it hits a good code path (most of the time) then it performs like java. This unpredictability is what is causing the issues, not the general performance.
It is, but that’s not saying a lot, and it’s still far, far slower than Go/Java/C#, and it also isn’t compatible with lots of existing Python code. Such as anything that would like to talk to a Postgres database. PyPy is great, but it can’t fix a broken ecosystem and general lack of leadership in the Python community. :(
(Broken ecosystem = lots of things don’t work with lots of other things; package managers are another example; Mypy another)
>JS with JIT is waaay faster than Python. Not “between python and java” as purported.
You are correct that JS is not between Python and Java. Python is faster than Java which is faster than JS. Though some people seem to think calling APIs written in C is "not Python" but if the ecosystem provides the library and I call it from Python then it's Python enough for me!
What you're implying is that the "speed" of a language is purely defined by the speed of the C FFI? This view is absolutely ridiculous and I don't think you would find a single other programmer on this green earth that would agree with you.
As someone who's been writing a lot of JavaScript, Go, and a handful of other languages for a while, I feel this. In Go, I can basically know what's going to happen when I write a function. This operation will read from the stack, these instructions will be run, and I can take a peek at the assembly if I'm not sure (though I've developed a pretty good feel for what Go will do without needing that). I can benchmark it and know that the performance I see on my machine will be the performance when I ship this bit of functionality into production, barring hardware differences.
In JavaScript, it's a black box. I know some constructs might deoptimize functions when run on Wednesdays because I read them on a blog published in 2018 that's _probably_ still accurate. In my benchmark running on Node 12.14.1 on Windows this seems to be true. But then who knows if it'll be the same thing in production, and it might 'silently' change later on.
JavaScript in V8 is incredibly fast these days, but I find it much easier to write optimal code in Go.
It's really no different from native compiler optimizations, which are also mysterious and always changing, except for one key aspect: when recording timings to compare against other timings, you can control whether optimizations are turned on or off, to remove that variable from the comparison.
Thankfully, with a native compiler you don't have to deal with your JIT warming up, or some other code causing your function to get deoptimized, or subtle JIT trace aborts. :(
Javascript lets you work at a pretty high level, and tries to fill in a reasonable implementation, but you can't be sure how its choices will perform. Go (like C) forces you to dictate the exact implementation in painful detail without abstracting away much of anything, which might be a reasonable tradeoff for very hot paths but not the entire system.
This seems a little unfair, given while the compiler is an even bigger pedant than I am, Rust does allow you to abstract away quite a lot of stuff and still get performant results.
> if your economics are such that servers are a bigger cost than payroll
Sorry, and I may be oversimplifying the author's situation, but this really sounds like a case where you need to not be using JS for your server. On the client you don't have much choice, but there the pure-JS performance rarely gets tight enough to warrant this degree of micro-optimization work.
Author makes some good points - it would be great if the JIT were more profiler-friendly - but I have to question a little bit how important it actually is, the way the use-cases line up.
The profiling and tooling integration of Intellij with Java has probably been the biggest thing keeping me from using another ecosystem for my personal hacking
When someone realized you pay a lot for cycles at the server, but you pay nothing for cycles at the client, that was the moment the world-wide web began to die.
This is baseless mudslinging. The bottleneck on the client is virtually never the cycles consumed by actually running the app's JS code. It's usually, in order starting with the most common:
- Piles of ads/analytics scripts which have no motivation not to slow down the page
- Reflow; i.e. needlessly many elements on the page causing the browser to do extra work calculating layout
- The initial loading and JIT-ing time of a needlessly heavy JS bundle
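On the reflow point in particular, the usual mitigation is to batch DOM reads before DOM writes so the browser computes layout once rather than once per element. A minimal sketch (the halve-the-width rule is invented for illustration; the functions work on anything exposing `offsetWidth` and `style`):

```javascript
// Interleaved read/write thrashes: each style write invalidates
// layout, and the next offsetWidth read forces a synchronous reflow.
function resizeAllThrashing(elements) {
  for (const el of elements) {
    el.style.height = el.offsetWidth / 2 + "px"; // read + write per element
  }
}

// Batched version: one pass of reads, then one pass of writes,
// so the browser recomputes layout once at most.
function resizeAllBatched(elements) {
  const widths = elements.map(el => el.offsetWidth); // all reads first
  elements.forEach((el, i) => {                      // then all writes
    el.style.height = widths[i] / 2 + "px";
  });
}
```

Both produce the same styles; only the batched one avoids forcing layout inside the loop.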
A JIT is just another cache, like memory. Yes, it’s hard to predict, but not fundamentally that different from caching in any language. It does mean perf tests have to be end-to-end and match real-world loads, but it doesn’t mean it’s “impossible” at all, it means you need to measure.
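One way to read "measure like a cache" concretely: warm the code path with representative traffic before recording samples, and report percentiles rather than a single average. A rough sketch (the iteration counts and the percentile helper are arbitrary choices, not a standard API):

```javascript
// Sketch: benchmark only after a warmup phase, so the JIT's tiers and
// inline caches have settled, and report percentiles so the cold/warm
// spread stays visible instead of being averaged away.
function measure(fn, { warmup = 10_000, samples = 1_000 } = {}) {
  for (let i = 0; i < warmup; i++) fn(i); // representative warmup traffic

  const times = [];
  for (let i = 0; i < samples; i++) {
    const t0 = process.hrtime.bigint();
    fn(i);
    times.push(Number(process.hrtime.bigint() - t0));
  }
  times.sort((a, b) => a - b);
  const pct = p => times[Math.min(times.length - 1, Math.floor(p * times.length))];
  return { p50: pct(0.5), p99: pct(0.99) }; // nanoseconds per call
}
```

The gap between p50 and p99 is exactly where deopts and GC pauses show up, which a single mean would hide.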
Is this a real problem? I’ve been profiling my JS for years and never actually run into a mysterious problem where some important code I profiled was way way slower in prod than when I was profiling. Has that happened for you? How often does this happen? I take it as an assumption that profiling is something you mostly do on inner loops & hot paths in the first place. I mean, I profile everything to look for bottlenecks, but I don’t spend much time optimizing the cold paths.
> Get notified about deopts in hot paths
Studying the reasons for de-opts helps you know in advance when they might happen. If you avoid those things, de-opts won’t happen, and you don’t need notifications.
For example, ensure you don’t modify/add/delete keys in any objects, make sure your objects are all the same shape in your hot path, don’t change the type of any properties, and you’re like 90% there, right?
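A sketch of that advice in code (the point objects are invented; the hidden-class behavior is the commonly documented V8 model): objects built with the same properties in the same order share one shape, so the hot function stays monomorphic.

```javascript
// Same shape: properties always created in the same order, so every
// point shares one hidden class and the hot call site stays monomorphic.
function makePoint(x, y) {
  return { x, y };
}

// Same logical data, different shape: adding properties in another
// order (or after construction) yields a different hidden class, which
// can make call sites polymorphic and eventually trigger de-opts.
function makePointFlipped(x, y) {
  const p = {};
  p.y = y; // order differs from makePoint
  p.x = x;
  return p;
}

function lengthSquared(p) {
  return p.x * p.x + p.y * p.y; // fastest when every p has one shape
}

// Feed the hot path a single shape:
let total = 0;
for (let i = 0; i < 1000; i++) total += lengthSquared(makePoint(i, i + 1));
```

Both constructors produce the same values; only the consistent one keeps the hot loop on a single hidden class.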
> statically require that a given section can & does get optimized [...] compile likely sections in advance to skip warmup
While these don’t exist in V8, it’s maybe worth mentioning that the Google Closure compiler does help a little bit, it ensures class properties are present and initialized, which can help avoid de-opts.
Hey, Node/bluebird person here: you want to run Node with --trace-opt and --trace-deopt and --allow-natives-syntax with %OptimizeFunctionOnNextCall before benchmarking.
I've hit fewer snags with LuaJIT, but they're definitely there. Really wish Mike Pall had written that hyperblock scheduler before retiring...
[1]: https://docs.oracle.com/en/java/javase/14/vm/compiler-contro...
[2]: https://docs.oracle.com/en/java/javase/14/jfapi/why-use-jfr-...
[3]: https://docs.oracle.com/en/java/javase/14/docs/specs/man/jao...
[4]: http://psy-lob-saw.blogspot.com/2015/07/jmh-perfasm.html
[5]: https://www.graalvm.org/docs/reference-manual/native-image/
[1]: https://wiki.openjdk.java.net/display/zgc/Main
The real problem with the GC is that it leads people to believe that they don't need to be considerate about how they are using memory.
Go’s compiler is kind of stupid in this sense.
You can actually look at the assembly generated by V8 and you can trace de-opts. The workflow is just so awful, you're not going to want to do it.
Byte code interpreters do that? That would be surprising. Programs are represented with 'lines' in byte code?
For the things he wants, let's look at SBCL, a Common Lisp implementation (see http://sbcl.org):
> compile short bits to native code quickly at runtime -> that's done by default
> identify whether a section is getting optimized -> we tell the compiler and the compiler gives us feedback on the optimizations performed or missed
> know anything about the native code that’s being run in a benchmark -> we can disassemble it, some information can be asked
> statically require that a given section can & does get optimized -> done via declarations and compilation qualities
> compile likely sections in advance to skip warmup -> by default
> bonus: ship binaries with fully-compiled programs -> dump the data (which includes the compiled code) to an executable
A JS frontend for SBCL could be nice...
Why should we need to hand compile special versions of JavaScript VMs, just to get (decompile ....)?
kdb+/q has a bytecode compiler: each line is turned into bytecode and run, line by line. The performance is also as good as C, most of the time.