If I'm understanding the repository correctly, each language reads from a file, prints some output to the console, computes the value, prints some more, and exits.
In my opinion, the comparisons could be better if the file I/O and console printing were removed.
I'm not sure why the contents of rounds.txt aren't just provided as some kind of argument instead of being read from a file. Given all the other boilerplate involved, I would have expected it to be trivial to add the relevant templating.
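(For example, in the C# version quoted further down, that would be roughly a one-line change; this is my own sketch, not code from the repo:)

    // Hypothetical: take the round count from argv instead of rounds.txt.
    var rounds = int.Parse(args[0]);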
I'm not too familiar with the JVM, so perhaps I'm missing something here: how would that help? The file is tiny, just a few bytes, so I'd expect the main slowdown to come from system-call overhead. With non-mmap file I/O you've got the open/read/close trio, and only one read(2) should be needed, so that's three expensive trips into kernel space. With mmap, you've got open/stat/mmap/munmap/close.
Memory-mapped I/O can be great in some circumstances, but a one-time read of a small file is one of the canonical examples for when it isn't worth the hassle and setup/teardown overhead.
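For illustration, a rough C# sketch of the two approaches (my own code, not from the repo), just to show how much extra ceremony the memory-mapped route adds for a one-shot read of a few bytes:

    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;
    using System.Text;

    // Plain one-shot read: open/read/close under the hood.
    var text = File.ReadAllText("rounds.txt");
    Console.WriteLine(text);

    // Memory-mapped equivalent: noticeably more setup/teardown for the same few bytes.
    var length = (int)new FileInfo("rounds.txt").Length;
    using (var mmf = MemoryMappedFile.CreateFromFile("rounds.txt", FileMode.Open))
    using (var accessor = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read))
    {
        var buffer = new byte[length];
        accessor.ReadArray(0, buffer, 0, buffer.Length);
        Console.WriteLine(Encoding.UTF8.GetString(buffer));
    }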
- C++ unsurpassable king.
- There's a stark jump in times from ~200ms to ~900ms (Rust v1.92.0 being an in-between outlier).
- C# gets massive boost (990->225ms) when using SIMD.
- But C++ somehow gets slower when using SIMD.
- Zig very fast*!
- Rust got big boost (630ms->230ms) upgrading v1.92.0->1.94.0.
- Nim (that compiles to C then native via GCC) somehow faster than GCC-compiled C.
- Julia keeps proving high-level languages can be fast too**.
- Swift gets faster when using SIMD but loses much accuracy.
- Go is the fastest language with its own compiler (i.e. not dependent on GCC/LLVM).
- V (also compiles to C): I expected it, being apparently similar, to be close to Nim.
- Odin (LLVM) & Ada (GCC) surprisingly slow. (Was expecting them to be close to Zig/Fortran.)
- Crystal slowest LLVM-based language.
- Pure CPython unsurpassable turtle.
Curious how D's reference compiler (DMD) compares to the LLVM/GCC-based ones, how LFortran compares to gfortran, and QBE to GCC/LLVM. Also would like to see Scala Native (Scala currently being inside the 900~1000ms bunch).
* Note that Zig uses `@setFloatMode(.Optimized)`, which according to the docs is equivalent to `--fast-math`, but only D/Fortran use this flag (C/C++ do not).
** Julia uses `@fastmath` AND `@simd`. The comparison is supposedly about performance on idiomatic code, and for Julia SIMD is just a simple annotation on the loop (Julia may even apply it automatically), but it should still be noted because (as seen in the C# example) the effect can be big.
Looking closer at the benchmarks, it seems the C# benchmark is not using AOT, so Go and even Java GraalVM get an unfair advantage here (when looking at the non-SIMD versions). There is a non-trivial startup time for the JIT.
Sorry, I can't seem to edit my answer anymore, but I was mistaken: the C# version is using AOT. But there are other significant differences here:
> var rounds = int.Parse(File.ReadAllText("rounds.txt"));
> var pi = 1.0D;
> var x = 1.0D;
> for (var i = 2; i < rounds + 2; i++) {
>     x = -x;
>     pi += x / (2 * i - 1);
> }
> pi *= 4;
> Console.WriteLine(pi);
For example, if we change the type of the 'rounds' variable here from int to double (as it also is in the Go version), the code runs significantly faster on my machine.
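A minimal sketch of that variant (assuming the same top-level-program setup as the quoted code; the only change is the type of `rounds`):

    // Hypothetical variant: parse rounds as a double, mirroring the Go version's float64.
    var rounds = double.Parse(File.ReadAllText("rounds.txt"));
    var pi = 1.0D;
    var x = 1.0D;
    for (var i = 2; i < rounds + 2; i++) {  // i stays an int; the loop condition now compares doubles
        x = -x;
        pi += x / (2 * i - 1);
    }
    pi *= 4;
    Console.WriteLine(pi);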
After seeing Swift's result I had to look into the source to confirm that yes, it was not written by someone who knows the language.
But these are good benchmark results that demonstrate what performance level you can expect from each language when someone not versed in it does the code porting. Fair play
Startup time doesn't seem to be factored in correctly, so any language that uses bytecode (e.g. Java) or compiles from source at runtime (e.g. Ruby, Python, etc.) will look poor on this. If the kinds of applications you write are ones that exit after a fraction of a second, then sure, this will tell you something. But if you're writing server apps that run for days/weeks/months, then this is useless.
Python took 86 seconds, if I'm reading it correctly. I can see your point holding for a language like Java, but most of Python's time cannot have been startup time; it must have been actual execution time.
Reading the fine print, the benchmark is not just the Leibniz formula, as the chart title suggests. It also includes file I/O, startup time, and console printing:
> Why do you also count reading a file and printing the output?
> Because I think this is a more realistic scenario to compare speeds.
Which is fine, but it should be noted more prominently. The startup time and console printing obviously aren't relevant for something like the Python run, but at the top of the chart, where runs are a fraction of a second, they probably account for a lot of the differences.
Running the inner loop 100 times over would have made the other effects negligible. As written, trying to measure millisecond differences between entire programs isn't really useful unless someone has a highly specific use case where they're re-running a program for fractions of a second instead of using a long-running process.
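Concretely, something like this rough sketch on top of the quoted C# version (the repeat count of 100 is arbitrary and my own addition) would make the constant costs a rounding error:

    // Hypothetical harness: repeat the computation so that startup, file I/O
    // and printing become a negligible fraction of the total runtime.
    var rounds = int.Parse(File.ReadAllText("rounds.txt"));
    var pi = 0.0D;
    for (var run = 0; run < 100; run++) {
        pi = 1.0D;
        var x = 1.0D;
        for (var i = 2; i < rounds + 2; i++) {
            x = -x;
            pi += x / (2 * i - 1);
        }
        pi *= 4;
    }
    Console.WriteLine(pi);  // print once, after all repetitions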
This sort of thing is pretty meaningless unless the code is all written by people who know how to get performance out of their languages (and they're allowed to do so). Did you use the right representation of the data? Did you use the right library? Did you use the right optimization options? Did you choose the fast compiler or the slow one? Did you know there was a faster or slower one? If you're using fancy stuff, did you use it right?
I did the same sort of thing with the Sieve of Eratosthenes once, on a smaller scale. My Haskell and Python implementations varied by almost a factor of 4 (although you could argue that I changed the algorithm too much on the fastest Python one). OK, yes, all the Haskell ones were faster than the fastest Python one, and the C one was another 4 times faster than the fastest Haskell one... but they were still all over the place.
It's true this is a microbenchmark and not super informative about "Big Problems" (because nothing is). But it absolutely shows up differences in code generation and interpretation performance in an interesting way.
Note in particular the huge delta between rust 1.92 and nightly. I'm gonna guess that's down to the autovectorizer having a hole that the implementation slipped through, and they fixed it.
The delta there is because the Rust 1.92 version uses the straightforward iterative code and the 1.94-nightly version explicitly uses std::simd vectorization. Compare the source code:
https://github.com/niklas-heer/speed-comparison/blob/master/...
https://github.com/niklas-heer/speed-comparison/blob/master/...
> Note in particular the huge delta between rust 1.92 and nightly. I'm gonna guess that's down to the autovectorizer having a hole that the implementation slipped through, and they fixed it.
The benchmark also includes startup time, file I/O, and console printing. There could have been a one-time startup cost somewhere that got removed.
The benchmark is not really testing the Leibniz loop performance for the very fast languages, it's testing startup, I/O, console printing, etc.
The Clojure version is not AOT'd, so it's measuring startup + compiler time. When properly compiled it should be comparable to the Java implementation.
For the sub-second compiled languages, it's basically a benchmark of startup times, not performance in the hot loop.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
C# is using CoreCLR/NativeAOT, which doesn't use GCC or LLVM either. Its compiler is more capable than Go's.
Makes the benchmarks game's 100-line programs seem like major apps.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...