>The Ruby implementation has a subtle mistake which causes significantly more work than it needs to.
To be fair, I do not think that is a "mistake" as such. I have written Ruby professionally for 6 years or so and have contributed to several Ruby open source projects, and I haven't seen an innocuous `nil` sitting at the end of a loop to prevent array allocation.
The argument would be fair if it weren't idiomatic Ruby.
More like - knowing the internals of a language will allow one to gain more performance out of it. That has been true for almost every programming language, but generally speaking the goal of a VM-based language is to not require that _specialized_ knowledge.
It's idiomatic Ruby in a very particular case that likely was explicitly chosen to demonstrate such a dramatic effect.
You're _usually_ not returning implicit arrays from loops in production code. Parallel assignments, when they're used, are almost always in the first line of an initialize method, not the returned line of an enumerable block.
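For anyone who hasn't seen the trick being discussed: in CRuby, a multiple assignment used as a value produces an Array of its right-hand sides, so when it is the last expression in a block, that Array becomes the block's return value and is allocated on every iteration just to be thrown away. A trailing `nil` makes the assignment's result unused, so the temporary Array can be skipped. A minimal sketch (method names are mine):

```ruby
# The multiple assignment is the block's last expression, so its value
# (the Array [b, a + b]) is allocated on every iteration.
def fib_allocating(n)
  a, b = 0, 1
  n.times do
    a, b = b, a + b
  end
  a
end

# With a trailing nil, the assignment's result is discarded and CRuby
# does not need to materialize the temporary Array.
def fib_lean(n)
  a, b = 0, 1
  n.times do
    a, b = b, a + b
    nil
  end
  a
end
```

Both versions compute the same result; only the allocation behavior of the loop body differs.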
I don’t think there’s any language, interpreted or otherwise, with the goal that knowing its internals won’t help you gain more performance. I mean, that would be nearly impossible.
Ruby code, perhaps more than most code, is written for readability and “beauty”. It’s a part of Ruby culture that I greatly appreciate. But if you care about performance, you will act differently, regardless of language. And the whole point of this code is to show that if you care about performance above all else, there’s plenty of room to maneuver in interpreted Ruby.
That's an interesting question: how does YJIT perform on the original code? Does it find the optimization on its own, producing the same gain, so that you don't actually need to know the trick yourself?
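As a starting point for that experiment, it's worth confirming YJIT is actually active before comparing numbers. A small sketch (CRuby 3.1+; start the interpreter with `ruby --yjit`):

```ruby
# Reports whether this CRuby process is running with YJIT enabled.
def yjit_status
  if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
    "YJIT enabled"
  else
    "YJIT not enabled (start Ruby with --yjit to compare)"
  end
end

puts yjit_status
```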
>I don’t think there’s any language, interpreted or otherwise, with the goal that knowing its internals won’t help you gain more performance.
It is, indeed, a fundamental goal of ruby that there are multiple ways to write the same thing, and that the programmer should not need to understand nuances of the compiler.
"I need to guess how the compiler works. If I'm right, and I'm smart enough, it's no problem. But if I'm not smart enough, and I'm really not, it causes confusion. The result will be unexpected for an ordinary person. This is an example of how orthogonality is bad." -matz 2003.
+1. I love golang, because for the most part, there is only 1 way to do something. With ruby, there are a billion ways to do the same thing, with some being slower than others.
I've just started learning Go as a very long time Rubyist. I really enjoy both languages for very different reasons. In Ruby, I can write code that really makes me happy to read. Enumerable is just wonderful. You can go a long way in Ruby without writing a single if statement. It's great. If I'm working on a solo project, it's the language I'd choose every time. But working with inexperienced people, or people who "know" Ruby but never adopted "the Ruby way", is a nightmare. Ruby code, written poorly, can be extremely brutal to follow. When the great deal of freedom Ruby offers isn't handled responsibly, a hot mess can ensue.
Go is the opposite. It's great, as you say, because it's dirt simple. It's a brutalist get-the-job-done kind of language, and I think if I were to start a company working with other engineers, I'd absolutely choose Go for that reason. It's easy to read. It's easy to reason about. And there's very little implicitness in it.
I had a very hot loop (runtime of 15 minutes) that I wanted to speed up. I profiled it over and over again in IntelliJ, identifying every possible allocation I could eliminate. When I was done there were zero allocations in the hot loop and the thing ran something like 4x faster than it had previously.
At that point, looking at the code, I realized that what I had written was very similar to how I'd have implemented it in Rust—I allocated a bunch of structs upfront and then "borrowed" them into the various parts of the algorithm. Since it was so close in style to Rust anyway, I decided to port it over and see if I could get any more performance out of it by being closer to the metal.
It turned out that the difference in performance between the Rust version and the Java version was statistically insignificant. I tried a few different optimization settings but didn't manage to get Rust to be any faster than Java's JIT.
This was eye opening to me. Now that most runtimes have JIT compilers, I suspect that far more important than choosing the right language is deeply understanding how the language you're working with works under the hood so you can eliminate hot spots and unnecessary allocations.
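The same allocation hunting works in Ruby, too: CRuby's monotonic `GC.stat(:total_allocated_objects)` counter gives a rough per-block allocation count. A CRuby-only sketch (the helper name is mine):

```ruby
# Counts objects allocated while the block runs (CRuby-specific counter;
# the counter is monotonic, so GC running in between doesn't skew it).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

strings = allocations { 1_000.times { "tmp" } } # a fresh String per iteration
symbols = allocations { 1_000.times { :tmp } }  # symbols are interned, not reallocated

puts "strings: #{strings}, symbols: #{symbols}"
```

Wrapping a suspect hot loop this way quickly shows whether an "invisible" allocation (like the implicit Array discussed above) is really there.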
When you say Java's JIT, I presume you mean C2? I'd be curious to hear how it performs with Graal. If what you ran into was truly side effect free, either the OpenJDK or GraalVM teams would be interested in seeing your use case for further optimization. You're right, this isn't something an application developer should have to think about with a good JIT compiler in place, but JITs are complicated and having real world code samples that aren't performing well are incredibly useful to the compiler devs.
Note that there are plenty of Java JITs to choose from, depending on the JVM implementation.
YJIT and Ruby 3.3 have really impressed me as well. Their VM engineers are clearly doing something right.
Related to Ruby perf, I still hear folks worried about rails “not being able to scale”. Let me say something controversial (and clearly wrong): Rails is the only framework that has proven it _can_ scale. GitHub, Shopify, AirBnb, Stripe all use rails and have scaled successfully. Very few other frameworks have that track record.
There’s plenty of reasons to not use rails, but scaling issues doesn’t feel like a strong one to me.
But, for the sake of truth:
- AirBnB migrated from Rails to a micro-services architecture (which I think, they regretted doing too early - I read that somewhere I believe)
- Stripe never used Rails: they use Ruby (and Sinatra for the Web part - i.e. dashboard).
But it's true that GitHub and Shopify both use and have scaled Rails monoliths. They are showing the way :)
As the author of the library referenced in the linked post, I'd like to add a small clarification, in that the example in the README.md was not one specifically cherry-picked to demonstrate an unrealistic advantage, but was chosen simply because it's a widely recognized example of a CPU bound algorithm. In reality, this example actually does more to demonstrate one of the major weaknesses of this library, which is the significant overhead exhibited by the FFI interface between Ruby and Crystal for trivial operations.
This particular example crossed this painful divide 1 million times. I found it interesting that despite this disadvantage, the Crystal implementation was still able to take the lead over my identical, naively written Ruby implementation (warts and all). As the author of this post points out, for trivial operations crossing the interface at high frequency, finely tuned Ruby will easily take the lead!
That said, I still believe there are times where having the ability to write and interface with a performant, precompiled language (that is somewhat familiar to the average Rubyist) in an ergonomic way that avoids the need to context switch can be beneficial.
Sure, performance is unlikely to match a finely tuned (but arguably more difficult to maintain) C or Rust extension and ergonomics are unlikely to match an approach that sticks to pure Ruby, but it exposes a new middle ground, which at times, may just hit the right spot!
I'd imagine realistic examples of where this type of library could be useful might include:
- Providing an easy way to expose and use high-quality Crystal shards from within your Ruby program.
- Allowing you to easily write performant CPU or memory-intensive procedures for which reusable native libraries do not exist, and where the majority of the overall execution time can be spent within Crystal.
- As a way to glue several different smaller Crystal shared objects together into a single application using Ruby glue code, allowing you to avoid some of the high compile times you might typically see with a large monolithic binary.
I would definitely not suggest this library has any business:
- Blindly replacing swaths of Ruby methods, without any tangible performance metrics to back this decision.
- Replacing code that is already highly performant in pure Ruby (whether that's code that lends itself well to being JIT'd, is backed by an existing native library etc.)
Funnily enough, if you take a look at the commit history of the project, you'll notice that last week I actually replaced the referenced example with one that better demonstrates a performance difference (even compared against YJIT) and crosses the FFI divide only once. This came as a result of having to introduce a Reactor to get the library to play nice in multi-threaded Ruby applications, which regrettably added even more overhead to the FFI interface and further hammers home the point that this library is not going to perform well in cases where you need to jump between Crystal and Ruby at high frequency.
No, it's not. Maybe only faster than when I bash it jokingly, using hyperbole (it won't take an eternity for Ruby to do things that C# computes in a moment, only half of one).
On the content of the post:
"Now it’s Ruby that’s 5 times faster than Crystal!!! And 20x faster than our original version. Though most likely that’s some cost from the FFI, or something similar, though that does seem like a surprising amount of overhead."
There are tools that can provide a definitive answer to this, and no, FFI is not a silver-bullet solution to the slowness of interpreted (or JIT-compiled but dynamically typed) languages.
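For example, the per-call cost of any boundary can be estimated with nothing but the stdlib; here `boundary_call` is a hypothetical stand-in for whatever FFI-backed function you want to measure:

```ruby
require 'benchmark'

# Stand-in for the FFI-backed call being measured; substitute the real one.
def boundary_call; end

N = 1_000_000
secs = Benchmark.realtime { N.times { boundary_call } }
puts format('~%.0f ns per call', secs / N * 1e9)
```

Subtracting the time of an equivalent pure-Ruby no-op isolates the crossing overhead itself; a sampling profiler (e.g. the stackprof gem) can then confirm where the remaining time actually goes.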
Honestly it’s been my experience with Ruby that it’s Rails that can potentially be slow. Ruby is quite fast and even has a JIT option. Rails is by design opinionated, and for some cases I’ve found that I’ve had to work extra hard to ensure performance. That means refactoring code in slightly non traditional ways and having a deeper understanding of how Rails works under the hood (esp in the ORM). So if you think you can just use Ruby+Rails out of the box in its simplest form without experience and depth of understanding: yes it might be slow. But like with all things, you can go quite far with care and experience.
It’s not even that rails is necessarily slow, it’s more the way you use it that is slow. If you tie all your business logic to the database and commit complicated changes in transactions, sure it will be terribly slow.
What’s wrong with complicated (I’m not sure what that means - large numbers of rows updated? Disparate rows updated?) transactions? Depending on your RDBMS (and what it’s running on, config options, etc) this may or may not be slow.
It's certainly easy to think of situations where they do matter, but unless your project is FaaS (Fibonacci As A Service), probably not.
Are you aware that you're referencing the Python mantra with that? Feel free to Google it, it's from 2004.
There should be one-- and preferably only one --obvious way to do it.
To pick on your example, back when Android used Dalvik as a plain interpreter, followed by a basic JIT method tracing, they implemented floating point math on native code.
Nowadays the JIT takes care of it,
https://developer.android.com/reference/android/util/FloatMa...
Unfortunately it has taken us several decades to catch up to what Lisp and BASIC (the original Dartmouth BASIC) were already offering in the 1970s.
Citation? Seems like a pretty extraordinary claim.