Since Ruby 3, the automatic coercion of keywords to a hash (the second example underneath "Passing Hash to Functions" in this post) is considered a legacy style and is generally frowned upon in new code. That is to say, code like the second of these two calls (reconstructed roughly from the post's example):
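def foo(kwargs = {})
  kwargs
end

foo({k: 1})  # explicit hash literal        => {:k=>1}
foo(k: 1)    # braces omitted (the "sugar") => {:k=>1}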
One of the strongest arguments for avoiding this sugar is that it makes the code more brittle in the face of future changes. In particular, in the example above, if we add a new keyword argument to `foo`, then any call which omitted the curly braces will break, while calls which used them will keep working fine:
# added a new keyword arg here
def foo(kwargs = {}, frob: false)
  kwargs
end

foo({k: 1}) # still ok: `frob` defaults to false
foo(k: 1)   # ArgumentError: unknown keyword: :k
However, the practical utility of a self-populating, lazily evaluated lookup structure goes further. A hash with a default block works great as a simple caching wrapper for all manner of results, for example when talking to slow or fine-grained APIs.
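A minimal sketch of that idea, where `expensive_lookup` is a made-up stand-in for a slow API call:

# Hash with a default block as a tiny cache: the block only runs on a miss,
# and it stores the result so the next lookup is a plain hash read.
def expensive_lookup(id)
  sleep 0.5                       # pretend this is a network round trip
  { id: id, name: "user-#{id}" }
end

users = Hash.new { |cache, id| cache[id] = expensive_lookup(id) }

users[42]  # slow the first time: runs the block and stores the result
users[42]  # instant: served straight from the hash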
It is just a hash map with a few common functions defined; hash maps are occasionally useful, but what is all the praise about?
The mentioned "simple" bit is arguable: as a language's building block, a hash map is relatively complex and specific, since those can be built out of lists (or trees, though lists/arrays may be preferable for efficiency), which can be built out of pairs (product types, cons cells, tuples; unless going for that efficiency, though it can still be pretty efficient with trees). Maybe it is one of those "simple versus easy" mix-ups.
Well yeah, it's "simple" (or easy) to use, definitely not simple to implement. But having an easy-to-use hashmap is par for the course for most newer languages - not only Ruby, but also PHP (associative arrays), JS (objects), Go (built-in map type) etc.
This is a lovely overview. Hash is a great example of how delightful it can be to program in Ruby.
One more technique worth noting is the chained functional style of using Hash, which you can do in Ruby because Hash includes Enumerable. If you're prototyping a script to do some data-cleaning, this makes it easy to build up your pipeline and iterate on it. For example:
foobar = { ...your data here... }
foobar.map do |k, v|
  # ...
  # do some transformation here
  # ...
  # and then return the replacement key/value for this entry
  [k, new_value]
end.select do |k, v|
  # do some filtering here, e.g.:
  is_foobarable?(k, v)
end.map do |k, v|
  # ...
  # do some expensive transformation here on the smaller data set
  # ...
  [k, newer_value]
end.to_h
(Note that you have to call #to_h at the end since the Enumerable functions will coerce the hash into an array of arrays.)
Now your code literally shows the pipeline that your data is falling through — and each of these steps can be side-effect-free, with no mutations to the original foobar structure.
Unless things have changed and Ruby has stream fusion now, this is bad advice at scale. You are iterating over a fat object multiple times. Even if it's uglier, it's much better in this case to create an empty array/hash, iterate once with #each, and push/assign into the accumulator.
I worked at the largest Rails shop in the world and this would be rejected in code review.
Edited to add more detail: the only method you need to write to implement Enumerable is #each. Every step of your pipeline here is _another_ call to #each. Just do it once.
> I worked at the largest Rails shop in the world and this would be rejected in code review.
Not sure if this means GitHub or Shopify. Until earlier this year I worked at GitHub for a decade, leaving as a principal engineer, primarily writing Ruby.
This would not be rejected at code review there unless the Hash had e.g. millions of values and, even then, it might not be a meaningful performance problem in context.
If the Hash is very small and will always be: readability trumps Big-O "performance" when n is very small.
Am I being pedantic and appealing to authority? Yup but, well, you started it and I hate to see helpful "Ruby is nice" comments like the grandparent get crapped on for no good reason.
There's #lazy to turn things into a lazy enumerator, to be iterated over when you so desire with e.g. #force.
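A quick sketch of that (the range and the transformations are just for illustration):

# Nothing is evaluated until the lazy enumerator is forced or asked for elements.
squares = (1..Float::INFINITY).lazy.map { |n| n * n }.select(&:odd?)

squares.first(5)                     # => [1, 9, 25, 49, 81]
(1..5).lazy.map { |n| n * 2 }.force  # => [2, 4, 6, 8, 10] -- #force realizes the whole thing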
If you're going to iterate into an accumulator variable, use the for keyword instead of #each; it's faster.
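For what it's worth, that style looks roughly like this (throwaway data):

result = {}
for k, v in { a: 1, b: 2 }
  result[k] = v * 2
end
result  # => {:a=>2, :b=>4}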
Alternatively, one can use `.reduce({}) { |h, (k, v)| ... h }` or `.each.with_object({}) { |(k, v), h| ... }`, which keeps the block from closing over an external variable and makes the assignment to that variable "atomic" with respect to the hash construction: the variable only ever holds the final result, if a separate variable is even needed at all, which it may not be given the implicit return of the last value.
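A small sketch of both forms, using a made-up hash and transformation:

data = { a: 1, b: 2 }

data.reduce({}) { |h, (k, v)| h[k] = v * 2; h }           # => {:a=>2, :b=>4}
data.each.with_object({}) { |(k, v), h| h[k] = v * 2 }    # => {:a=>2, :b=>4}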
I wish it were typed, though. So many times I've seen a function that takes a hash of "options" or "config" and had no idea what it actually contains. Even for official Rails methods it's often hard to figure out what the possible options are. Some of them seem almost internal, given how obscure they are.
RBS + Steep to the rescue! We typed our configuration this way for ddtrace Ruby. On the external (set) side it makes it very easy to explore the configuration in an editor; on the internal (get) side it helps ensure we don't make mistakes.
I've been in Python/Django for about a year now and I really miss Ruby's Hash compared to Python's dict.
`.dig(:key, :key, :etc)` is so nice to find deeply nested data without blowing up.
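For example, with a made-up nested payload:

payload = { user: { address: { city: "Oslo" } } }

payload.dig(:user, :address, :city)  # => "Oslo"
payload.dig(:user, :phone, :home)    # => nil, no NoMethodError on the missing branch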
One thing I don't miss is having to know whether a hash's keys are strings or symbols. While it's easily solvable, I've definitely lost time, only to smack myself when I realized I needed a string key while I'd been feeding it symbols the whole time, having sworn it should be a symbol-keyed hash.
I don't know Ruby, so I'm not sure if it's the same, but in Python, if you're not using a library like funcy or toolz for a nested-get helper, you can do `dict.get('key', {}).get('key2', {}).get('key3')`. Not the prettiest, but it works in a pinch.
It should be noted that the ** operator works like .merge, and it also respects the order of key definition within a hash literal: if the same key appears more than once, whatever was declared earlier gets overwritten by the later occurrence.
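A small sketch of that behaviour with throwaway hashes:

defaults  = { color: :red, size: :m }
overrides = { size: :xl }

{ **defaults, **overrides }  # => {:color=>:red, :size=>:xl}, same as defaults.merge(overrides)
{ **overrides, **defaults }  # => {:color=>:red, :size=>:m}, the later :size wins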
Hash is so powerful in Ruby that people often overuse it.
One of the most common issues I've found in Ruby codebases is not creating classes to represent the domain and simply using hashes everywhere.
The downside is that a hash has no shape. It can (and will) be anything you want it to be, often causing havoc once the system grows.
Checks for keys everywhere. Almost every statement uses safe navigation because you never know what shape you're dealing with. Multiple places performing the same map/reduce/filter/etc. All because people stick to hashes a bit too long.
Amen. This is an issue at the company I work at. Common typos when looking up hash keys return nil, which has a tendency to silently keep working and then blow up with a runtime error further down the chain. I am trying to insist on using .fetch to force an exception.
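For example, with a hypothetical options hash:

options = { retries: 3 }

options[:retriez]            # => nil, the typo slips through silently
options.fetch(:timeout, 30)  # => 30, an explicit default is still possible
# options.fetch(:retriez)    # raises KeyError (key not found: :retriez) right at the lookup site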
When the company switches to 3.2 I will insist on everyone using the new Data class for value objects rather than hashes.
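A sketch of what that looks like (the attributes are invented):

# Ruby 3.2+: Data.define builds an immutable value object with a fixed shape.
Point = Data.define(:x, :y)

p1 = Point.new(x: 1, y: 2)
p1.x           # => 1
p1.with(y: 5)  # => #<data Point x=1, y=5>, a new frozen copy
# p1.z         # NoMethodError: typos fail loudly, unlike hash[:z] quietly returning nil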
This is perhaps the biggest killer feature of TypeScript: your object literals (which are basically hashes) can have types, or possibly interfaces.
a = { 1 => 'a', 2 => 'b' }
[1, 2, 3].map(&a)
#=> ['a', 'b', nil]
fib = Hash.new {|hash, key| hash[key] = key < 2 ? key : hash[key-1] + hash[key-2] }
Example: fib[123] # => 22698374052006863956975682
Makes use of memoization.
But now I'm wondering when that would be a better solution than just using the `Hash#values_at` method...?
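For comparison, both give the same answer on the snippet above:

a = { 1 => 'a', 2 => 'b' }

[1, 2, 3].map(&a)     # => ["a", "b", nil]
a.values_at(1, 2, 3)  # => ["a", "b", nil]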
The mentioned "simple" bit is arguable: as a language's building block, a hash map is relatively complex and specific, since those can be built out of lists (or trees, though lists/arrays may be preferable for efficiency), which can be built out of pairs (product types, cons cells, tuples; unless going for that efficiency, though it can still be pretty efficient with trees). Maybe it is one of those "simple versus easy" mix-ups.
One more technique worth noting is the chained functional style of using Hash, which you can do in Ruby because Hash inherits from Enumerable. If you're prototyping a script to do some data-cleaning, this makes it easy to build up your pipeline and iterate on it. For example:
(Note that you have to call #to_h at the end since the Enumerable functions will coerce the hash into an array of arrays.)Now your code literally shows the pipeline that your data is falling through — and each of these steps can be side-effect-free, with no mutations to the original foobar structure.
I worked at the largest Rails shop in the world and this would be rejected in code review.
Edited to add more detail: the only method you need to write to implement Enumerable is #each. Every step of your pipeline here is _another_ call to #each. Just do it once.
Not sure if this means GitHub or Shopify. Until earlier this year I worked at GitHub for a decade, leaving as a principal engineer, primarily writing Ruby.
This would not be rejected at code review there unless the Hash had e.g. millions of values and, even then, it might not be a meaningful performance problem in context.
If the Hash is very small and will always be: readability trumps Big-O "performance" when n is very small.
Am I being pedantic and appealing to authority? Yup but, well, you started it and I hate to see helpful "Ruby is nice" comments like the grandparent get crapped on for no good reason.
If you're going to iterate over an accumulator variable, use the for keyword instead of each, it's faster.
Alternatively, one can use .reduce({}) { |h, (k, v)| ... h } or .each.with_object({}) { |(k, v), h| ... } which makes the block not close over an external variable, and makes the assignment to that variable "atomic" (wrt the hash construction, the variable will only contain the final result, that is if a final variable is needed at all, which it may not with implicit return of the last value)
One tiny tip.. :) you can pass a block to `.to_h`, so instead of using `.map` + `.to_h` you can do the transformation in one step, as in the sketch below.
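Roughly the same step written both ways (the doubling is just an example):

h = { a: 1, b: 2 }

h.map { |k, v| [k, v * 2] }.to_h  # => {:a=>2, :b=>4}
h.to_h { |k, v| [k, v * 2] }      # => {:a=>2, :b=>4}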
Methods like "except" (https://docs.ruby-lang.org/en/3.2/Hash.html#method-i-except) or "fetch" (raising an error on a missing key) are very convenient for writing defensive data-processing code!
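For instance, with a throwaway params hash:

params = { name: "Ada", admin: true }

params.except(:admin)  # => {:name=>"Ada"}, drop keys you don't want to pass along
params.fetch(:name)    # => "Ada"
# params.fetch(:role)  # raises KeyError rather than quietly returning nil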
Similarly, in Elixir, I use Maps a lot for the same type of jobs (https://hexdocs.pm/elixir/1.15.4/Map.html), with similar properties.