Since Ruby 3, the automatic coercion of keywords to a hash (the second example underneath "Passing Hash to Functions" in this post) is considered a legacy style and is generally frowned upon in new code. That is to say, code like the second of these two calls (reconstructed roughly from the post's example):
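def foo(kwargs = {})
  kwargs
end

foo({k: 1})  # explicit hash literal        => {:k=>1}
foo(k: 1)    # braces omitted (the "sugar") => {:k=>1}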
One of the strongest arguments for avoiding this sugar is that it makes the code more brittle in the face of future changes. In particular, in the example above, if we add a new keyword argument to `foo`, then any call which omitted the curly braces will break, while calls which used them will keep working fine:
# added a new keyword arg here
def foo(kwargs = {}, frob: false)
  kwargs
end

foo({k: 1}) # still ok: `frob` defaults to false
foo(k: 1)   # ArgumentError: unknown keyword: :k
However, the practical utility of a self-populating, lazily evaluated lookup structure goes further. A hash with a default block works great as a simple caching wrapper for all manner of results, for example when talking to slow or fine-grained APIs.
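A minimal sketch of that idea, where `expensive_lookup` is a made-up stand-in for a slow API call:

# Hash with a default block as a tiny cache: the block only runs on a miss,
# and it stores the result so the next lookup is a plain hash read.
def expensive_lookup(id)
  sleep 0.5                       # pretend this is a network round trip
  { id: id, name: "user-#{id}" }
end

users = Hash.new { |cache, id| cache[id] = expensive_lookup(id) }

users[42]  # slow the first time: runs the block and stores the result
users[42]  # instant: served straight from the hash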
It is just a hash map with a few common functions defined; hash maps are occasionally useful, but what is all the praise about?
The mentioned "simple" bit is arguable: as a language's building block, a hash map is relatively complex and specific, since those can be built out of lists (or trees, though lists/arrays may be preferable for efficiency), which can be built out of pairs (product types, cons cells, tuples; unless going for that efficiency, though it can still be pretty efficient with trees). Maybe it is one of those "simple versus easy" mix-ups.
Well yeah, it's "simple" (or easy) to use, definitely not simple to implement. But having an easy-to-use hashmap is par for the course for most newer languages - not only Ruby, but also PHP (associative arrays), JS (objects), Go (built-in map type) etc.
This is a lovely overview. Hash is a great example of how delightful it can be to program in Ruby.
One more technique worth noting is the chained functional style of using Hash, which you can do in Ruby because Hash includes Enumerable. If you're prototyping a script to do some data-cleaning, this makes it easy to build up your pipeline and iterate on it. For example:
foobar = { ...your data here... }
foobar.map do |k, v|
  # ...
  # do some transformation here
  # ...
  # and then return the replacement key/value for this entry
  [k, new_value]
end.select do |k, v|
  # do some filtering here, e.g.:
  is_foobarable?(k, v)
end.map do |k, v|
  # ...
  # do some expensive transformation here on the smaller data set
  # ...
  [k, newer_value]
end.to_h
(Note that you have to call #to_h at the end since the Enumerable functions will coerce the hash into an array of arrays.)
Now your code literally shows the pipeline that your data is falling through — and each of these steps can be side-effect-free, with no mutations to the original foobar structure.
Unless things have changed and Ruby has stream fusion now, this is bad advice at scale. You are iterating over a fat object multiple times. Even if it's uglier, it's much better in this case to create an empty array/hash, iterate once with #each, and push/assign into the accumulator.
I worked at the largest Rails shop in the world and this would be rejected in code review.
Edited to add more detail: the only method you need to write to implement Enumerable is #each. Every step of your pipeline here is _another_ call to #each. Just do it once.
> I worked at the largest Rails shop in the world and this would be rejected in code review.
Not sure if this means GitHub or Shopify. Until earlier this year I worked at GitHub for a decade, leaving as a principal engineer, primarily writing Ruby.
This would not be rejected at code review there unless the Hash had e.g. millions of values and, even then, it might not be a meaningful performance problem in context.
If the Hash is very small and will always be: readability trumps Big-O "performance" when n is very small.
Am I being pedantic and appealing to authority? Yup but, well, you started it and I hate to see helpful "Ruby is nice" comments like the grandparent get crapped on for no good reason.
There's #lazy to turn things into a lazy enumerator, to be iterated over when you so desire with e.g. #force.
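A quick sketch of that (the range and the transformations are just for illustration):

# Nothing is evaluated until the lazy enumerator is forced or asked for elements.
squares = (1..Float::INFINITY).lazy.map { |n| n * n }.select(&:odd?)

squares.first(5)                     # => [1, 9, 25, 49, 81]
(1..5).lazy.map { |n| n * 2 }.force  # => [2, 4, 6, 8, 10] -- #force realizes the whole thing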
If you're going to iterate into an accumulator variable, use the for keyword instead of #each; it's faster.
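For what it's worth, that style looks roughly like this (throwaway data):

result = {}
for k, v in { a: 1, b: 2 }
  result[k] = v * 2
end
result  # => {:a=>2, :b=>4}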
Alternatively, one can use `.reduce({}) { |h, (k, v)| ... h }` or `.each.with_object({}) { |(k, v), h| ... }`, which keeps the block from closing over an external variable and makes the assignment to that variable "atomic" with respect to the hash construction: the variable only ever holds the final result, if a separate variable is even needed at all, which it may not be given the implicit return of the last value.
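A small sketch of both forms, using a made-up hash and transformation:

data = { a: 1, b: 2 }

data.reduce({}) { |h, (k, v)| h[k] = v * 2; h }           # => {:a=>2, :b=>4}
data.each.with_object({}) { |(k, v), h| h[k] = v * 2 }    # => {:a=>2, :b=>4}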
I wish it were typed, though. So many times I've seen a function that takes a hash of "options" or "config" and had no idea what it actually contains. Even for official Rails methods it's often hard to figure out what the possible options are. Some of them seem almost internal, given how obscure they are.
RBS + Steep to the rescue! We typed our configuration this way for ddtrace Ruby. On the external (set) side it makes it very easy to explore the configuration in an editor; on the internal (get) side it helps ensure we don't make mistakes.
I've been in Python/Django for about a year now and I really miss Ruby's Hash compared to Python's dict.
`.dig(:key, :key, :etc)` is so nice to find deeply nested data without blowing up.
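For example, with a made-up nested payload:

payload = { user: { address: { city: "Oslo" } } }

payload.dig(:user, :address, :city)  # => "Oslo"
payload.dig(:user, :phone, :home)    # => nil, no NoMethodError on the missing branch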
One thing I don't miss is having to know whether a hash's keys are strings or symbols. While it's easily solvable, I've definitely lost time, only to smack myself when I realized I needed a string key while I'd been feeding it symbols the whole time, having sworn it should be a symbol-keyed hash.
I don't know Ruby, so I'm not sure if it's the same, but in Python, if you're not using a library like funcy or toolz for a nested-get helper, you can do `dict.get('key', {}).get('key2', {}).get('key3')`. Not the prettiest, but it works in a pinch.
It should be noted that the ** operator works like .merge, and it also respects the order of key definition within a hash literal: if the same key appears more than once, whatever was declared earlier gets overwritten by the later occurrence.
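A small sketch of that behaviour with throwaway hashes:

defaults  = { color: :red, size: :m }
overrides = { size: :xl }

{ **defaults, **overrides }  # => {:color=>:red, :size=>:xl}, same as defaults.merge(overrides)
{ **overrides, **defaults }  # => {:color=>:red, :size=>:m}, the later :size wins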
Hash is so powerful in Ruby that people often overuse it.
One of the most common issues I've found in Ruby codebases is not creating classes to represent the domain and simply using hashes everywhere.
The downside is that a hash has no shape. It can (and will) be anything you want it to be, often causing havoc once the system grows.
Checks for keys everywhere. Almost every statement uses safe navigation because you never know what shape you're dealing with. Multiple places performing the same map/reduce/filter/etc. All because people stick to hashes a bit too long.
Amen. This is an issue at the company I work at. Common typos when looking up hash keys return nil, which has a tendency to silently keep working and then blow up with a runtime error further down the chain. I am trying to insist on using .fetch to force an exception.
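For example, with a hypothetical options hash:

options = { retries: 3 }

options[:retriez]            # => nil, the typo slips through silently
options.fetch(:timeout, 30)  # => 30, an explicit default is still possible
# options.fetch(:retriez)    # raises KeyError (key not found: :retriez) right at the lookup site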
When the company switches to 3.2 I will insist on everyone using the new Data class for value objects rather than hashes.
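A sketch of what that looks like (the attributes are invented):

# Ruby 3.2+: Data.define builds an immutable value object with a fixed shape.
Point = Data.define(:x, :y)

p1 = Point.new(x: 1, y: 2)
p1.x           # => 1
p1.with(y: 5)  # => #<data Point x=1, y=5>, a new frozen copy
# p1.z         # NoMethodError: typos fail loudly, unlike hash[:z] quietly returning nil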
This is perhaps the biggest killer feature of TypeScript: your object literals (which are basically hashes) can have types, or possibly interfaces.
a = { 1 => 'a', 2 => 'b' }
[1, 2, 3].map(&a)
#=> ['a', 'b', nil]
fib = Hash.new {|hash, key| hash[key] = key < 2 ? key : hash[key-1] + hash[key-2] }
Example: fib[123] # => 22698374052006863956975682
Makes use of memoization.
But now I'm wondering when that would be a better solution than just using the `Hash#values_at` method...?
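For comparison, both give the same answer on the snippet above:

a = { 1 => 'a', 2 => 'b' }

[1, 2, 3].map(&a)     # => ["a", "b", nil]
a.values_at(1, 2, 3)  # => ["a", "b", nil]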
The mentioned "simple" bit is arguable: as a language's building block, a hash map is relatively complex and specific, since those can be built out of lists (or trees, though lists/arrays may be preferable for efficiency), which can be built out of pairs (product types, cons cells, tuples; unless going for that efficiency, though it can still be pretty efficient with trees). Maybe it is one of those "simple versus easy" mix-ups.
One more technique worth noting is the chained functional style of using Hash, which you can do in Ruby because Hash inherits from Enumerable. If you're prototyping a script to do some data-cleaning, this makes it easy to build up your pipeline and iterate on it. For example:
(Note that you have to call #to_h at the end since the Enumerable functions will coerce the hash into an array of arrays.)Now your code literally shows the pipeline that your data is falling through — and each of these steps can be side-effect-free, with no mutations to the original foobar structure.
I worked at the largest Rails shop in the world and this would be rejected in code review.
Edited to add more detail: the only method you need to write to implement Enumerable is #each. Every step of your pipeline here is _another_ call to #each. Just do it once.
Not sure if this means GitHub or Shopify. Until earlier this year I worked at GitHub for a decade, leaving as a principal engineer, primarily writing Ruby.
This would not be rejected at code review there unless the Hash had e.g. millions of values and, even then, it might not be a meaningful performance problem in context.
If the Hash is very small and will always be: readability trumps Big-O "performance" when n is very small.
Am I being pedantic and appealing to authority? Yup but, well, you started it and I hate to see helpful "Ruby is nice" comments like the grandparent get crapped on for no good reason.
If you're going to iterate over an accumulator variable, use the for keyword instead of each, it's faster.
Alternatively, one can use .reduce({}) { |h, (k, v)| ... h } or .each.with_object({}) { |(k, v), h| ... } which makes the block not close over an external variable, and makes the assignment to that variable "atomic" (wrt the hash construction, the variable will only contain the final result, that is if a final variable is needed at all, which it may not with implicit return of the last value)
One tiny tip.. :) you can pass a block to `.to_h`, so instead of using `.map` + `.to_h` you can do the transformation in one step, as in the sketch below.
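Roughly the same step written both ways (the doubling is just an example):

h = { a: 1, b: 2 }

h.map { |k, v| [k, v * 2] }.to_h  # => {:a=>2, :b=>4}
h.to_h { |k, v| [k, v * 2] }      # => {:a=>2, :b=>4}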
Methods like "except" (https://docs.ruby-lang.org/en/3.2/Hash.html#method-i-except) or "fetch" (raising an error on a missing key) are very convenient for writing defensive data-processing code!
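For instance, with a throwaway params hash:

params = { name: "Ada", admin: true }

params.except(:admin)  # => {:name=>"Ada"}, drop keys you don't want to pass along
params.fetch(:name)    # => "Ada"
# params.fetch(:role)  # raises KeyError rather than quietly returning nil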
Similarly, in Elixir, I use Maps a lot for the same type of jobs (https://hexdocs.pm/elixir/1.15.4/Map.html), with similar properties.