On the flip side, you can actually fit more data in memory than with non-columnar methods: because the data is stored column by column, it compresses very well. For example, boolean values are stored as bitmaps in this implementation, and strings could be stored in a hash map so that each distinct string is kept in memory only once, even if you have millions of rows.
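To make the two tricks concrete, here is a minimal sketch in Go. It is not the implementation being discussed: `Bitmap`, `StringColumn`, and their methods are hypothetical names made up for illustration. The boolean column packs 64 rows into each `uint64`, and the string column keeps every distinct value once and stores only a small integer index per row.

```go
package main

import "fmt"

// Bitmap is a hypothetical boolean column: one bit per row,
// so 64 rows fit in a single uint64 word.
type Bitmap []uint64

// Set writes the boolean for the given row, growing the bitmap as needed.
func (b *Bitmap) Set(row int, v bool) {
	word := row / 64
	for len(*b) <= word {
		*b = append(*b, 0)
	}
	mask := uint64(1) << (uint(row) % 64)
	if v {
		(*b)[word] |= mask
	} else {
		(*b)[word] &^= mask
	}
}

// Get reads the boolean for the given row; rows never written read as false.
func (b Bitmap) Get(row int) bool {
	word := row / 64
	if word >= len(b) {
		return false
	}
	return b[word]&(uint64(1)<<(uint(row)%64)) != 0
}

// StringColumn is a hypothetical interned string column: each distinct
// string is stored once in values, and every row holds a uint32 index.
type StringColumn struct {
	index  map[string]uint32 // string -> position in values
	values []string          // distinct strings, stored once
	rows   []uint32          // per-row index into values
}

// NewStringColumn returns an empty interned string column.
func NewStringColumn() *StringColumn {
	return &StringColumn{index: make(map[string]uint32)}
}

// Append adds a row, reusing the stored copy if the string was seen before.
func (c *StringColumn) Append(s string) {
	id, ok := c.index[s]
	if !ok {
		id = uint32(len(c.values))
		c.index[s] = id
		c.values = append(c.values, s)
	}
	c.rows = append(c.rows, id)
}

// Get returns the string stored at the given row.
func (c *StringColumn) Get(row int) string { return c.values[c.rows[row]] }

// Distinct reports how many unique strings are actually held in memory.
func (c *StringColumn) Distinct() int { return len(c.values) }

func main() {
	var alive Bitmap
	classes := NewStringColumn()
	names := []string{"mage", "rogue", "mage", "warrior"}
	for i := 0; i < 1000; i++ {
		alive.Set(i, i%2 == 0)
		classes.Append(names[i%len(names)])
	}
	// 1000 booleans fit in a handful of words; 1000 rows share 3 strings.
	fmt.Println(len(alive), classes.Distinct(), alive.Get(2), classes.Get(6))
}
```

With this layout a boolean costs one bit per row, and a repeated string costs four bytes per row plus a single shared copy of the value.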
I was building a small multiplayer game in Go. I started with a channel fan-out but (for no particular reason) wanted to see if I could do better. I put together this tiny event bus to test, and on my i7-13700K it delivers events in 10-40ns, roughly 4-10x faster than the plain channel loop, depending on the configuration.
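For context, a "plain channel loop" fan-out might look roughly like the sketch below. This is only an assumed baseline, not the code that was benchmarked and not the event bus itself: `Event`, `FanOut`, and `Run` are hypothetical names, and the buffer size of 16 is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical game event payload.
type Event struct {
	Player string
	X, Y   int
}

// FanOut copies every event from src to each subscriber channel,
// then closes the subscribers once src is closed.
func FanOut(src <-chan Event, subs []chan Event) {
	for ev := range src {
		for _, sub := range subs {
			sub <- ev // one channel send per subscriber per event
		}
	}
	for _, sub := range subs {
		close(sub)
	}
}

// Run publishes events to n subscribers and returns how many
// events each subscriber received.
func Run(events []Event, n int) []int {
	src := make(chan Event)
	subs := make([]chan Event, n)
	counts := make([]int, n)

	var wg sync.WaitGroup
	for i := range subs {
		subs[i] = make(chan Event, 16)
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for range subs[i] {
				counts[i]++ // each goroutine touches only its own slot
			}
		}(i)
	}

	go FanOut(src, subs)
	for _, ev := range events {
		src <- ev
	}
	close(src)
	wg.Wait()
	return counts
}

func main() {
	events := []Event{{"ann", 1, 2}, {"bob", 3, 4}, {"cat", 5, 6}}
	fmt.Println(Run(events, 4))
}
```

Every event pays for one channel send per subscriber here, which is the per-delivery cost a dedicated event bus tries to cut down.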