I found this super interesting - especially as all the data I've written code to manipulate has been small enough that I haven't needed to optimize my code, so I've never had to think in this direction.
I think my favorite part was the very first section, where he got baseline measurements with `cat`, `wc`, and friends. I wouldn't have thought to do that, and it's such an easy way to get perspective on what's "reasonable".
It also underscores just how insane raw disk bandwidth is on modern SSDs. Most software is not designed around a world where you have gigabytes per second of sequential scan on a laptop.
A few months ago, I had to quickly bang out a script to output about 20 million lines of text, each the output of a hash function. My naive solution took more than a few minutes; simple optimizations such as writing every 10k lines cut the time significantly. Threading would have helped quite a bit as well.
I’m kind of interested in the opposite problem: what is the simplest solution, using a well-known library or DB, that approaches the fastest hand-optimized solution to this problem?
Java is often slightly faster than Go: it has similar constructs (its Map implementations are older and better optimized), a perhaps better GC (also older and more optimized, though I don't think the GC is the challenge here), but slower startup times. So I'd say roughly the same as the idiomatic Go version?
Sounds very reasonable. In the blog post, about 20s were shaved off by assuming we don't need complicated string parsing. An off-the-shelf library can't make that assumption, so it will always have to pay the extra cost.
True in general, but some (especially libs aimed at larger datasets processed in batch) are taking advantage of benchmarks like this to do things like:
- Try the fast way; if it works, great
- Fall back to the slow way if the fast way fails
This makes the slow path 2x slower at worst (and you can advise always using the slow way via optional params), but the fast path can be 10x faster.
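This fast-path/fallback pattern can be sketched in Go against the 1BRC temperature field. The function names and the strict format assumption (`-?d?d.d`, as in the challenge data) are mine, not from any particular library:

```go
// Sketch of the fast-path/slow-path pattern: try a parser specialized for the
// strict 1BRC temperature format, fall back to the general parser on failure.
package main

import (
	"fmt"
	"strconv"
)

// parseTempFast returns the value in tenths of a degree, or ok=false when the
// input doesn't match the strict "d.d" / "dd.d" (optionally negative) format.
// For brevity it assumes the non-'.' bytes are digits, as in the challenge data.
func parseTempFast(b []byte) (tenths int, ok bool) {
	neg := false
	if len(b) > 0 && b[0] == '-' {
		neg = true
		b = b[1:]
	}
	switch {
	case len(b) == 3 && b[1] == '.': // "d.d"
		tenths = int(b[0]-'0')*10 + int(b[2]-'0')
	case len(b) == 4 && b[2] == '.': // "dd.d"
		tenths = int(b[0]-'0')*100 + int(b[1]-'0')*10 + int(b[3]-'0')
	default:
		return 0, false // anything else goes to the slow path
	}
	if neg {
		tenths = -tenths
	}
	return tenths, true
}

func parseTemp(s string) (float64, error) {
	if t, ok := parseTempFast([]byte(s)); ok {
		return float64(t) / 10, nil
	}
	return strconv.ParseFloat(s, 64) // slow path: full float grammar
}

func main() {
	for _, s := range []string{"12.3", "-99.9", "3.14159"} {
		v, err := parseTemp(s)
		fmt.Println(v, err)
	}
}
```

At worst the slow path pays one extra scan of the input; when the strict format holds, all the branching and error handling of the general parser is skipped.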
For anyone looking for more examples of 1BRC in Go, we had a friendly competition at work and collected the results here: https://github.com/dhartunian/1brcgo/
In addition to the loop-unrolling and bit-twiddling tricks that also show up in the fastest Java and C++ versions, some Go-specific things I learned were:
- `unsafe.Pointer` can be used to read memory without bounds checks
- many functions in the `bytes` and `bits` packages in the standard library are written in assembly
- `debug.SetGCPercent` and `debug.SetMemoryLimit` can be used to turn off GC
- `runtime.LockOSThread` locks a goroutine to an OS thread
- `print` is slightly faster than `fmt.Printf` (but writes to stderr)
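A minimal sketch of two of the tricks above, disabling GC and reading bytes through `unsafe.Pointer`; this is my own illustration, not code from the competition repo:

```go
// Demonstrates debug.SetGCPercent/SetMemoryLimit to turn off GC, and an
// unsafe.Pointer walk over a byte slice that skips per-element bounds checks.
package main

import (
	"fmt"
	"math"
	"runtime/debug"
	"unsafe"
)

// sumUnsafe adds up b by advancing a raw pointer (unsafe.Add, Go 1.17+)
// instead of indexing, so no bounds check runs per element. Assumes b is
// non-empty.
func sumUnsafe(b []byte) int {
	p := unsafe.Pointer(&b[0])
	total := 0
	for i := 0; i < len(b); i++ {
		total += int(*(*byte)(unsafe.Add(p, i)))
	}
	return total
}

func main() {
	debug.SetGCPercent(-1)              // disable the garbage collector
	debug.SetMemoryLimit(math.MaxInt64) // and lift the soft memory limit (Go 1.19+)
	fmt.Println(sumUnsafe([]byte("1brc")))
}
```

In practice the compiler often eliminates bounds checks in simple loops anyway, so this only pays off in hot paths where profiling shows the checks surviving.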
Oh, I'd missed those solutions, thanks. You guys got way more hardcore than I did -- nice work! Looking forward to reading the code for those solutions this week.
I think we all ended up using unsafe, though there were some solutions without mmap. It would have been interesting if we had adhered to the same constraints you did!
> I find this report confusing. Why does `if items[hashIndex].key == nil` show as taking 5.01s, but the call to `bytes.Equal` shows as only 390ms. Surely a slice lookup is much cheaper than a function call? If you are a Go performance expert and can help me interpret it, I’m all ears!
These two lines are both conditionals, so the time reported is sensitive to branch mispredictions. If the timings are not intuitive based on the complexity of the associated lines, then it may be explained by the data being not very predictable and the branch predictor having a bad time.
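For context, the two lines in question sit in an open-addressing probe loop. A reconstructed sketch (not the post's exact code; names and sizes are illustrative) looks roughly like this, with both conditionals executed for every record:

```go
// Reconstructed open-addressing lookup containing the two profiled lines.
// The first load of items[hashIndex] is where a cache miss would land, which
// can make that conditional look far more expensive than the bytes.Equal call.
package main

import (
	"bytes"
	"fmt"
)

type item struct {
	key []byte
	sum int
}

const numBuckets = 1 << 4 // power of two so we can mask instead of mod

// lookup finds or inserts key, probing linearly. Assumes the table never fills.
func lookup(items []item, key []byte, hash uint64) *item {
	hashIndex := int(hash & (numBuckets - 1))
	for {
		if items[hashIndex].key == nil { // empty slot: insert
			items[hashIndex].key = append([]byte(nil), key...)
			return &items[hashIndex]
		}
		if bytes.Equal(items[hashIndex].key, key) { // occupied: compare keys
			return &items[hashIndex]
		}
		hashIndex = (hashIndex + 1) & (numBuckets - 1) // linear probing
	}
}

func main() {
	items := make([]item, numBuckets)
	it := lookup(items, []byte("Lagos"), 12345)
	it.sum += 301
	fmt.Println(lookup(items, []byte("Lagos"), 12345).sum)
}
```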
Yeah, someone emailed me privately after they'd dug into this. They mentioned that "items[hashIndex]" was a significant source of cache misses. They used "perf record -e cache-misses" and found it was the largest source of cache misses. They also found (by digging into the assembly) that the "bytes.Equal" line has some sort of source-line attribution issue.
I love the nerdery around 1BRC. My axe to grind is that unless you do dangerous stuff, DBs are just as fast, less complicated, and more resilient to data updates than application code [0]. Do more in the database!
I agree with doing more in the database; you're closer to the data (disk, disk cache, L2 cache) than the application code is. At the same time, I get really nervous about doing work in the database, because you have to be really disciplined that the in-database code (functions, views, etc.) matches the code in source control, and that your testing/QA database contains all the same code and enough test data to actually exercise the performance bounds of that code.
With application code I can easily step through it in a debugger and verify the deployed code matches what's in the repo. It's more difficult to do in the database because it requires more organizational discipline.
Yeah, you need some different tools when working with data. I've been recommending dbt [0] as kind of an on-ramp for SREs to data work. Among other things, it allows you to keep your work in a VCS, and it has a testing framework.
I'm not sure this part of the TOS is valid in many jurisdictions. But there is a better reason to not publish benchmarks: they do not deserve free advertisement. We should just collectively forget they exist and use other tools.
It's worth noting that if you're messing around with large text files from the CLI, `awk`, `grep`, etc. will be an order of magnitude faster if you opt out of Unicode parsing.
I'm pretty confident adding LC_ALL=C to the awk solution would get it easily under a minute.
I just want to go on record that, given the simplistic (i.e. fun) problem, a shell developer would have had the answer to a first, specific set of a billion rows done while all the other languages were still putting on their shoes.
https://questdb.io/blog/billion-row-challenge-step-by-step/
https://github.com/gunnarmorling/1brc?tab=readme-ov-file#run...
The Python version:
https://github.com/gunnarmorling/1brc/blob/main/src/main/pyt...
It seems they are not running against the full dataset:
> Moving on to the 100 million file to see if size makes a difference.
One would also have to run both approaches on the same hardware for a meaningful comparison?
Update: for reference, Jason Chu's solution (https://github.com/dhartunian/1brcgo/blob/494eabd6ea958cc193...) seems to be the fastest on my machine, and runs in about 1.3s!
0: https://geraldonit.com/2024/01/31/1-billion-row-challenge-in...
[0]: getdbt.com
https://www.youtube.com/watch?v=10KEr3sEG80
I wanted to see how the temperature was changing over time for specific regions using a map-based interface. The following chart was particularly eye-opening:
https://www.youtube.com/watch?v=iEtvf9xzRB4&t=164s
The software won a couple of awards and was heavily optimized to produce reports in under a minute. Kudos to the author for getting a parse time of a billion records down to mere seconds.