It was a fun puzzle, though, and I'm surprised I didn't already know it. Thanks for sharing.
Specifically, this is work related to implementing large-dataset support for the dedupe library[1]. Being able to de-duplicate messy datasets effectively is valuable. That's about as much as I can share.
1. https://numpy.org/doc/stable/reference/generated/numpy.uniqu...
2. https://github.com/dedupeio/dedupe/blob/main/dedupe/clusteri...
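For anyone curious about the numpy.unique angle, here's a minimal sketch of the kind of exact-duplicate collapsing it enables (this is an illustration, not dedupe's actual internals; the sample records are made up):

```python
import numpy as np

# Hypothetical example: numpy.unique with axis=0 collapses exact-duplicate
# rows, and return_inverse maps every original row back to its canonical
# (deduplicated) row -- a cheap first pass before fuzzier clustering.
records = np.array([
    ["alice", "smith"],
    ["bob", "jones"],
    ["alice", "smith"],  # exact duplicate of row 0
])

# unique_rows holds each distinct record once (sorted);
# inverse[i] is the index into unique_rows for original row i.
unique_rows, inverse = np.unique(records, axis=0, return_inverse=True)

print(len(unique_rows))     # number of distinct records
print(inverse.tolist())     # canonical index for each original row
```

This only catches byte-identical rows, of course; the messy near-duplicates are where the clustering machinery comes in.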
* Creating a mutable snapshot of the entire codebase takes a second or two.
* Builds are perfectly reproducible, and happen on build clusters. Entire C++ servers with hundreds of thousands of lines of code can be built from scratch in a minute or two, tops.
* The build config language is really simple and concise.
* Code search across the entire codebase is instant.
* File history loads in an instant.
* Line-by-line blame loads in a few seconds.
* Nearly all files in supported languages have instant symbol lookup.
* There's a consistent style enforced by a shared culture, auto-linters, and presubmits.
* Shortcuts for deep-linking to a file/version/line make sharing code easy-peasy.
* A ton of presubmit checks ensure uniform code/test quality.
* Code reviews are required, and so is pairing tests with code changes.
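For a sense of what a "simple and concise" build config language can look like, here's a sketch in the style of Bazel's BUILD files (Bazel being the open-source counterpart of this kind of internal build system; the target and file names here are invented for illustration):

```python
# Hypothetical BUILD file in Bazel/Starlark style. Each rule declares a
# target, its sources, and its dependencies; the build system derives
# the rest (compile commands, caching, distribution to the cluster).
cc_library(
    name = "clustering",
    srcs = ["clustering.cc"],
    hdrs = ["clustering.h"],
    deps = ["//third_party/absl/container:flat_hash_map"],
)

cc_binary(
    name = "server",
    srcs = ["main.cc"],
    deps = [":clustering"],
)
```

Because every target's dependencies are declared explicitly, the build system can cache and parallelize aggressively, which is a big part of why clean builds of large servers stay fast.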