To summarize something from a little over a year after I joined Cloudflare: the company was building out a way to ship logs from its edge to a central point, both for customer analytics and for serving logs to enterprise customers. As I understood it, the primary engineer who built all of that out, Albert Strasheim, benchmarked the most likely serialization options available and found Cap'n Proto to be appreciably faster than protobuf. It had a great C++ implementation (which we could use from nginx, IIRC with some Lua involved), and while the Go implementation, which we used on the consuming side, had its warts, folks were able to fix the key parts that needed attention.
Anyway. Cloudflare's always been pretty cost-efficient machine-wise, so it was a natural choice given our performance needs. In my time on the data team there, Cap'n Proto was always pretty easy to work with, and sharing schema definitions from a central repo worked pretty well, too. Thanks for your work, Kenton!
Albert here :) The decision basically came down to the following: we needed to build a 1 million events/sec log-processing pipeline and had only 5 servers (more were on the way, but would take many months to arrive at the data center). So v1 of the pipeline was those 5 servers, each running Kafka 0.8, a service to receive events from the edge and put them into Kafka, and a consumer to aggregate the data. To squeeze all of this onto that hardware footprint, I spent about a week looking for a format that optimized for deserialization speed, since we had a few thousand edge servers serializing data but only 5 servers deserializing it. Cap'n Proto was a good fit :)
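For a rough sense of why a deserialization-optimized format wins with that asymmetry: Cap'n Proto's wire format is its in-memory format, so the consumer just validates and wraps the incoming bytes, and field reads are pointer lookups into the original buffer rather than a parse step. A minimal Go sketch of a consumer, using capnproto.org/go/capnp/v3 and a hypothetical logevent.capnp schema (LogEvent, ReadRootLogEvent, and the field names are illustrative, not the actual Cloudflare schema; ReadRootLogEvent and the accessors come from capnpc-go code generation):

    // logevent.capnp (hypothetical):
    //
    //   struct LogEvent {
    //     timestamp @0 :UInt64;
    //     host      @1 :Text;
    //     bytesSent @2 :UInt64;
    //   }
    //
    // `capnp compile -ogo logevent.capnp` generates LogEvent,
    // ReadRootLogEvent, and the typed accessors used below.
    package main

    import (
    	"log"

    	capnp "capnproto.org/go/capnp/v3"
    )

    func handleFrame(frame []byte) error {
    	// Unmarshal does no per-field parsing: it validates the
    	// segment table and wraps the bytes. Nothing is copied
    	// into intermediate structs.
    	msg, err := capnp.Unmarshal(frame)
    	if err != nil {
    		return err
    	}
    	ev, err := ReadRootLogEvent(msg) // generated by capnpc-go
    	if err != nil {
    		return err
    	}
    	host, err := ev.Host() // reads straight out of `frame`
    	if err != nil {
    		return err
    	}
    	log.Printf("ts=%d host=%s bytes=%d", ev.Timestamp(), host, ev.BytesSent())
    	return nil
    }

With protobuf, the equivalent consumer would decode every field of every event into a Go struct before aggregating; here the per-event cost on the consuming side is close to zero, which is what mattered with thousands of serializers and only 5 deserializers.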