throwbadubadu · a year ago
"Demystifying" is a big word for what the original docs document quite well, and is also not like you couldn't read and understand that in few hours, if you are not totally foreign to protocol design and serialization? This post gives even much less information?!
foooorsyth · a year ago
Yeah, this page is quite clear: https://protobuf.dev/programming-guides/encoding/

Nothing mystical about it
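
For a concrete taste of that guide, here's a minimal sketch in plain Python (no protobuf library) of how a single varint field goes on the wire, reproducing the guide's classic example of field 1 set to 150:

    def encode_varint(n: int) -> bytes:
        # Varint: 7 bits per byte, least-significant group first,
        # continuation bit (0x80) set on every byte except the last.
        out = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                return bytes(out)

    def encode_varint_field(field_number: int, value: int) -> bytes:
        # The tag is itself a varint, packing the field number
        # together with the wire type (0 = varint).
        tag = (field_number << 3) | 0
        return encode_varint(tag) + encode_varint(value)

    # Field 1 = 150 encodes as 08 96 01, exactly as the guide shows.
    assert encode_varint_field(1, 150).hex() == "089601"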

jviotti · a year ago
I did a lot of research on binary serialization at the University of Oxford. One of the papers I published is a comprehensive review of existing JSON-compatible serialization formats (https://arxiv.org/abs/2201.02089). It touches on Protocol Buffers (and more than 10 other formats), and in it I analyze the resulting hexadecimal output much like the OP does here.

I also published a space-efficiency benchmark of those same formats (https://arxiv.org/abs/2201.03051) and ended up creating https://jsonbinpack.sourcemeta.com as a proposed technology that does binary serialization of JSON using JSON Schema.
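
If you want to reproduce the flavor of that benchmark at home, here's a minimal sketch (assuming the third-party msgpack and cbor2 packages, two of the JSON-compatible formats such surveys cover):

    import json

    import cbor2    # pip install cbor2
    import msgpack  # pip install msgpack

    doc = {"name": "sensor-1", "readings": [22.5, 22.7, 23.1], "ok": True}

    for label, blob in [
        ("JSON", json.dumps(doc, separators=(",", ":")).encode()),
        ("MessagePack", msgpack.packb(doc)),
        ("CBOR", cbor2.dumps(doc)),
    ]:
        # Compare sizes and eyeball the raw hex, as the OP does.
        print(f"{label:12} {len(blob):3} bytes  {blob.hex()}")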

m3047 · a year ago
I like the notion of fig. 3, but it doesn't seem to capture the evolution of uses over time.
thadt · a year ago
As a counterpoint to the horror stories, I've had a few relatively good experiences with protocol buffers (not gRPC). On one project, we had messages that needed to be used across multiple applications, on a microcontroller, on an SBC running Python, in an Android app, in a web service, and on web UI frontend. Being able to update a message definition in one place, and have it spit out updated code in half a dozen languages while allowing for incremental rollout to the various pieces was very handy.

Sure - it wasn't all sunshine and roses, but overall it rocked.
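
A minimal sketch of that workflow (the file and message names here are made up):

    // sensor.proto -- the single source of truth for every platform
    syntax = "proto3";
    package telemetry;

    message Reading {
      uint32 sensor_id   = 1;
      double temperature = 2;
      uint64 timestamp   = 3;
    }

One protoc run then emits matching code for every target:

    protoc --python_out=. --java_out=. --cpp_out=. sensor.proto

And a field added later under a fresh number stays readable by consumers built from the old definition, which is what makes the incremental rollout work.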

ainar-g · a year ago
To be fair, if that's what you need, Protobuf isn't the only option. Cap'n Proto[1], JSON Schema[2], or any other well-supported message-definition language could probably achieve that as well, each with its own positives and negatives.

[1]: https://capnproto.org/

[2]: https://json-schema.org/

palata · a year ago
Big fan of Cap'n Proto here, but to be fair it doesn't support as many languages as Protobuf/gRPC yet.
jviotti · a year ago
I'm currently building a Protocol Buffers alternative that uses JSON Schema instead: https://jsonbinpack.sourcemeta.com/. In the published research it proved to be as space-efficient as, or more space-efficient than, every alternative considered (https://arxiv.org/abs/2211.12799).

However, it is still heavily under development and not ready for production use. I'm definitely looking for GitHub Sponsors or other types of funding to support it :)
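
A toy illustration of the underlying idea (not JSON BinPack's actual API or encoding, just the principle of schema-informed compression):

    import json

    # Suppose the schema says: {"type": "integer", "minimum": 0, "maximum": 255}.
    # Plain JSON spends one ASCII byte per digit; a schema-aware encoder
    # knows a single byte covers the whole range.
    value = 200
    as_json = json.dumps(value).encode()   # b'200' -> 3 bytes
    as_schema_aware = bytes([value - 0])   # offset from the schema's minimum -> 1 byte
    print(len(as_json), len(as_schema_aware))  # 3 1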

fullstop · a year ago
This is what I used it for, and it was great. Especially if you were feeding data to a third party and had to agree on an exchange format anyway.
bairen · a year ago
We built a backend heavily using protobufs/grpc and I highly regret it.

It adds an extra layer of complexity most people don't need.

You need to compile the protobufs and update all services that use them.

It's extra software for your security scans to cover.

Regular old HTTP/1 REST calls should be the default.

If you're having scaling problems, only then should you consider moving to gRPC.

And even then I would first consider other simpler options.

ants_everywhere · a year ago
Personally I'll never go back to REST because you lose important typing information. The complexity doesn't go away. In the RPC/codegen paradigm the complexity is in the tooling and is handled by machines. In REST, it's in the minds of your programmers and gets sprinkled throughout the code.

> You need to compile the protobufs and update all services that use them.

You need to update all the services when you change your REST API too, right? At least protobuf generates your code automatically for you, and it can do so as part of your build process as soon as you change your proto. Changes are backwards compatible, so you don't even need to touch your services until they actually need to change.
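
A small sketch of the difference, where sensor_pb2 stands in for a hypothetical module generated by protoc from a message with a float temperature field:

    import json

    # Untyped REST payload: a typo'd key or wrong type sails through
    # parsing and only surfaces later, far from the mistake.
    payload = json.loads('{"temprature": "hot"}')
    reading = payload.get("temperature")   # silently None

    # Typed generated code rejects bad writes at the call site:
    # import sensor_pb2                    # hypothetical generated module
    # msg = sensor_pb2.Reading()
    # msg.temperature = "hot"              # raises TypeError immediately
    # msg.temprature = 21.5                # raises AttributeError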

bairen · a year ago
It's silly to think protobuf's code gen is an advantage. I can take a JSON object/XML/CSV from any API and plug it into a website that will spit out models in any language I want.

The only real advantages gRPC and protobufs have are speed and reduced data transmission.

And hey, fair enough man, if those are your bottlenecks.

throwbadubadu · a year ago
It really sounds like you used them for the wrong use case. If you're in need of a compact binary serialization, they're not perfect (is anything?), but they're fair enough.
bairen · a year ago
We ended up wrapping them in Envoy so that our UI can consume the gRPC as regular old HTTP/1. And that's where they get the most use.

And by doing that we've added extra layers, and it ended up slower than it would have been had we just used regular REST.

Furthermore, now we need to keep Envoy up to date.

Occasionally they break their API on major versions. Their config files are complicated and confusing.

So, IMO, gRPC should only be used for service-to-service communication where you don't want to share the code with a UI and speed and throughput are very, very important.

And the speed of HTTP/1 is rarely the bottleneck.

Ferret7446 · a year ago
You don't need to compile the protobufs. The alternative, for all serialization formats, is to either load the schema dynamically or write the handling logic manually yourself (or write your own generator/compiler).

gRPC supports HTTP/1 and can be mapped to a RESTful API (e.g. https://google.aip.dev/131).
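
For the dynamic-schema route, recent versions of the official Python runtime can build message classes at runtime from a compiled descriptor set (file and message names here are hypothetical; the set would come from protoc --descriptor_set_out=schema.pb sensor.proto):

    from google.protobuf import descriptor_pb2, descriptor_pool, message_factory

    # Load the compiled schema produced by protoc; no generated code needed.
    fds = descriptor_pb2.FileDescriptorSet()
    with open("schema.pb", "rb") as f:
        fds.ParseFromString(f.read())

    pool = descriptor_pool.DescriptorPool()
    for file_proto in fds.file:
        pool.Add(file_proto)

    # Build a usable message class at runtime.
    desc = pool.FindMessageTypeByName("telemetry.Reading")
    Reading = message_factory.GetMessageClass(desc)

    msg = Reading()
    msg.ParseFromString(wire_bytes)  # wire_bytes: serialized message from elsewhere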

m3047 · a year ago
Protobuf was also used for Farsight's tunnelled SIE, called NMSG. I wrote a pure-Python protobuf dissector implementation for use with Scapy (https://scapy.readthedocs.io/en/latest/introduction.html) for dissecting / tasting random protobuf traffic. I packaged it with an NMSG definition (https://github.com/m3047/tahoma_nmsg).

I re-used the dissector for my Dnstap fu, which has since been refactored to a simple composable agent (https://github.com/m3047/shodohflo/tree/master/agents) based on what was originally a demo program (https://github.com/m3047/shodohflo/blob/master/examples/dnst...) because "the people have spoken".

Notice that the demo program (and by extension dnstap_agent) converts protobuf to JSON: the demo program is "dnstap2json". It's puzzlingly shortsighted to me that the BIND implementation is not network-aware; it only outputs to files or Unix sockets.

The moment I start thinking about network traffic / messaging, the first question in my mind is "network or application?", or "datagram or stream?" DNS data is emblematic of this in the sense that the protocol itself supports both datagrams and streams, recognizing that there are different use cases for a distributed key-value store. JSON seems punctuation- and metadata-heavy for very large amounts of streaming data, but a lot of use cases for DNS data only need a few fields of the DNS request or response, so in practice cherry-picking fields to pack into a JSON datagram works for a lot of classes of problems. In my experience, protobuf suffers from a lack of "living off the land" options for casual consumption, especially in networked situations.
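
The schemaless-tasting part is small enough to sketch; here's a minimal field walker over the wire format (deprecated group wire types 3/4 omitted):

    def read_varint(buf: bytes, i: int) -> tuple[int, int]:
        # Returns (value, next offset).
        shift = result = 0
        while True:
            b = buf[i]
            i += 1
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                return result, i
            shift += 7

    def taste(buf: bytes, depth: int = 0) -> None:
        # Walk protobuf wire format without a schema, printing
        # field numbers, wire types, and raw values.
        i = 0
        while i < len(buf):
            tag, i = read_varint(buf, i)
            field, wire = tag >> 3, tag & 7
            if wire == 0:                      # varint
                val, i = read_varint(buf, i)
            elif wire == 1:                    # fixed 64-bit
                val, i = buf[i:i + 8], i + 8
            elif wire == 2:                    # length-delimited
                n, i = read_varint(buf, i)
                val, i = buf[i:i + n], i + n
            elif wire == 5:                    # fixed 32-bit
                val, i = buf[i:i + 4], i + 4
            else:
                raise ValueError(f"unhandled wire type {wire}")
            print("  " * depth + f"field {field} (wire {wire}): {val!r}")

    taste(bytes.fromhex("089601"))  # -> field 1 (wire 0): 150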

EGreg · a year ago
Why not just use Cap'n Proto? It seems superior on every metric and has a very impressive vision.

Honestly, the biggest failing for those guys was not making a good JavaScript implementation. Seems C++ ain't enough these days. Maybe Emscripten works? Anyone tried it?

https://news.ycombinator.com/item?id=25585844

kenton - if you're reading this - learn the latest ECMAScript or TypeScript and just go for it!

kentonv · a year ago
> kenton - if you're reading this - learn the latest ECMAScript or TypeScript and just go for it!

I mean, if I had infinite time, I'd love to! (Among infinite other projects.)

But keep in mind Cap'n Proto is not something I put out as a product. This confuses people a bit, but I don't actually care about driving Cap'n Proto adoption. Rather, Cap'n Proto is a thing I built initially as an experiment, and then have continued to develop because it has been really useful inside my other projects. But that means I only work on the features that are needed by said other projects. I welcome other people contributing the things they need (including supporting other languages) but my time is focused on my needs.

My main project (for the past 7 years and foreseeable future) is Cloudflare Workers, which I started and am the lead engineer of. To be blunt, Workers' success pays me money, Cap'n Proto's doesn't. So I primarily care about Cap'n Proto only to the extent it helps Cloudflare Workers.

Now, the Workers Runtime uses Cap'n Proto heavily under the hood, and Workers primarily hosts JavaScript applications. But, the runtime itself is written in C++ (and some Rust), and exposing capnp directly to applications hasn't seemed like the right product move, at least so far. We did recently introduce an RPC system, and again it's built on Cap'n Proto under the hood, but the API exposed to JavaScript is schemaless, so Cap'n Proto is invisible to the app:

https://blog.cloudflare.com/javascript-native-rpc

We've toyed with the idea of exposing schemaful Cap'n Proto as part of the Workers platform, perhaps as a way to communicate with external servers or with WebAssembly. But, so far it hasn't seemed like the most important thing to be building. Maybe that will change someday, and then it'll become in Cloudflare's interest to have really good Cap'n Proto libraries in many languages, but not today.

garaetjjte · a year ago
> Maybe Emscripten works?

It does, with minor hacks. I have a C++ application compiled with Emscripten using Cap'n Proto RPC over WebSockets. That is, if you are mad enough to write webapps in C++...

My gripe with Cap'n Proto is that it's inconvenient to use for internal application structures: either you write boilerplate to convert to/from application objects, or you deal with clunky Readers, Builders, Orphanages, etc. But then again, I've probably gone too far by storing Cap'n Proto objects inside a database.

ssahoo · a year ago
Reddit moved to gRPC and protobuf from Thrift a couple of years ago. I wonder how it is going for them. https://old.reddit.com/r/RedditEng/comments/xivl8d/leveling_...
conaclos · a year ago
For those looking for a minimal and conservative binary format, there is BARE [1]. It is in the process of being standardized.

[1] https://baremessages.org/

martinky24 · a year ago
Why would one use BARE over the handful of more well known serialization libraries at this point? For example, what makes it stand out over protobuf? It's not jumping out to me.