I was curious how this performed server-side versus protobufjs (what we're currently using). I hastily wired it up to protobufjs's benchmark suite (https://github.com/protobufjs/protobuf.js/tree/master/bench). The suite is pretty ancient, so getting buf-compiled ESM added was a challenge.
Granted, the benchmark is created for protobufjs and they probably optimize against it. Protobuf-ES was about 5.1x slower than protobufjs for encoding and 14.8x slower than protobufjs for decoding.
This was run on my M1 with node 16.14; not particularly scientific, etc, etc.
My gut feeling is anyway that PB in JS is likely a failure against just using JSON (which the JS runtime implements in efficient native code).
No, but anecdotally the others are probably a lot faster. And you can do even better than that with zero-copy serializations.
But protobuf has very wide support and is decent enough in js, at least server-to-server. Having a schema is very valuable, and there are substantial size wins over JSON, even with gzip.
I gave up on protobufs years ago. The protobuf team has no idea how to write PHP and JS libraries. I got segfaults from using the PHP extension. The built-in toJSON would return invalid JSON (missing braces for binary types). Ridiculous stuff.
I really just prefer to use JSON for everything. It's much easier to debug and observe traffic (browser Network tab). I like JSON-RPC, very simple spec (basically one page long). I don't like REST.
All that said, I'm really glad to see the community take things into their own hands.
> It's much easier to debug and observe traffic (browser Network tab).
The DX for JSON things is much better. The UX for protobufs is much better (faster, less data over the wire, etc). Which you optimize for is up to you, but there isn't a straightforward "Use this tech because it's the best one."
I've always wondered about this. Firstly, I'm fairly sure client-side JSON parsing is significantly faster than protobuf decoding, but even for data over the wire: JSON is pretty compressible, so surely the gains there are going to be marginal. Surely never enough benefit to UX to warrant the DX trade-off, right?
protobufs have a great property of having a schema (and then generating code), which means it's pretty easy to set up a system where an accidental change of API fails CI tests for mobile apps and web.
This is doable with JSON, but I've never seen a JSON-based setup actually work well at catching these kinds of regressions.
Assuming your developer time is constrained, improved DX often also leads to better UX (more features). So even if you are optimizing for UX, you may well be better off with JSON.
I don't develop in JS so can't comment on DX there, but I've found the DX to be pretty good when using protobuf in other languages.
That's mostly been down to having IDE autocompletion for data structures and fields once the protobuf code's been generated.
For many JSON APIs I've worked with there's only been human-readable documentation, making them more error-prone to work with (e.g. having to either craft JSON manually for requests, or write a client library if one doesn't already exist).
I think protobuf really works well on the backend, specifically with compiled languages like Go or C++, as seen by the usage at Google and the adoption of gRPC for Go-based cloud tooling. Beyond that it's a huge failure. The generated code and usage for other languages is not idiomatic. In fact it's a hindrance, and you can see that by the lack of adoption except by the largest orgs, who are enforcing it using some sort of grpc-web bridge with types for the frontend. Ultimately you can just convert proto to OpenAPI specs and do a much better job at custom client libs with that.
I'm not a frontend dev. Most of my time was spent on the backend, but what I'll say is I much prefer the fluidity and dynamic nature of JavaScript and its built-in ability to deal with JSON, which naturally becomes objects. All the type stuff is easy to do, but with docs you can get away with not needing it.
My feeling: protobuf lives on for gRPC server-side stuff, but everywhere else OpenAPI is winning.
JSON parsing is a minefield, especially in cross-platform scenarios (language and/or library). You won't encounter those problems on toy projects or simple CRUD applications. For example, as soon as you deal with (u)int64 where values are greater than 2^53, a simple round-trip through javascript can wreak silent havoc.
See http://seriot.ch/projects/parsing_json.html
Protobuf support for google's first-class citizen languages is usually very good, i.e. C++, Java, Python and Go. For other languages, it depends on each implementation.
As always, each protocol/data format has its place. You need to maximize the amount of data you send in each packet? Then protobuf is better than JSON. Need to support a large number of clients without any fuss? Then JSON is better. Wanna pass around data you don't know the schema of? JSON again.
Context matters, there are no silver bullets, everything has trade-offs, and so on, and so on.
JSON messages in a compressed websocket stream are surprisingly tiny. Bigger than compressed protobuf packets but not by much, and much smaller than uncompressed protobuf packets.
Honestly, gzipped json is likely much smaller than uncompressed protobuf.
If you were going to use a binary protocol, why choose one that has no partial parsing or table of contents these days? There are much better alternatives IMO (flatbuffers being one of them).
> Wanna pass around data you don't know the schema of? JSON again.
This is a red herring. If you don't know the schema on the receiving (or sending, for that matter) side, then you can't do anything with the data other than pass it on. If you _do_ know what it looks like, then it has an implicit schema whether you call it a schema or not.
At the time, we needed interop with C. So that's why we chose protobufs. But it was a nightmare to work with in other languages. Including C++ for cross platform desktop apps where cross compiling became a problem too.
JSON in C is unfortunately way harder than in other modern languages (e.g. Go which makes it a breeze with struct tags and a great stdlib).
The problem I see with JSON is its limited set of “native” types. I really wish it had specified support for proper numeric types (int, uint, various widths) and not just doubles. A timestamp type would be great as well.
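To make that concrete: in JavaScript every number is an IEEE-754 double, so an int64 above 2^53 silently loses precision on a plain JSON round-trip (the id value here is illustrative):

```javascript
// JavaScript numbers are IEEE-754 doubles, so integers above
// Number.MAX_SAFE_INTEGER (2^53 - 1) cannot be represented exactly.
// A JSON round-trip of an int64 id silently corrupts the value.
const wire = '{"id": 9007199254740993}'; // 2^53 + 1, as sent by a backend
const parsed = JSON.parse(wire);

console.log(parsed.id); // 9007199254740992 — off by one, no error thrown
console.log(JSON.stringify(parsed)); // '{"id":9007199254740992}'
```

This is why many APIs encode 64-bit ids as strings in JSON, and why protobuf's JSON mapping does the same for int64 fields.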
What I really like about Protocol Buffers is that you must write a schema to get started. No more JSON.stringify anything. Everything else sucks though.
Hi there, I am the primary maintainer of the PHP library as of the last few years. I have heard that there used to be a lot of crashes; the code was almost completely rewritten in 2020 and is in a much better state now. If you find a segfault and you have a repro, file a bug and we will fix it.
I recommend Capnproto. Parsing time is zero, you can pretend you're a Microsoft programmer in the early 90s and just use the in-RAM struct as your wire format. Maybe it doesn't make sense for in-browser JS applications (though WASM is a different story) but for IPC and RPC in the general case, all parsing and unparsing does is generate waste heat.
ALWAYS favor a binary format unless you have a really good reason otherwise.
Capnproto is designed by Kenton, a former Google engineer who did a lot of work with protobufs at Google. I see Capnproto as the spiritual successor of protobuf, fixing many issues in protobufs.
Also, Capnproto is quite extensively used in some Cloudflare products.
I like protobufs but I was also disappointed at the JS protobuf options. I disliked both the JS object representation and RPC transport.
grpc-web in particular requires an Envoy proxy which seems absurdly heavyweight. I ended up using Twirp because Buf connect wasn't yet released or planned.
I rolled my own JS representation. The major differences from Connect:
- Avoid undefined if the message is not present on the wire and use an empty instance of the object instead. For recursive types, find the minimal set of fields to initialize as undefined instead of empty.
- Transparently promote some protobuf types, like google.protobuf.Timestamp to a proper Instant type (from js-joda or similar library). This makes a surprisingly large difference on reducing the number of jumps from the UI to the API.
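A sketch of that promotion, using the built-in Date instead of js-joda to stay dependency-free (`timestampToDate` is a made-up helper, not part of Connect's API):

```javascript
// google.protobuf.Timestamp decodes to a { seconds, nanos } pair.
// Promote it to a native Date so UI code never touches the raw pair.
function timestampToDate(ts) {
  // seconds may arrive as a bigint from the decoder; Number() handles both.
  return new Date(Number(ts.seconds) * 1000 + Math.floor(ts.nanos / 1e6));
}

const ts = { seconds: 1640995200n, nanos: 500000000 };
console.log(timestampToDate(ts).toISOString()); // 2022-01-01T00:00:00.500Z
```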
A subjective opinion, but reading some documentation and maybe checking an OpenAPI spec is much easier than having to deal with protobuf.
If using compression the size is in the same ballpark (protobuf can be between 20% and 50% smaller). For 99% of users it should not make a difference. https://nilsmagnus.github.io/post/proto-json-sizes/#gzipped-...
You also have solutions like GraphQL that define a schema, or you can publish some kind of schema (a good thing to do) but use JSON instead of a binary format.
Protobuf also does not declare its schema on the wire. Message parsers can be generated from a schema, but that's also true for REST over JSON. Even ad hoc REST APIs often have better self-declaration of resource types than protobuf.
(I still like protobuf, but the schemas are a terrible reason to like it.)
But you can do automatic validation fairly easily with JSON Schema. You don't need to choose a binary format to get validation.
The principal benefit is that you can use the schema to define the data format, which means you can pack the data in more tightly (you don't need a byte to say "this is an object" if you know that the input data must be an object at this point). That's a big benefit in certain situations, but if you're using this sort of stuff just to get validation then you're probably better off using JSON Schema and having a wire transfer format that you can read easily without additional tools.
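For example, a minimal JSON Schema (field names invented for illustration) that a validator such as Ajv could enforce at the API boundary, while the wire format stays plain, readable JSON:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["id", "name"],
  "properties": {
    "id": { "type": "integer", "minimum": 0 },
    "name": { "type": "string" },
    "tags": { "type": "array", "items": { "type": "string" } }
  },
  "additionalProperties": false
}
```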
The link you included shows that protobufs are at least 15% better for all users, and as much as 57% better for cases where the data is small. Doesn't that mean for 100% of users it will actually make a difference?
Your users might not care about the difference but it will be there.
Actually realizing that speed up for your users will take time away from delivering features.
Engineering is a trade off, always will be.
You don't optimize things for the cases when they are fast. (Unless the gain is a couple of orders of magnitude; certainly not for a 50% speedup.)
The 15% gain is the one that matters. In practice, it comes at the expense of a more complex (thus larger, negating some of it) and less reliable system. It is very rare that this trade-off is worth it.
protobuf is much more concise and readable than OAS. You can define API contracts in protobuf and still serve JSON APIs via the standard-ish gRPC/JSON transcoding enabled by google.api annotations.
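A sketch of what that transcoding setup looks like (service and paths invented for illustration) — the google.api.http annotation maps an RPC onto a JSON/HTTP endpoint, so one contract serves both gRPC and plain-JSON clients:

```protobuf
syntax = "proto3";

import "google/api/annotations.proto";

service UserService {
  // Reachable as gRPC UserService/GetUser and as GET /v1/users/{user_id}.
  rpc GetUser(GetUserRequest) returns (User) {
    option (google.api.http) = {
      get: "/v1/users/{user_id}"
    };
  }
}

message GetUserRequest {
  string user_id = 1;
}

message User {
  string user_id = 1;
  string display_name = 2;
}
```

The transcoding itself is done by a gateway (e.g. Envoy's gRPC-JSON transcoder or grpc-gateway), not by the proto compiler.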
This only makes sense if you have a server that someone else put together that for some reason only speaks protobuf. I'm not aware of any language ecosystem that has protocol buffers but no json support, so if you're building a server from scratch this isn't a good reason to use protobufs.
And if you are faced with a server that only speaks protobuf, the same question applies to the original devs: why did they make that decision?
For non-niche use cases that is a bad developer experience.
If you are designing your own solution that uses protobuf instead of JSON, say goodbye to a range of useful tools that the whole industry uses. From testing to automation it will be harder at every step, and you will have to find custom solutions instead of the usual no-customization solutions that work OOTB with JSON.
It is a good way to frustrate your developers and generate sometimes brittle solutions related to testing/automation/infrastructure.
I am using it for sending data between game server and client. Encoding the messages in JSON would be just silly, although I wonder what is the standard in the game industry.
Otherwise personally json wins.
we use it at https://woogles.io for pretty much all communication (server-to-server and client-to-server). I do loathe dealing with the JS aspect of it and am very excited to move over to Protobuf-ES after reading this article (and shaving off a ton of repeated code and generated code).
I keep trying to understand and use protobuf but every time I look at it and its API (this article included) I get more confused and have absolutely no idea how to implement it.
I can't tell whether I'm just dumb or a really terrible developer, or if the docs or the thing itself is really hard to use?
1. Your schema is the source of truth.
2. protoc should generate code as part of your build (try not to check in generated proto code if at all possible).
3. Use generated code to output bytes/parse bytes (this depends on your HTTP/RPC library).
The other trick is that you should use the exact same (!) schema file for your frontend and backend projects. This means that changing it should trigger regeneration of generated code for your clients and servers and then run CI on them.
So if you accidentally introduce a breaking API change, the CI for broken client will fail before you deploy it.
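As one example of step 2, Buf lets the generation step live in a config file next to the schema, so CI regenerates on every schema change (plugin name and output path follow Buf's docs but are assumptions — adjust to your setup):

```yaml
# buf.gen.yaml — run `buf generate` in CI and as part of the build
version: v1
plugins:
  - plugin: buf.build/bufbuild/es
    out: src/gen
    opt: target=ts
```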
> The other trick is that you should use the exact same (!) schema file for your frontend and backend projects. This means that changing it should trigger regeneration of generated code for your clients and servers and then run CI on them.
You do not need to have the exact same schema file, in fact protobuf is carefully designed to avoid needing this. You need to follow some rules about what to do when fields are added or removed:
* Generally, roll out the server side first then, once that is complete, start rolling out the client afterwards.
* If a field is added (on the server side), make sure that it can be ignored on the client side, so old clients are not impacted. For example, don't add a "units" field that changes the meaning of existing "temperature" field (previously had to be fahrenheit, now can be celsius or fahrenheit). Instead add a separate field "temperature_celsius" and send both. (You can always remove the old one later on the server if new clients don't need it and you have 100% finished roll out of clients.) Note that receiving unexpected field data is not an error in protobuf, so the extra field won't cause any problems so long as it's not a problem at application level.
* You can equally remove a field so long as the client isn't relying on it (in this case you may need to roll out client update first). More accurately (with proto3 syntax) it will appear as empty/zero so this needs to be OK.
* You can't change a field's type e.g. from integer to double (or from one message type to another, but just adding a field to a message according to the above is OK). If you want to do that, go through a controlled process of adding a new field with the new type you want then removing the old field.
* You are free to reorganise the order fields appear in the proto file but don't renumber the fields - the field number is what defines it in the binary encoding. In particular, if you remove field number 2 (for example) you should leave a gap (fields 1, 3, 4,... remaining) rather than renumbering the remaining ones to be contiguous.
Depending on the application, it is often actually a good idea to have a completely separate copy of the proto file in the client and server applications, with the client proto typically lagging behind the server one.
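The renumbering rule can even be enforced by the schema itself — proto3's reserved keyword (message and field names here are illustrative) makes accidental reuse of a removed field's number or name a compile-time error:

```protobuf
syntax = "proto3";

message Reading {
  // Field 2 used to be `double temperature` (fahrenheit); it was removed
  // rather than repurposed. Reserving it prevents accidental reuse.
  reserved 2;
  reserved "temperature";

  string sensor_id = 1;
  double temperature_celsius = 3; // added later; old clients simply ignore it
}
```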
I can empathize. I was the same way at first. What is it that you find confusing? Perhaps we can help clear it up or link you to helpful documentation (or improve our own docs).
Maybe what could be added is a debug header when using gRPC. If it is present, the proto schema is sent with each request/response. Then the tooling can be enhanced to look for this.
I suspect this would not be much heavier than JSON, so it could always be left on for those who are OK with the overhead.
Win win?
Protobufjs is good, but I can't use it because it's only a protobuf library, not a gRPC library. I end up having to use grpc-web, with all the problems it comes with.
I was hoping Buf could solve that problem... Maybe in the future! :)
The same reason, along with the fact that you had to generate code and usually needed to convert it to a class afterward, was why I wrote my own typescript-native binary serializer [0] (mostly based on C-FFI for compatibility) a few years ago.
[0]: https://github.com/i404788/honeybuf
Shameless plug to my project Phero [0]. It’s a bit like gRPC but specifically for full stack TypeScript projects.
It has a minimal API, literally one function, with which you can expose your server’s functions. It will generate a Typesafe SDK for your frontend(s), packed with all models you’re using. It will also generate a server which will automatically validate input & output to your server.
One thing I’ve seen no other similar solution do is the way we do error handling: throw an error on the server and catch it on the client as if it was a local error.
As I said, it’s only meant for teams who have full stack TypeScript. For teams with polyglot stacks an intermediate like protobuf or GraphQL might make more sense. We generate a TS declaration file instead.
[0] https://github.com/phero-hq/phero
https://trpc.io/docs/v10/quickstart