How Turborepo is porting from Go to Rust

> By serializing to JSON, a format with robust support in both Go and Rust, we were able to minimize our FFI surface area, and avoid a whole class of cross-platform, cross-language bugs.

There’s been a lot of discussion of the marshaling and unmarshaling costs of Json (and really any other text format). Should we be looking more strongly at binary formats, and if those have better outcomes than Json?

I’m guessing they didn’t want to introduce more change to the Go code than necessary, but I’m wondering if people have any horror stories to share about any of the binary formats available?

azdle · 2 years ago

Protobuf is incredibly annoying to use with rust (my usecase was specifically rust to-rust) because of how limited protobuf's type system is. For instance there is no way to represent an `Option` type because protobuf has decided that if something is no present, then it is really some default value instead. The only workaround I found was to have both `foo` and `is_fo_set` fields, which required having duplicate copies of every datatype one for (de)serialization and one for actually using in my codebase, manually implementing changes between them. Don't get me wrong, there's nothing show-stopping, but it was definitely a death by a thousand cuts situation for me.

Though, I know that another team at the same company loved it, but they were all in on java (and groovy and kotlin). As I understand it protobuf meshes with Java's type system much better (and probably Go's too?).

---

I did some benchmarking on various formats/rust libraries when we were deciding on what the protocol should be used for ^ and, IIRC, JSON fared better than you'd have expected. My assumption is that JSON just has had way more eyes/hands on the implementation than anything else. The things that I can remember beating it were (1) bincode, (2) cbor, and (3) protobuf, then JSON was #4.

In retrospect I wish we'd have gone with bincode, but we weren't sure if we were going to need some java code to interact with this service at the time and, at least at the time, bincode was rust-only (and possibly not even stable between compiler releases?). It would have been much faster to develop with and there was a whole bunch of overhead from protobuf that didn't really do anything for us since we were talking over a unix domain socket within a single system.

bminor13 · 2 years ago

proto3 messsage fields allow for detecting set vs. not set; other field types (repeated, map, int32, string, bool, enum, etc.) have this "default value if not set" issue. The canonical way of handling this is to use wrapper messages (because one can detect if the wrapper is not set), and there are "well-known" canned messages/protos one can import and use without writing their own: https://protobuf.dev/reference/protobuf/google.protobuf/

Whether the codegen/libraries for a particular language provides a more idiomatic binding for these well-known wrappers is up to the implementation - for example, golang libraries have conveniences added for well known libraries: https://pkg.go.dev/google.golang.org/protobuf/types/known. Rust libraries may have the same; I'm not as familiar with the ecosystem there.

foobiekr · 2 years ago

On optional.. this was a regression in proto that is somewhat helped by https://github.com/protocolbuffers/protobuf/blob/main/docs/f... ; I have no idea whether protobuf for rust has started taking advantage of this.

JSON is awful in every way.

square_usual · 2 years ago

Have you tried FlatBuffers? And if you did, did it perform worse than JSON? The idea of deserialize-on-read is tempting to me, but I'd like to see real-world cases of it working out.

rvcdbn · 2 years ago

recent versions of proto3 have added back the “optional” keyword that can be used on any field. see: https://github.com/protocolbuffers/protobuf/blob/main/docs/f...

ithkuil · 2 years ago

Yes it is unsurprising that an IDL designed to be used as an interchange format between different languages (and other requirements such as backward compatibility etc) doesn't perfectly match your specific language type system.

An idiomatic Go structure does not map well to an idiomatic rust structure, and similarly other languages.

athrowaway3z · 2 years ago

iirc https://github.com/tokio-rs/prost has the proper proto3 'optional' -> Option<T> support

wrs · 2 years ago

It depends on what you’re marshaling, and how much. Anyway, in my experience, the biggest practical disadvantage of JSON is not performance, it’s that people tend not to put a schema validation/migration layer on top of it, which leads over time to chaos. Many of the popular binary formats have schema validation built in.

packetlost · 2 years ago

Text formats have innate issues when it comes to deserialization: you don't know the size of each field of your data until you scan the entire field. This makes deserializers generally more prone to DOS attacks. Many binary formats must necessarily store the size of fields as metadata, which means you might not need to deserialize the whole message to interpret parts of it. Further, JSON doesn't support streaming natively. These are mostly unique use-cases with marginal improvements, but it can matter.

throwawaymaths · 2 years ago

Totally insane that people don't put validation since jsonschema is quite nice

adameasterling · 2 years ago

A few months ago, I did some analysis on JSON data in our platform, and discovered that more than two-thirds of bytes were field names. Two thirds! That’s a lot of potentially unnecessary extra bytes.

In my case, I was thinking about storage costs for JSON data; but thinking about this use case: Isn’t it true that the CPU would have to spend basically three times as much time making FFI calls? Assuming that practically speaking, field names make up 2/3rds of the bytes in their real-world JSON data.

evntdrvn · 2 years ago

JSON compresses very well, fortunately

importantbrian · 2 years ago

Isn't this the point of things like protobuf and thrift? It seems like everyone beyond a certain scale sends data around in some binary format rather than JSON.

bluejekyll · 2 years ago

Yes, I’m wondering if people have bad experiences with CBOR, thrift, flatbuffers, protobuf, messagepack, etc. There are a huge number. I think Protobuf might be the best choice across all languages at this point, maybe?

But that’s my question, if you choose a binary format, what are the issues to look out for?

Example, I work with Java, Rust, Go, and Python at work. I have specifically had issues with Avro (not my choice) which is really well supported in Java, but not much else.

alias_neo · 2 years ago

I would have suggested ProtoBuf too.

That said, there was a comment in their post about not wanting to tightly couple the Go and Rust since they weren't sure they could accurately represent their structures in both langs.

I don't know anything about this turbo tool, but it seems relatively low-throughput, and they intend to migrate it all to Rust anyway, so it's probably not a big deal.

That said; I'd have done this with binary comms and not JSON, ProtoBuf or not.

iamcalledrob · 2 years ago

I'm using protobuf for exactly this use-case, but I'd say that it's clearly designed as a network format first, and makes trade-offs as such.

For example, it prioritizes backwards and forwards compatibility -- which is not a concern for IPC where you control both ends. So no real optional or required fields. Comparing structures (Go) is awkward and uses reflection. Structs embed a mutex and preserve bytes for unknown fields etc...

MessagePack seems to make a better set of tradeoffs by comparison.

jamil7 · 2 years ago

They don't specify what types of issues they had with FFI which makes me somewhat dubious of the outcome here. There are tools to help paper over the rough edges of calling code through FFI, at least with Rust. Maybe the reasoning is more to do with the Go side.

bluejekyll · 2 years ago

From my experience, the issue is generally related to passing complex data structures across FFI boundaries. For this, adopting a serialization format is nicer than say deconstructing each object into proper parameters, which really end up being very similar code to that which is generated with the serialization libraries.

Their previous blog post (linked in the first sentence) about why the switch from Go to Rust is insightful.

I'm currently learning Go but have had my eye on Rust as well and have been wondering about the strengths and weaknesses of each language and how they compare and where each language is best applied.

The linked blog confirms my first impression that Go is well suited for networking applications and simplicity whereas Rust is well suited for OS/low-level applications.

Does anyone here have any experience using Go and Rust? What are your thoughts on the two languages?

jerf · 2 years ago

"Their previous blog post (linked in the first sentence) about why the switch from Go to Rust is insightful."

I was actually a bit astonished by it. The most trenchant concrete example they could give is... Go's standard library abstraction around file permissions is a bit inconvenient for their use case? This is easily fixed in Go in about a day without a complete rewrite; you just write some new stuff directly against the relevant syscall libraries and use build tags to control the different OS builds.

Just because an abstraction exists doesn't mean you have to use it! It is not a requirement of Go that you must use that particular abstraction, it's just something provided by the standard library. We once wrote a library for Perl using the openat and other *at functions because we needed the security guarantees, and it worked fine, despite the "standard library" not working that way.

My read is that they're basically doing this because they want to, not because Go forced them to. That's fine. There's nothing wrong with that, if they want to pay the price for migration. (Were someone proposing it to me in real life I'd expect a better reason than "I don't like how it handles file permissions in the standard library", though.) However, as an external observer using it as a grounds to decide it's much weaker than may meet the eye; I wouldn't overprivilege it.

The real reason to prefer Rust here would be something on the lines of "We're doing so much crazy concurrent stuff that we need the guarantees that Rust provides via compiler but Go only provides via common practices." The latter can carry you a long ways but it does eventually give out and become insufficient for a codebase, and when that's the case Rust becomes one of the short list of options. "Go common practices" is actually pretty high up on the set of "ways to do reasonable concurrency", reaching above that requires a pretty significant shift to Erlang/Elixir, Haskell, or Rust, and given the goals of the project that would basically leave Rust as the only acceptable performance choice. It's possible that this would qualify for them, though I'm not sure... from what it sounds like they're doing, a task management system that ensures that only one worker is doing a given task and everyone else waits on the relevant output without redoing it would be the core of their architecture, and even if you use that thousands of times per execution that doesn't necessarily mean the code is complex internally.

returningfory2 · 2 years ago

That article definitely buried the lede. A fair reading of the article is that the only reason they are switching from Go to Rust is because their dev team wants to write in Rust. Which is fine! But I think they could have been more open about that rather than inventing a strange complaint about how Go handles file permissions.

geodel · 2 years ago

Agree.

I have a mental category for these posts:

We saved some of cloud / server side cost with Rust, so you user can spend more on local desktop /cloud cost with Javascript

Sometimes cost savings can be imaginary but since Rust is cool it is all fine.

Mesopropithecus · 2 years ago

Depending on your background, learning either will be beneficial in that you'll learn about a C-like memory model. My experience is that Go is way easier to get started with (both in terms of the language and the libraries). It has a certain concurrency model baked into the language (which in Rust is available in the standard library). It's kind of a lower-level Python. Also I'd say that if you're not familiiar with either language, Go programs are probably easier to read, which can make a difference if you're not working alone on a project.

Rust takes a bit more effort to initially get results, but once you get there, you work in a language that gives you expressive ways to create abstraction (this is a matter of taste maybe), generates generally faster code, and does not incur overhead by garbage collection. Also, there are ways to write asynchronous code in an elegant way. The price is that you spend more time learning about structuring programs in a way that the compiler accepts. Once you are past that point, you'll be at least as productive as in Go.

I used to dabble with Go and like it, but once I had to write more code, I found it more tedious compared to Rust. But as I said, more readable to the uninitiated, that was one of the Go design goals.

diarrhea · 2 years ago

Rust is also poison for the mind. Sweet, sweet poison.

I’m currently back to Python and losing my mind about doing error handling, as well as enforcing correct usage of my library API on the call site. Doing these things well (best?) in Rust is baked into that language’s DNA (Result, Option, newtype pattern, type state pattern, …). It’s painful to go without once you’ve seen the light.

ar_lan · 2 years ago

I have no experience with Rust, but have been using Go professionally for ~5 years now. For a lot of the flak that Go gets (and praise that Rust gets), the Go standard library (and supporting ecosystem) indeed is fantastic for anything relating to a web server.

When I analyzed Rust around the same time, I noted the Rust standard library [1] did not have HTTP support, and it wasn't a first class consideration. I think Hyper [2] was around, but I've never analyzed it deeply (though based on GitHub stars it seems to be popular). Protobuf [3] is also extremely easy to work with in Go.

Given the differences in standard library and applications I see created in both ecosystems, your analysis seems right, though you can probably develop network(ed) applications in both fairly well at this point.

[0]: https://pkg.go.dev/std

[1]: https://doc.rust-lang.org/std/

[2]: https://github.com/hyperium/hyper/

[3]: https://github.com/golang/protobuf

juancampa · 2 years ago

These days Axum is the most common way of building an HTTP server. It sits atop Hyper and it’s very composable thanks to Tower. It supports Websockets out of the box too.

throwaway894345 · 2 years ago

Yeah, and I genuinely want to like Rust and I pick it up a few times every year (and have done for the last decade), but each time I get burned by trying to do something that would be trivial (and safe!) in Go or C.

Most recently, I'm trying to build a non-allocating, minimally copying `Lines` iterator which reads from an internal `Reader` into an internal buffer and then yields slices of the internal buffer on each call to `next` (each slice represents a single line, and it's an error if a line exceeds the capacity of the internal buffer); however, as far as I can tell, this is unworkable without some unsafety because the signature is `fn next(self: &mut Lines<'a>) -> Result<&'a [u8]>` which doesn't work because Rust thinks the mutable reference to self must outlive 'a, and explicitly setting the mutable self reference to 'a violates the trait. If I forego the trait and just make a thing with a `next()` method, then I can't use it in a loop without triggering some multiple mutable borrows error (each loop iteration constitutes a mutable borrow and for some reason these borrows are considered to be concurrent). The only thing I can think to do is have a `scan()` method that finds the next newline and notes its location inside the `Lines` struct and a separate `line()` method that actually fetches the resulting slice from the buffer.

I don't run into this in C or Go, and for all of the difficulty of battling the borrow checker, I'm not getting any extra safety (in this case).

za3faran · 2 years ago

> Does anyone here have any experience using Go and Rust? What are your thoughts on the two languages?

I've worked on an extremely large golang monorepo, but I don't have production Rust experience. They're vastly different languages that they shouldn't really be compared. The only commonality they have is that they both compile to native code, that's about it. Other than that, golang is basically a python/ruby/perl/php competitor, whereas Rust is a C++ competitor.

didntcheck · 2 years ago

Yeah I'm quite confused at how obsessed people are with comparing these two languages specifically. IMO it would make most sense to compare Go to C# or a JVM language. People seem to be thinking that the (good) idea of having it compile to a single executable out of the box means it's in the same class of languages as C, when it's still a managed language - it just bundles the runtime

bluefishinit · 2 years ago

> Does anyone here have any experience using Go and Rust? What are your thoughts on the two languages?

Rust has a much better type system, Go has a much better standard library. The problems with Rust are that the async solutions are shifting sands and that most people "cheat" by just shoving everything into a Box on the heap instead of properly using lifetimes on the stack. Both of those lead to a lot of inconsistency and sort of defeat the purpose of using Rust in the first place.

The main problem with Go is verbose error handling, it may also be too high level for some systems level tasks Rust may be better suited to.

sophacles · 2 years ago

Theres lots of purposes of using rust in the first place.

Some people want the performance.

Some people want the safety guarantees and the checking the compiler does for correctness.

Some people want the expressive type system (and combined with the above, the ability to encode constraints into data types).

I don't think that people who choose based on the third but don't need super tight performance are using it wrong, they just have a different set of priorities than those that are using it for very high performance.

mcqueenjordan · 2 years ago

Yes, experience in both. I'm considerably more inclined to use Rust for any use case, but I'll freely admit that I'm biased because it appeals to my sensibilities more.

IMO the most compelling use case to use Go over Rust is if you're writing a networked service or networking code.

sophacles · 2 years ago

Having used both Go and Rust for networked and networking code, I'd say it's a pretty nuanced set of tradeoffs between the two.

I've written a couple protocol handlers in rust (that is parse the packet and handle the logic before giving the data to some other bit of code) and I found Rust really nice for that - real enums, pattern matching, and the use of traits can make interacting with binary protocols (particularly of the state machine variety) really nice with ergonomic interfaces. Meanwhile go's net/IP type (etc) often leave me frustrated.

On the other hand, goroutines and channels are such a nice way to handle a lot of message passing around a server application that I find myself reaching for mpsc channels and using tokio tasks like clunkier goroutines in rust often.

I think I'd draw the line more along "how often I need to work with byte arrays that I want to turn into semantic types" - if I'm going down to the protocol level and not doing a ton of high level logic I'll reach for rust, if it's more high level handling of requests/responses (http for example) I'll often reach for go.

(like the parent, I'm more inclined to reach for Rust over go if all else is equal, but I enjoy go too).

phamilton · 2 years ago

I have done both. Currently moving a number of microservices from node/ruby/golang to a rust monolith. Slow and steady progress.

Our biggest struggle with golang was that our Ruby/FP backgrounds just clashed with golang's style. It wasn't terrible, but we always felt it was more verbose and clunkier to compose and reuse things than we expected. I'm sure generics are making this better, but even with generics the general feel of the type system just felt off to us.

Our monolith is http/graphql + postgres in Rust and we like it. Some learning curve for the team, but most of our "application code" is pretty easy Rust and people can jump in and work on it quickly enough. Most work is defining in/out types and implementing a few key traits for them. Our client engineers are comfortable doing it. Our "app infra code" is a little more technical, but we've appreciated the strong type system as we've built out pubsub systems and other important bits of infra.

Ultimately though, it's just a culture thing. I'd recommend one of jvm/rust/golang to anyone trying to build apps that handle significant traffic. Which of those 3 is just whichever matches up to your team and their personalities.

bluejekyll · 2 years ago

I don’t think comparing Go and Rust is particularly useful. The languages have different goals. I don’t want to start a flame war, but I prefer Rust to Go for a lot of reasons, mainly the lack of runtime, the type system, as well as more compile time safety guarantees.

super_flanker · 2 years ago

I have used Go for my day job, I'd say you can get the job done in Go. But the resulting code won't be as elegant as you could do in Rust. I'd chose Rust just because how good the type system is: sum type, pattern matching, iterators, typestate pattern, etc makes programming a pleasure. Navigating async code takes little getting used to though. And be ready for reading many new symbols i.e. ', ~, etc.

I've been programming for 12+ years and it gets real boring when you've to type/copy same thing again and again, and Go doesn't want to help you here.

throwasdfjwe7 · 2 years ago

You should learn both and be able to use both comfortably.