Golang's big miss on memory arenas

bilbo-b-baggins · 4 days ago

Man this person is mediocre at best. You can do fully manual memory management in Go if you want. The runtime is full of tons of examples where they have 0-alloc, Pools, ring buffers, Assembly, and tons of other tricks.

If you really want an arena like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.

But like… the write up completely missed that manual memory management exists, and Golang considers it “unsafe” and that’s a design principle of the language.

You could argue that C++ RAII overhead is “bounded performance” compared to C. Or that C’s stack frames are “bounded performance” compared to a full in-register assembly implementation of a hot loop.

But that’s bloody stupid. Just use the right tool for the job and know where the tradeoffs are, because there’s always something. The tradeoff boundary for an individual project or person is just arbitrary.

stingraycharles · 4 days ago

As someone who writes Go code that processes around 100B messages per day (which all need to be parsed and transformed), I can confirm that the author’s position is very much misguided.

And it also completely ignores the fascinating world of “GC-free Java”, which more than a few of the clients I work with use: Java with garbage collection entirely disabled. It’s used in finance a lot.

Is it pretty? No.

Is it effective? Yes.

Regarding Go’s memory arenas, do you need to use memory arenas everywhere ? Absolutely not. Most high performance code has a hot part that’s centered (like the tokenizer example that OP used). You just make that part reuse memory instead of alloc / dealloc and that’s it.

silisili · 4 days ago

Same. I'm genuinely confused by all the comments of 'ah man, this is holding me back' in this thread, and folks claiming it's not possible to do any arena tricks in Go.

I'm not sure if these are just passerbys, or people who actually use Go but have never strayed from the std lib.

swid · 4 days ago

This isn't true in practice because you won't be able to control where allocations are made in the dependencies you use, including inside the Go standard library itself. You could rewrite/fork that code, but then you lose access to the Go ecosystem.

The big miss of the OP is that it ignores the Go region proposal, which is using lessons learned from this project to solve the issue in a more tractable way. So while Arenas won't be shipped as they were originally envisioned, it isn't to say no progress is being made.

jitl · 4 days ago

I had to fork go’s CSV to make it re-use buffers and avoid defensive copies. But im not sure an arena api is a panacea here - even if i can supply an arena, the library needs certain guarantees about how memory it returns is aliased / used by the caller. Maybe it would still defensive copy into the arena, maybe not. So i don’t see how taking arena as parameter lets a function reason about how safely it can use the arena.

sheepscreek · 4 days ago

I personally loved using Go 8 years ago. When I built a proof of concept for a new project in both Go and Rust, it became clear that Rust would provide the semantics I’m looking for out of the box. Less fighting with the garbage collector or rolling out my own memory management solution.

If I’m doing that with a lot of ugly code - I might as well use idiomatic Zig with arenas. This is exactly the point the author tried to make.

Your last paragraph captures the tension perfectly. Go just isn’t the tool we thought for some jobs, and maybe that’s okay. If you’re going to count nanoseconds or measure total allocations, it’s better to stick to a non-GC language. Or a third option can be to write your hot loops in one such language; and continue using Go for everything else. Problem solved.

9rx · 4 days ago

> Go just isn’t the tool we thought for some jobs

Go made it explicitly clear when it was released that it was designed to be a language that felt dynamically-typed, but with performance closer to statically-typed languages, for only the particular niche of developing network servers.

Which job that needs to be a network server, where a dynamically-typed language is a appropriate, does Go fall short on?

One thing that has changed in the meantime is that many actually dynamically-typed languages have also figured out how to perform more like a statically-typed language. That might prompt you to just use one of those dynamically-typed languages instead, but I'm not sure that gives any reason to see Go as being less than it was before. It still fulfills the expectation of having performance more like a statically-typed language.

wolvesechoes · 4 days ago

> Or a third option can be to write your hot loops in one such language; and continue using Go for everything else. Problem solved.

Or use Go and write ugly code for those hot loops instead of introducing another language and build system. Then you can still enjoy nicety of GC in other parts of your code.

cafxx · 4 days ago

> If you really want an arena like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.

A word of caution. If you do this and then you store pointers into that slice, the GC will likely not see them (as if you were just storing them as `uintptr`s)

ncruces · 4 days ago

You need to ensure that everything you put in the arena only references stuff in the same arena.

No out pointers. If you can do that, you're fine.

0xjnml · 4 days ago

> If you really want an arena like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.

Only if the type is not a pointer per se or does not contain any inner pointers.

Otherwise the garbage collector will bite you hard.

zbentley · 4 days ago

Types with inner pointers add difficulty to be sure, but it’s still possible to use them with this pattern. You have to make sure of three things to do so: 1) no pointers outside of the backing memory; 2) an explicit “clear()” function that manually nulls out inner pointers in the stored object (even inner pointers to other things in the backing slice); 3) clear() is called for all such objects that were ever stored before the backing slice is dropped and before those objects are garbage collected.

bborud · 4 days ago

Do you have some tips for blog postings, code, articles that explore these topics in Go?

raggi · 4 days ago

> You can do fully manual memory management in Go if you want. The runtime is full of tons of examples where they have 0-alloc, Pools, ring buffers, Assembly, and tons of other tricks.

The runtime only exposes a small subset of what it uses internally and there's no stable ABI for runtime internals. If you're lucky to get big enough and have friends they might not break you, some internal linkage is being preserved, but in the general case for a general user, nope. Updates might make your code untenable.

> If you really want an arena like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.

AIUI the prior proposals still provided automated lifetime management, though that's related to various of the standing concerns, so you can't match that from "userspace" of go, finalizers don't get executed on a deterministic schedule. Put simply: that's not the same thing.

As someone else points out this is also much more fraught with error than just typing what you described. On top of the GC issue pointed out already, you'll also hit memory model considerations if you're doing any concurrency, which if you actually needed to do this surely you are. Once you're doing that you'll run into the issue, if you're trying to compete with systems languages, that Go only provides a subset of the platform available memory model, in the simplest form it only offers acq/rel atomic semantics. It also doesn't expose any notion of what thread you're running on (which can change arbitrarily) or even which goroutine you're running on. This limits your design space quite significantly at the bounds your performance for high frequency small region operations. I'd actually hazard an educated guess that an arena written as you casually suggest would perform extremely poorly at any meaningful scale (lets say >=32 cores, still fairly modest).

> You could argue that C++ RAII overhead is “bounded performance” compared to C. Or that C’s stack frames are “bounded performance” compared to a full in-register assembly implementation of a hot loop. > But that’s bloody stupid. Just use the right tool for the job and know where the tradeoffs are, because there’s always something. The tradeoff boundary for an individual project or person is just arbitrary.

Sure, reducto ad absurdum, though I typically would optimize against the (systems language) compiler long before I drop to assembly, it's 2025 systems compilers are great and have many optimizations, intrinsics and hints.

> Man this person is mediocre at best.

Harsh, I think the author is fine really. I think their most significant error isn't in missing or not discussing difficult other things they could do with Go, it's seemingly being under the misconception prior to the Arena proposal that Go actually cedes control for lower level optimization. It doesn't, and it never has, and it likely never will (it will gain other semi-generalized internal optimizations over time, lots of work goes into that).

In some cases you can hack some in on your own, but Go is not well placed as a "systems language" if you mean by that something like "competitive efficiency at upper or lower bound scale tasks", it is much better placed as a framework for writing general purpose servers at middle scales. It's best placed on systems that don't have batteries, and that have plenty of ram. It'll provide you with a decent opportunity to scale up and then out in that space as long as you pay attention to how you're doing along the way. It'll hurt if you need to target state of the art efficiency at extreme ends, and very likely block you wholesale.

I'm glad Go folks are still working on ideas to try to find a way for applications to get some more control over allocations. I'm also not expecting a solution that solves my deepest challenges anytime soon though. I think they'll maybe solve some server cases first, and that's probably good, that's Go's golden market.

tptacek · 5 days ago

The vibe I get from this post is of someone who hasn't routinely used arenas in the past and thinks they're kind of a big deal. But a huge part of the point of an arena is how simple it is. You can just build one. Meanwhile, the idea that arena handles were going to be threaded through every high-allocation path in the standard library is fanciful.

foobiekr · 5 days ago

Two big issues in Golang are that you can't actually build an arena allocator that can be used for multiple types in a natural way.

The other is that almost no library is written in such a way that buffer re-use is possible (looking at you, typical kafka clients that throw off a buffer of garbage per message and protobuf). The latter could be fixed if people paid more attention to returning buffers to the caller.

bilbo-b-baggins · 4 days ago

You totally can build it using unsafe and generics. I’ve done it with mmap-backed byte slices for arbitrary object storage.

fpoling · 4 days ago

Rust also suffers from libraries returning a newly allocated strings and vectors when the code should allow to pass a pre-existing string or vector to place the results.

Granted the latter leads to more verbose code and chaining of several calls is no longer possible.

But I am puzzled that even performance-oriented libraries both in Go and Rust still prefer to allocate the results themselves.

ndr · 5 days ago

I'm curious, do you have any arena experience out of c/cpp/rust/zig?

It may be that you can "just" build one, but you can't "just" use it and expect any of the available libraries and built ins to work with it.

How many things would you have to "just" rewrite?

tptacek · 5 days ago

There was never a proposal to automate arenas in Go code, and that wouldn't even make sense: the point of arenas is that you bump-allocate until some program-specific point where you free all at once (that's why they're so great for compiler code, where you do passes over translation units and can just do the memory accounting at each major step).

(Yes: I used arenas a lot when I was shipping C code; they're a very easy way to get big speed boosts out of code that does a lot of malloc).

9rx · 5 days ago

> How many things would you have to "just" rewrite?

The same ones you'd have to rewrite using the (experimental) arenas implementation found in the standard library. While not the only reason, this is the primary reason for why it was abandoned; it didn't really fit into the ecosystem the way users would expect — you included, apparently.

"Memory regions" is the successor, which is trying to tackle the concerns you have. Work on it is ongoing.

Deleted Comment

ideal_gas · 5 days ago

> By killing Memory Arenas, Go effectively capped its performance ceiling.

I'm still optimistic about potential improvements. (Granted, I doubt there will be anything landing in the near future beyond what the author has already mentioned.)

For example, there is an ongoing discussion on "memory regions" as a successor to the arena concept, without the API "infection" problem:

https://github.com/golang/go/discussions/70257

cafxx · 4 days ago

There's a bunch of activity ongoing to make things better for memory allocation/collection in Go. GreenTeaGC is one that has already landed, but there are others like the RuntimeFree experiment that aims at progressively reduce the amount of garbage generated by enabling safe reuse of heap allocations, as well as other plans to move more allocations to the stack.

Somehow concluding that "By killing Memory Arenas, Go effectively capped its performance ceiling" seems quite misguided.

pjmlp · 4 days ago

That one is kind of interesting given the past criticism of Java and .NET having too many GCs and knobs.

With time Go is also getting knobs, and turns out various GC algorithms are actually useful.

cafxx · 4 days ago

Not sure what you are referring to. There are no knobs involved in the things I mentioned (aside from the one to enable the experiment, but that's just temporary until the experiment completes - one way or the other).

silisili · 5 days ago

I'm a bit split on this one.

Simple arenas are easy enough to write yourself, even if it does make unidiomatic code as the author points out. Pretty much anything that allocates tons of slices sees a huge performance bump from doing this. I -would- like that ability in an easier fashion.

On the other, hand, new users will abuse arenas and use them everywhere because "I read they are faster", leading to way worse code quality and bugs overall.

I do agree it would become infectious. Once people get addicted to microbenchmarking code and seeing arenas a bit faster in whatever test they are running, they're going to ask that all allocating functions often used (especially everything in http and json) have the ability to use arenas, which may make the code more Zig-like. Not a dig at Zig, but that would either make the language rather unwieldy or double the number of functions in every package as far as I can see.

riku_iki · 5 days ago

> Simple arenas are easy enough to write yourself

you can write arena yourself, but it is useless if lang doesn't allow you to integrate it, e.g. allocate objects and vars on that arena..

silisili · 4 days ago

You can, for some things, but it's a rather ugly process. I've mainly used it with slices and strings. So not useless, but certainly not full featured or simple.

cyberax · 5 days ago

Go now has memory regions, an automatic form of arenas: https://go.googlesource.com/proposal/+/refs/heads/master/des...

I think the deeper issue is that Go's garbage collector is just not performant enough. And at the same time, Go is massively parallel with a shared-everything memory model, so as heaps get bigger, the impact of the imperfect GC becomes more and more noticeable.

Java also had this issue, and they spent decades on tuning collectors. Azul even produced custom hardware for it, at one point in time. I don't think Go needs to go in that direction.

jimbokun · 4 days ago

Regions seem like a much cleaner and simpler solution to this problem.

sc68cal · 4 days ago

This does make me appreciate some of the decisions that Zig has made, about passing allocators explicitly and also encouraging the use of the ArenaAllocator for most programs.

Since Zig built up the standard library where you always pass an allocator, they avoided the problem that the article mentions, about trying to retrofit Go's standard library to work with an arena allocator.

Although, that's not the case for IO in Zig. The most recent work has actually been reworking the standard library to be where you explicitly pass IO like you pass an allocator.

But it's still a young language so it's still possible to rework it.

I really do enjoy using the arena allocator. It makes things really easy, if your program follows a cyclical pattern where you allocate a bunch of memory and then when you're done just free the entire arena

didibus · 5 days ago

Interesting that it never talks about direct competitors to the "middle ground" as well, like Java, C#, Erlang, Haskell, various Lisps, etc.

9rx · 5 days ago

Not all that interesting when you think about it. Doing so would lead to having to admit that the Go team was right that the proposed arena solution isn't right; that there is a better solution out there. Which defies the entire basis of the blog post. The sunk cost fallacy wouldn't want to see all the effort put into the post go to waste upon realizing that the premise is flawed.

The post could have also mentioned that the Go project hasn't given up. There are alternatives being explored to accomplish the same outcome. But, as before, that would invalidate the basis of the post and the sunk cost fallacy cannot stand the idea of having to throw the original premise into the trash.

Dead Comment