Interestingly enough, by following up on some of the article's references I discovered that Go is also catching up on Java and .NET design decisions that arguably could have been there early on.
Note that both of those features use generics, which weren't present in the language before, so IMO Go just prefers to wait a bit before implementing useful features instead of having too many of them (ahem, Swift?)
I think it's a bit of a stretch to say Go will implement all the features of C# and Java because of a few new features. Go isn't a frozen language; they just take a lot of time and discussion before committing to a major change.
The point isn't implementing all the features of C# and Java, but rather doubling down on their simplicity mantra against all programming language complexity, only to revisit such decisions years later because, after all, the other programming languages had good reasons to have such features in the first place.
Interning is neat. Most of my experience is really dated. Primarily in the JVM, and mostly for class names, for reflection and class loaders. It's sort of surprising seeing this added to go, with its desires for minimalism. But when you can use it, it can be a big win.
Look past the "loading the whole book in memory" part; the author gets to the point soon enough.
The IP address example is OK. It's true, and it highlights some important points. But keep in mind pointers are 64 bits. If you're not on IPv6 and you're shuffling a lot of addresses, you're probably better off just keeping the uint64, converting to a string and allocating the struct as needed. Interning doesn't appear to be much of a win in that narrow case. But if you do care about IPv6 and you're connecting to millions of upstreams, it's not unreasonable.
It's neat that it's available. It's good to be aware of interning, but it's generally not a huge win. For a few special cases, it can be really awesome.
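For the IPv4-only case, a minimal sketch of the "just keep the integer" approach might look like the following. A 32-bit integer is enough for an IPv4 address. The helper names `packIPv4` and `unpackIPv4` are made up for illustration; only `net/netip` and `encoding/binary` are real stdlib packages.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// packIPv4 (hypothetical helper) parses a dotted-quad string and returns
// the address as a big-endian uint32. Non-IPv4 input is rejected.
func packIPv4(s string) (uint32, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return 0, err
	}
	if !addr.Is4() {
		return 0, fmt.Errorf("%q is not an IPv4 address", s)
	}
	b := addr.As4()
	return binary.BigEndian.Uint32(b[:]), nil
}

// unpackIPv4 (hypothetical helper) converts the integer back to a string
// only when a string is actually needed.
func unpackIPv4(v uint32) string {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], v)
	return netip.AddrFrom4(b).String()
}

func main() {
	v, err := packIPv4("192.0.2.1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#x %s\n", v, unpackIPv4(v)) // prints "0xc0000201 192.0.2.1"
}
```

Stored this way, each address costs 4 bytes and no pointer, which is why interning has little to offer in this narrow case.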
Fun fact: in Go, an IPv4 address is internally represented as an IPv6 address, starting with ten zeroes and two 0xffs. The IPv4 address is copied in the last four bytes.
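You can see that layout from the outside with `netip.Addr.As16`, which returns an IPv4 address in its 16-byte IPv4-mapped form. This is only a sketch of the observable byte layout, not a claim about the exact private fields; `mapped16` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/netip"
)

// mapped16 (hypothetical helper) returns the 16-byte form of an address.
// For IPv4 input, As16 yields the IPv4-mapped IPv6 representation.
func mapped16(s string) ([16]byte, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return [16]byte{}, err
	}
	return addr.As16(), nil
}

func main() {
	b, err := mapped16("192.0.2.1")
	if err != nil {
		panic(err)
	}
	// Ten zero bytes, two 0xff bytes, then the four IPv4 bytes.
	fmt.Printf("% x\n", b[:]) // prints "00 00 00 00 00 00 00 00 00 00 ff ff c0 00 02 01"
}
```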
The article also mentions strings.Clone which has been around for a while. Using that is very easy and it stops big strings being pinned into memory. I had a problem with this in the S3 backend where some string was pinning the entire XML response from the server which strings.Clone fixed.
This is new for Go? I remember learning about Java string interning decades ago in the context of xml parsers. If I remember correctly, there were even some memory leaks associated with it and thread locals?
You could've implemented bespoke interning at any point in Go; it was added to the standard library only recently, though, likely because it may leverage Go's relatively recent support for generics.
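For reference, a rough sketch of what using the stdlib version looks like, assuming Go 1.23 or later where the `unique` package exists. The `object` type and `newObject` helper are hypothetical and only there to give the handle somewhere to live.

```go
package main

import (
	"fmt"
	"unique"
)

// object is a hypothetical record with a field that repeats across many
// instances, which makes it a candidate for interning.
type object struct {
	key         string
	contentType unique.Handle[string]
}

// newObject interns the content type via unique.Make, so equal values
// share one canonical copy and compare by handle.
func newObject(key, contentType string) object {
	return object{key: key, contentType: unique.Make(contentType)}
}

func main() {
	a := newObject("a.png", "image/png")
	b := newObject("b.png", "image/png")

	fmt.Println(a.contentType == b.contentType) // true: same canonical handle
	fmt.Println(a.contentType.Value())          // image/png
}
```

Comparing two handles is a cheap pointer-sized comparison rather than a byte-by-byte string comparison.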
I missed the initial blogpost about this; thanks for the solid explanation and the links. Probably won't make much of a difference for my use cases but cool to know this is now in the stdlib.
Cool idea, but sounds detrimental in terms of cache efficiency. Typically processing a string by reading it sequentially is quite cache efficient as the processor will prefetch, but with this method it seems like the string will not be contiguous in memory which will lead to more cache misses.
It's quite common if you need to parse a schema and keep it in memory for a long time. We're working on a GraphQL federation gateway and interning strings is really useful there, since schemas are sometimes hundreds of megabytes.
Writing your own interner is under a hundred lines of code and takes about 15 minutes. Writing a thread-safe interner is a somewhat trickier problem, but there are good libraries for that.
Lots of small strings sparsely spread over 282MB is pretty cache inefficient?
Whereas the separate small allocations will likely end up close together, which gives a better chance that a given cache line will contain several rather than just one.
- Deprecating finalizers and using cleaner queues (https://openjdk.org/jeps/421)
- Weak references (https://learn.microsoft.com/en-us/dotnet/api/system.weakrefe..., https://docs.oracle.com/javase/8/docs/api/java/lang/ref/Weak...)
Related tickets:
"runtime: add AddCleanup and deprecate SetFinalizer" - https://github.com/golang/go/issues/67535
"weak: new package providing weak pointers" - https://github.com/golang/go/issues/67552
One day Go will eventually have all those features that they deemed unnecessary from "complex" languages.
Edit: uint32 for IPv4, not uint64. Bit counting is hard.
https://cs.opensource.google/go/go/+/refs/tags/go1.23.1:src/...
This is called an IPv4-Mapped Address.
https://www.rfc-editor.org/rfc/rfc5156#section-2.2
People often want to have millions of S3 objects in memory and reducing the memory used would be very desirable.
I interned all the strings used - there are a lot of duplicates like Content Type and it reduced the memory usage by about 30% which is great.
I wonder how much difference this little fix mentioned in the article for go1.23.2 will make? https://github.com/golang/go/issues/69370
[1] http://clhs.lisp.se/Body/f_intern.htm