Interning in Go - Readit News

Interestingly enough, by following some of the article's references I discovered that Go is also catching up on Java and .NET design decisions that arguably could have been there early on.

- Deprecating finalizers and using cleaner queues (https://openjdk.org/jeps/421)

- Weak references (https://learn.microsoft.com/en-us/dotnet/api/system.weakrefe..., https://docs.oracle.com/javase/8/docs/api/java/lang/ref/Weak...)

Related tickets,

"runtime: add AddCleanup and deprecate SetFinalizer" - https://github.com/golang/go/issues/67535

"weak: new package providing weak pointers" - https://github.com/golang/go/issues/67552

One day Go will have all those features from "complex" languages that were deemed unnecessary.

nasretdinov · a year ago

Note that both of those features use generics, which weren't present in the language before, so IMO Go just prefers to wait a bit before implementing useful features instead of having too many of them (ahem, Swift?)

pjmlp · a year ago

They use generics because they are available now, otherwise most likely they would be magic functions like before.

badhombres · a year ago

I think it's a bit of a stretch to say Go will implement all the features of C# and Java because of a few new features. Go isn't a frozen language; they just take a lot of time and discussion before committing to a major change.

pjmlp · a year ago

The point isn't implementing all the features of C# and Java, but rather doubling down on the simplicity mantra against all programming language complexity, only to revisit such decisions years later because, after all, the other programming languages had good reasons to have such features in the first place.


Interning is neat. Most of my experience is really dated: primarily in the JVM, and mostly for class names, for reflection and class loaders. It's sort of surprising to see this added to Go, with its desire for minimalism. But when you can use it, it can be a big win.

Look past the "loading the whole book in memory" part; the author gets to the point soon enough.

The IP address example is OK. It's true, and it highlights some important points. But keep in mind pointers are 64-bit. If you're not on IPv6, and you're shuffling a lot of addresses, you're probably better off just keeping the uint64, converting to string and allocating the struct as needed. Interning doesn't appear to be much of a win in that narrow case. But if you do care about IPv6, and you're connecting to millions of upstreams, it's not unreasonable.

It's neat that it's available. It's good to be aware of interning, but it's generally not a huge win. For a few special cases, it can be really awesome.

Edit: that should be uint32 for IPv4. Bit counting is hard.
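A minimal sketch of the "keep the integer, build the string only when needed" idea for IPv4, using the standard net/netip and encoding/binary packages. The helper names packIPv4 and unpackIPv4 are made up for this illustration and do not come from the article.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// packIPv4 (hypothetical helper) stores an IPv4 address in 4 bytes
// instead of holding a string or a pointer to one.
func packIPv4(s string) (uint32, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return 0, err
	}
	if !addr.Is4() {
		return 0, fmt.Errorf("%q is not an IPv4 address", s)
	}
	b := addr.As4()
	return binary.BigEndian.Uint32(b[:]), nil
}

// unpackIPv4 (hypothetical helper) rebuilds the printable form on demand.
func unpackIPv4(v uint32) string {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], v)
	return netip.AddrFrom4(b).String()
}

func main() {
	v, err := packIPv4("192.0.2.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(v, unpackIPv4(v))
}
```

A uint32 is half the size of a 64-bit pointer, which is why interning buys little in this narrow case.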

wjholden · a year ago

Fun fact: in Go, an IPv4 address is internally represented as an IPv6 address, starting with ten zero bytes followed by two 0xff bytes. The IPv4 address is copied into the last four bytes.

https://cs.opensource.google/go/go/+/refs/tags/go1.23.1:src/...

This is called an IPv4-Mapped Address.

https://www.rfc-editor.org/rfc/rfc5156#section-2.2

oefrha · a year ago

No, the example isn't about IP address octets, it's about IPv6 zones — we're talking strings like "eth0".

jfoutz · a year ago

survivedurcode · a year ago

Beware the trade-offs of interning affecting GC behavior. Now you can’t have a stack-allocation optimization, for example.

tapirl · a year ago

The interning feature should only be used in some rare cases; it is not intended to be used widely.

nickcw · a year ago

The unique package is my top feature for go1.23. I've been experimenting with it in rclone.

People often want to have millions of S3 objects in memory and reducing the memory used would be very desirable.

I interned all the strings used (there are a lot of duplicates, like Content Type) and it reduced the memory usage by about 30%, which is great.
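For readers who haven't tried the package, here is a rough sketch of interning a repeated field with unique.Make (Go 1.23+). The object type and newObject function are hypothetical stand-ins and are not taken from rclone.

```go
package main

import (
	"fmt"
	"unique"
)

// object is a hypothetical stand-in for per-object metadata.
type object struct {
	key         string
	contentType unique.Handle[string] // one shared copy per distinct value
}

// newObject interns the content type so equal values share storage.
func newObject(key, contentType string) object {
	return object{key: key, contentType: unique.Make(contentType)}
}

func main() {
	a := newObject("a.txt", "text/plain")
	b := newObject("b.txt", "text/plain")
	// Handles compare equal exactly when the underlying values are equal.
	fmt.Println(a.contentType == b.contentType, a.contentType.Value())
}
```

Comparing two handles is a cheap pointer-sized comparison, and Value returns a copy of the original string when it is needed again.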

I wonder how much difference this little fix mentioned in the article for go1.23.2 will make? https://github.com/golang/go/issues/69370

The article also mentions strings.Clone, which has been around for a while. Using that is very easy and it stops big strings from being pinned in memory. I had a problem with this in the S3 backend, where some string was pinning the entire XML response from the server, which strings.Clone fixed.
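The pinning problem can be sketched like this: a substring shares its backing memory with the string it was sliced from, so keeping the small piece keeps the big one alive. The extractETag helper and the tag names below are invented for the example and are not the actual S3 backend code.

```go
package main

import (
	"fmt"
	"strings"
)

// extractETag (hypothetical helper) pulls a short value out of a large body.
func extractETag(body string) string {
	const openTag, closeTag = "<ETag>", "</ETag>"
	i := strings.Index(body, openTag)
	if i < 0 {
		return ""
	}
	rest := body[i+len(openTag):]
	j := strings.Index(rest, closeTag)
	if j < 0 {
		return ""
	}
	// rest[:j] still points into body; Clone copies just these bytes
	// so the large body can be garbage collected.
	return strings.Clone(rest[:j])
}

func main() {
	body := "<Result>" + strings.Repeat(" ", 1<<20) + "<ETag>abc123</ETag></Result>"
	fmt.Println(extractETag(body))
}
```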

Be aware that there is a bug in the current implementation (1.23.0 and 1.23.1) https://github.com/golang/go/issues/69370

User23 · a year ago

For reference, the term comes from Lisp’s INTERN. [1]

[1] http://clhs.lisp.se/Body/f_intern.htm

morkalork · a year ago

This is new for Go? I remember learning about Java string interning decades ago in the context of XML parsers. If I remember correctly, there were even some memory leaks associated with it and thread locals?

pphysch · a year ago

You could've implemented bespoke interning at any point in Go; it was added to the standard library only recently, though, likely because it may leverage Go's relatively recent support for generics.

peterldowns · a year ago

I missed the initial blogpost about this; thanks for the solid explanation and the links. Probably won't make much of a difference for my use cases but cool to know this is now in the stdlib.

cherryteastain · a year ago

Cool idea, but sounds detrimental in terms of cache efficiency. Typically processing a string by reading it sequentially is quite cache efficient as the processor will prefetch, but with this method it seems like the string will not be contiguous in memory which will lead to more cache misses.

pimeys · a year ago

It's quite common if you need to parse a schema and keep it in memory for a long time. We're working on GraphQL federation gateway and interning strings is really useful there due to schemas sometimes being hundreds of megabytes.

Writing your own interner takes under a hundred lines of code and about 15 minutes. Writing a thread-safe interner is a somewhat trickier problem, but there are good libraries for that.
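As a rough idea of what a hand-rolled one can look like in Go, here is a minimal mutex-guarded sketch. The Interner type and its methods are names chosen for this example, and unlike the standard unique package it never releases entries, so it only suits a bounded set of strings.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Interner (hypothetical type) maps each distinct string to one canonical copy.
type Interner struct {
	mu sync.Mutex
	m  map[string]string
}

func NewInterner() *Interner {
	return &Interner{m: make(map[string]string)}
}

// Intern returns the canonical copy of s, storing one on first sight.
func (in *Interner) Intern(s string) string {
	in.mu.Lock()
	defer in.mu.Unlock()
	if c, ok := in.m[s]; ok {
		return c
	}
	c := strings.Clone(s) // do not pin whatever larger buffer s came from
	in.m[c] = c
	return c
}

// Len reports how many distinct strings are stored.
func (in *Interner) Len() int {
	in.mu.Lock()
	defer in.mu.Unlock()
	return len(in.m)
}

func main() {
	in := NewInterner()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			in.Intern("String")
			in.Intern("Int")
		}()
	}
	wg.Wait()
	fmt.Println(in.Len())
}
```

A single mutex is the simplest way to make this safe for concurrent use; sharding the map or using sync.Map are possible refinements if lock contention shows up in profiles.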

ctz · a year ago

Lots of small strings sparsely spread over 282MB is pretty cache inefficient?

Whereas the separate small allocations will likely end up close together, which gives a better chance that a given cache line will contain several rather than just one.

SwiftyBug · a year ago

Seems like the classic trade-off of compute time vs. memory.