natanbc · 4 years ago
> In a multithreaded program, a bump allocator requires locks. That kills their performance advantage.

Java uses per-thread pointer bump allocators[1]

> While Java does it as well, it doesn’t utilize this info to put objects on the stack.

Correct, but it does scalar replacement[2], which puts them in registers instead

> Why can Go run its GC concurrently and not Java? Because Go does not fix any pointers or move any objects in memory.

Most Java GCs are concurrent[3], and if you want super-low pauses you can get those too[4][5]. Pointers can be fixed up while the application is running by using GC barriers

[1]: https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/

[2]: https://shipilev.net/jvm/anatomy-quarks/18-scalar-replacemen...

[3]: https://shipilev.net/jvm/anatomy-quarks/3-gc-design-and-paus...

[4]: https://wiki.openjdk.java.net/display/zgc/Main

[5]: https://wiki.openjdk.java.net/display/shenandoah

sam_bishop · 4 years ago
I agree. The author seems to know quite a bit about Go and GCs, but doesn't seem to have much experience with Java. As a Java performance engineer, I get the impression he is comparing Go to how he thinks Java works based on what he's read about it.
pjmlp · 4 years ago
Additionally, he doesn't seem to know that much about C#, which also has an advanced GC while allowing for C++-like memory management when needed.

mappu · 4 years ago
"Scalar replacement" explodes the object into its member variables and never constructs a class object at all. That does result in the exact same `sub %esp` (which Go would emit for any struct), but it only works if every single usage of that class type is fully inlined and the instance is never passed anywhere that needs it in object form.

It's worse than what Go has. Go can stack-allocate any struct and still pass pointers to it to non-inlined functions.

pkolaczk · 4 years ago
Scalar replacement does not work even in very trivial cases: https://pkolaczk.github.io/overhead-of-optional/

In all those cases the Optionals were inlined and didn't escape, yet they weren't properly optimized out.

altfredd · 4 years ago
Scalar replacement (as currently implemented in Java) does not work in real-world programs.

Well-written code does not need it. Poorly written code cannot trigger it, because the JIT is too dumb and isn't getting better.

There is no sane test to determine whether a piece of code will be inlined in Java. In practice, anything more complex than a byte array is unlikely to be inlined. Even the built-in ByteBuffers aren't! Meanwhile, the Go compiler treats Go slices just as well as arrays, or better.

native_samples · 4 years ago
The JIT is getting better. Major escape analysis upgrades are a big part of where Graal (a drop-in replacement for the HotSpot JIT) gets its performance boosts. EA definitely does work well there because Truffle depends on escape analysis and scalar replacement very heavily. GraalVM CE is better than regular HotSpot at doing it and GraalVM EE is even better again.

pjmlp · 4 years ago
Which JIT? There are plenty of them to choose from across Java implementations.
majou · 4 years ago
ZGC[4] in particular has me excited, enough so to want to pick up a JVM language.
Thaxll · 4 years ago
ZGC and Shenandoah can be slower than G1; they are not silver bullets. The fact that there are 4-5 GCs reflects the situation: no single GC is better than all the others.

It really depends on the workload.

the-alchemist · 4 years ago
ZGC is already available; it became production-ready in JDK 15, September 2020. =)

https://wiki.openjdk.java.net/display/zgc/Main#Main-ChangeLo...

gwbas1c · 4 years ago
Skimmed this and found that a lot of the claims are inaccurate and unsupported by data... Then realized, ugh, it's another Medium post. One particularly wrong claim:

> In C#, for example, reference objects and value objects live entirely separate lives. Value objects are always passed by value. You cannot take the address of a value object and pass that around. No, you need to copy the whole object every time.

In C#, there are normal C-style pointers (although they aren't used very often, mostly when interoperating with C APIs). Furthermore, you can pass a struct by reference if copying the struct is a problem.

I once wrote some performance-critical code where I used structs and passed them around by pointer. I ran the code through a profiler, then refactored to use normal idiomatic non-pointer code. Profiling again showed no performance difference.

---

The author of this post really needs to base their claims on actual data: write some well-tuned programs in the languages being compared, and then compare their performance.

sitkack · 4 years ago
In general, when talking about quantitative subjects we need to use quantitative measures. I think the author is nearly there, but unless you have data to support your claims, they should be ignored. I don't say this snarkily; in a domain where we actually have hard quantitative measures, they should be used and required in an argument.
WatchDog · 4 years ago
Quantitative measures would be nice, but I would imagine that it's going to be really difficult to quantitatively compare go and java garbage collectors, without a billion other factors about the language/runtime getting in the way.
joelfolksy · 4 years ago
Yep. His claims about C# are so egregiously wrong that the entire piece can safely be dismissed.
Yoric · 4 years ago
I'm a bit skeptical.

Yes, Go has value types and pointers. But whether you need a modern GC will undoubtedly depend a lot on the type of algorithms you need to execute. Also, it's great that you can implement (some form of) allocators, and that will definitely help for many algorithms, but that's a trade-off between performance on one hand and convenience and readability on the other. Similarly, unless I'm mistaken, TCMalloc "solves" fragmentation in two cases: either allocations are small (very common) or memory allocation maps neatly to threads (much less common). Those are two good cases to have, but I wouldn't count on it solving memory allocation on its own for, say, a browser engine or a video game.

Oh, and hasn't Java's GC been fully concurrent for a while now?

That being said, revisiting the assumptions made by Java (and other languages) is a very good idea.

esarbe · 4 years ago
> Java is a language that basically outsourced memory management entirely to its garbage collector. This turned out to be a big mistake.

Given that Java is not far behind C and C++ for many kinds of workloads, and in quite a few scenarios can outperform C code, I'm not sure I buy this line of reasoning.

> Doing these updates requires freezing all threads.

Eh, what? There are multi-threaded GCs, what is this article even talking about?

> However, this does not put C# and Java on equal footing with languages like Go and C/C++ in terms of memory management flexibility

In which world are Go and C in the same league when it comes to memory management flexibility? That's like comparing a language that uses a garbage collector to one that requires manual memory management. Because it is.

> Modern Languages Don’t Need Compacting GCs

> If need be, the Pacer slows down allocation while speeding up marking.

So, you just traded in one drawback for another?

Heck, I get it. The JVM is not a thin, graceful fawn. It's a complicated beast that requires years of experience to tame - and even then it will come back from time to time and bite you. There are many good points on which to critique the JVM - but I don't get the feeling that the author of this article has spent much time with modern JVMs, because he isn't pointing out any of them.

hashmash · 4 years ago
The author doesn't really understand how Java escape analysis works, and just focuses on one key aspect: "It does not replace a heap allocation with a stack allocation for objects that do not globally escape."

The author then implies that escape analysis is only used to reduce lock acquisition. Java escape analysis will replace a heap allocation with a stack allocation if the code is fully inlined. This is known as scalar replacement.

aktau · 4 years ago
I don't know much about this, but upthread various people say that scalar replacement happens very rarely if at all, currently. E.g.: https://news.ycombinator.com/item?id=29324132.

Could you perhaps comment on that, since you seem to have experience?

native_samples · 4 years ago
The issue is not that it doesn't happen - it does, all the time. The problem is it's unpredictable, hard to control and hard to measure. So it's sort of magic that gets blurred into all the other optimizations the VM is doing, and refactoring your code can make the difference between it happening or not.
pjmlp · 4 years ago
First of all, that blog post is wrong: it uses Optional, which isn't a value type to start with and which contains internal fields.

It is considered a value-like class that will eventually be turned into a value type when Valhalla arrives; until then there are no guarantees about scalar replacement.

Secondly, GraalVM and Azul are much better at it, but the author didn't bother to try their benchmarks on those.

madmax108 · 4 years ago
As some of the other comments in the thread suggest, this is quite a rudimentary (or rather, outdated) understanding of how Java GC operates, and it (unfortunately) ends up turning an otherwise good comparison into a straw-man argument.

As someone who's worked with Java from the days where "If you want superhigh performance from Java without GC pauses, then just turn off GC and restart your process every X hours" was considered a "valid" way to run high-performance Java systems, I think the changes Java has made to GC are among the biggest improvements to the framework/JVM and have contributed vastly to JVM stability and growth over the last decade.

geodel · 4 years ago
The fundamentals of Java's data/class layout in memory have remained the same for decades, so the author is right at the big-picture level.

> I think the changes Java has made to GC are among the biggest improvements to the framework/JVM and have contributed vastly to JVM stability and growth over the last decade.

This is of course true. However, the point is that for Java it is an absolute necessity, while for Go it may merely be nice to have.

avita1 · 4 years ago
Cool article, I'm not sure I agree with the headline.

I used to write low-scale Java apps, and now I write memory intensive Go apps. I've often wondered what would happen if Go did have a JVM style GC.

It's relatively common in Go to resort to idioms that let you avoid hitting the GC. Some things that come to mind:

* all the tricks you can do with a slice that have two slice headers pointing to the same block of memory [1]

* object pooling, something so common in Go it's part of the standard library [2]

Both are technically possible in Java, but I've never seen them used commonly (though in fairness I've never written performance-critical Java). If Go had a more sophisticated GC, would these techniques be necessary?

Also Java is supposed to be getting value types soon (tm) [3]

[1] https://ueokande.github.io/go-slice-tricks/

[2] https://pkg.go.dev/sync#Pool

[3] https://openjdk.java.net/jeps/169

papercrane · 4 years ago
Object pooling in Java used to be fairly common. I don't see it much anymore in new code, but I used to run into it all the time when writing code for Java 1.4/5. Even Sun used pooling when they wrote EJBs: individual EJBs could be recycled instead of released to the GC.

Nowadays the GC implementations are good enough that it's not worth the effort and complexity.

Though now that I think about it Netty provides an object pooling mechanism.

sam_bishop · 4 years ago
Pooling objects (for the purpose of minimizing GC) is considered a bad practice in modern Java. The article suggests that compacting, generational collectors are a bad thing, but they can dramatically reduce the time it takes to reclaim memory when most of the objects in a given region are dead. All you have to do is move out the objects that are still alive, and you're done: that region is available for use again. The result is that long-lived objects have a greater overhead.
fh973 · 4 years ago
Does object pooling still make sense for direct ByteBuffers nowadays?
munificent · 4 years ago
> Both are technically possible in Java, but I've never seen them used commonly (though in fairness I've never written performance critical Java.)

I don't know about the Java world, but in C#—especially in games written in Unity—object pooling is very common.

TideAd · 4 years ago
Writing High Performance .NET Code (https://www.writinghighperf.net/) has a chapter on this. In C#, time spent collecting depends on the number of still-living objects. That means you want objects you allocate to be short-lived (dead by the time GC happens) or to live forever (they go to the gen 2 heap and stay there). The book suggests object pooling when the lifetime of objects is between those two extremes, or when objects are big enough for the Large Object Heap.

But at the end of the section, the book says:

  I do not usually run to pooling as a default solution. As a general-purpose mechanism, it is clunky and error-prone. However, you may find that your application will benefit from pooling of just a few types.
What kind of things do you pool in Unity?

adrr · 4 years ago
I've seen it on a large site written in C#: object pools of stream objects for serializing and deserializing data. This was 10 years ago.
jillesvangurp · 4 years ago
Java has a pretty decent standard library with different list, map, and set implementations, and quite a few third-party libraries with yet more data structures. Honestly, Go felt a bit primitive and verbose to me on that front the few times I used it. Simplicity has a price and some limitations.

There are also other tricks you can use, like off-heap memory (e.g. Lucene does this), array buffers, or native libraries. There obviously is a lot of very memory-intensive, widely used software written for the JVM, and no shortage of experience dealing with all sorts of memory-related challenges. I'd even go as far as to argue that quite a few of those software packages might be a little out of the comfort zone for Go. Maybe if Go were used more for such things, there would be increased demand for better GCs as well?

Object pooling is pretty common for things like connection pools. For example, Apache Commons Pool is used for connection pooling (database, HTTP, Redis, etc.) in Spring Boot and probably a lot more products. There are also thread pools, worker pools, and probably quite a few more widely used variants, several of which come with the Java standard library. Caching libraries are also pretty common and well supported by popular web frameworks like Spring.

A typical Java-based search or database product (Elasticsearch, Kafka, Cassandra, etc.) is likely to use all of the above. Likewise for things like Hadoop, Spark, Neo4j, etc.

Of course, there's a difference between Java the language and the JVM, which is also targeted by quite a few other languages. For example, I've been using Kotlin for the last few years. There are functional languages like Scala and Clojure. And people even run scripting languages on it: Jython, JRuby, Groovy, or JavaScript.

There have even been some attempts to make Go run on the JVM. Apparently performance, concurrency, and memory management were big motivators for attempting that (you know, stuff the JVM does at scale): https://githubmemory.com/repo/golang-jvm/golang-jvm

Their pitch: "You can use go-jvm simply as a faster version of Golang, you can use it to run Golang on the JVM and access powerful JVM libraries such as highly tuned concurrency primitives, you can use it to embed Golang as a scripting language in your Java program, or many other possibilities."

fl0ki · 4 years ago
Not sure if you're in on the joke, but for those who didn't go to the repo itself:

https://github.com/golang-jvm/golang-jvm

It's just a copy-paste of JRuby on April 1st and the readme now includes a rickroll.

Maybe it's irresponsible of them to leave it up, given that Google still surfaces it as a legitimate-looking search result.

geodel · 4 years ago
> There even have been some attempts to make Go run on the JVM. Apparently performance, concurrency and memory management were big motivators for attempting that (you know, stuff the JVM does at scale):

This seems legit. It's just that the links to their website/wiki aren't working right now.

javier2 · 4 years ago
Object pooling used to be more common in Java. Now it is mainly used for objects that are expensive (i.e. incur latency) to create, not for GC reasons.
bestinterest · 4 years ago
How have you found Go in contrast to Java? Is the simplicity worth it?
jatone · 4 years ago
yes. golang is actually less restrictive than java. and avoids a ton of the bullshit abstractions you see in every java code base.
dragontamer · 4 years ago
> In a multithreaded program, a bump allocator requires locks. That kills their performance advantage.

Wait, what?

What's wrong with:

    #include <atomic>
    #include <cstddef>

    constexpr std::size_t HEAP_SIZE = 0x1000000;
    char theHeap[HEAP_SIZE];
    std::atomic<std::size_t> bumpPtr;

    void* bump_malloc(std::size_t size){
        std::size_t returnCandidate = bumpPtr.fetch_add(size, std::memory_order_relaxed);
        if(returnCandidate + size >= HEAP_SIZE){
            // Garbage collect. Super complicated, lets ignore it lol.
            // Once garbage collection is done, try to malloc again. If that fails, panic().
        }
        return &theHeap[returnCandidate];
    }
------

You don't even need acquire-release consistency here, as far as I can tell. Even purely relaxed memory ordering seems to work, which means you definitely don't need locks or even memory barriers.

The exception is maybe the garbage-collect routine. Locking for garbage collection is probably reasonable (other programmers accept that garbage collection is heavy and may incur large running and synchronization costs), but keeping locks out of the hot path is the goal.

------

This is what I do with my GPU test programs, where locking is a very, very bad idea (but possible to do). Even atomics are kinda-bad in the GPU world but relaxed-atomics are quite fast.

-------

> In Java, this requires 15 000 separate allocations, each producing a separate reference that must be managed.

Wait, what?

    Point[] array = new Point[15000];
This is a single alloc in Java. 15000 _constructors_ will be called IIRC (my Java is rusty though, someone double-check me on that), but that's not the same as 15000 allocs, not by a long shot.

------

Frankly, it seems like this poster is reasonably good at understanding Go performance, but doesn't seem to know much about Java performance or implementations.

GreenToad · 4 years ago

    Point[] array=new Point[15000];
In Java this would create an array of null references; to fill it up you need to allocate each object individually, so the article's point is valid.

dragontamer · 4 years ago
I stand corrected on that point.
comex · 4 years ago
Even with relaxed memory ordering, you’re still bouncing the cache line around between every CPU that tries to access bumpPtr. I would expect that to be significantly slower than using per-thread heaps (especially if you have a large number of CPUs).
haimez · 4 years ago
Java bump-pointer allocations (i.e. normal allocations, except for e.g. large array allocations) occur in a thread-local allocation buffer. Only when a thread exhausts its current allocation buffer does it need to worry about potentially contending with other threads to acquire a new one.