Guide to Java Virtual Threads

> The problem with platform threads is that they are expensive from a lot of points of view. First, they are costly to create. Whenever a platform thread is made, the OS must allocate a large amount of memory (megabytes) in the stack to store the thread context, native, and Java call stacks. This is due to the not resizable nature of the stack. Moreover, whenever the scheduler preempts a thread from execution, this enormous amount of memory must be moved around.

Scheduler pre-emption does not cause stack memory to be copied. Perhaps they're thinking of registers.

(As an aside, suspending and resuming a Java virtual thread does result in its stack being copied--saved and restored--as this was deemed less costly[1] than growable stacks, which is how Go works.)

> As we can imagine, this is a costly operation, in space and time. In fact, the massive size of the stack frame limits the number of threads that can be created. We can reach an OutOfMemoryError quite easily in Java, continually instantiating new platform threads till the OS runs out of memory:

Stack frame != stack.

The author seems confused about some concepts. I didn't read beyond this, so I don't know whether that confusion affected any of their conclusions or advice.

[1] EDIT: Whether less costly in terms of performance or development effort, I'm not sure. A major reason JavaScript and many other languages don't implement stackful coroutines is that their virtual machines--interpreters, JITs, etc.--are written in a way that in-language function calls and recursion directly or indirectly rely on the underlying native "C" stack. This correspondence is not something you can typically remedy without completely rewriting the implementation from scratch. Language implementations like Go and Lua were written from the beginning to avoid this correspondence. To implement stackful coroutines, languages like Java and, IIUC, OCaml really had no choice but to rely on other tricks, though I think OCaml permitted some tricks not available to Java, because OCaml could do some transforms which Java couldn't, given the nature of the JVM.
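The aside about virtual threads having their stacks saved and restored on suspension can be illustrated with a small sketch, assuming JDK 21 or later (where `Thread.ofVirtual()` is a final, non-preview API). Each virtual thread recurses `depth` frames deep, blocks in `Thread.sleep` at the bottom, and then unwinds. The sketch does not observe the copying itself; it only shows that a deep call stack survives the thread being parked and resumed. The class and method names are illustrative, not from the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Sketch (JDK 21+): deep call stacks survive a virtual thread being parked and resumed.
final class VirtualStackDemo {

    // Recurse `depth` frames deep, block once at the bottom, then count frames on the way up.
    static long descend(int depth) throws InterruptedException {
        if (depth == 0) {
            Thread.sleep(10); // the virtual thread parks here with all `depth` frames live
            return 0;
        }
        return 1 + descend(depth - 1);
    }

    // Start `threads` virtual threads, wait for all of them, and return the total frame count.
    static long run(int threads, int depth) throws InterruptedException {
        AtomicLong total = new AtomicLong();
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            started.add(Thread.ofVirtual().start(() -> {
                try {
                    total.addAndGet(descend(depth));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : started) {
            t.join();
        }
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(1_000, 50)); // prints 50000
    }
}
```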

ameliaquining · 3 years ago

I think JavaScript doesn't want this because, semantically, the whole language is designed around the idea of a single thread of execution, that can't be suspended except explicitly with an await statement. So if you call a function, you know that it can't suspend and let some other thread take control and make arbitrary changes out from under you before control returns to you. Breaking this assumption would probably break too much existing code.

avianlyric · 3 years ago

What do stackful coroutines have to do with cooperative vs non-cooperative concurrency? They’re entirely orthogonal problems.

JavaScript absolutely does want stackful coroutines (even if they’re not called coroutines, but just async stacks). That’s why Chrome has so much magic inside it to reassemble async stack traces for exceptions. But it has to do that via all manner of complex bookkeeping and jiggery-pokery, which is often broken by libraries doing clever things. Having async functionality built on top of cooperative coroutines, all sharing a single system thread, would make async stack traces trivial to produce accurately, and make it substantially easier to debug highly interwoven async code.
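For comparison with the async-stack-trace problem described above, here is a sketch (assuming JDK 21+) of what a virtual thread gives you on the Java side: an exception thrown after the thread has blocked and resumed still carries the ordinary synchronous call chain, with no reassembly. `outer`, `middle` and `inner` are hypothetical methods for illustration only.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch (JDK 21+): an exception thrown after a virtual thread blocks and resumes
// still carries the whole synchronous call chain.
final class VirtualStackTrace {

    static void outer() throws InterruptedException {
        middle();
    }

    static void middle() throws InterruptedException {
        Thread.sleep(5); // the virtual thread parks here, then resumes on the same logical stack
        inner();
    }

    static void inner() {
        throw new IllegalStateException("boom");
    }

    // Runs outer() on a virtual thread and returns the method names in the failure's stack trace.
    static String[] failingFrames() throws InterruptedException {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<?> future = executor.submit(() -> {
                outer();
                return null;
            });
            future.get();
            return new String[0]; // not reached: outer() always throws
        } catch (ExecutionException e) {
            return Arrays.stream(e.getCause().getStackTrace())
                    .map(StackTraceElement::getMethodName)
                    .toArray(String[]::new);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Expect inner, middle, outer near the top, followed by executor and JDK frames.
        System.out.println(String.join("\n", failingFrames()));
    }
}
```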

fulafel · 3 years ago

Also the space for the native thread stack ("C stack") is just allocated virtual memory (at least on *ix), not physical memory. When the program starts to touch stack pages, on first touch the user code will trap to the OS where the vm system will transparently fill in the needed physical pages as the usage grows.

Virtual threads don't seem like a worthwhile complexity tradeoff unless you're trying to run lots of threads in 32-bit address space. I wonder if this got started in that era and just took time to mature.
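On the point about stack reservation: in Java the reserved stack size is a creation-time parameter. On 64-bit Linux, HotSpot's default is 1 MB per Java thread (the `-Xss` flag), and as noted above that is address space the OS backs with physical pages only as they are touched. A per-thread value can be requested with `Thread.Builder.OfPlatform.stackSize` (JDK 21+) or the older `Thread(ThreadGroup, Runnable, String, long)` constructor; in both cases the JVM treats the value as a platform-dependent hint. A minimal sketch, where the thread name and sizes are arbitrary illustrative choices:

```java
// Sketch (JDK 21+): request a stack size for a platform thread at creation time.
final class StackSizeDemo {

    // Plain recursion, only to put some frames on the stack.
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    static int runWithStackSize(long stackBytes, int n) throws InterruptedException {
        int[] result = new int[1];
        Thread t = Thread.ofPlatform()
                .name("small-stack")
                .stackSize(stackBytes) // a hint: the JVM may round it up or ignore it
                .start(() -> result[0] = depth(n));
        t.join(); // join() makes the write to result[0] visible here
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWithStackSize(512 * 1024, 1_000)); // prints 1000
    }
}
```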

BenoitP · 3 years ago

> unless you're trying to run lots of threads

Well this is exactly the design point. The authors want to promote a coding style where spawning a new thread is ultra-cheap, possibly at a ratio of very few IO calls per virtual thread.

The ideal application would be WhatsApp's use of Erlang [1][2]: 2.8M active connections per server (in 2012! 100GB RAM servers), each of them mostly idle with 200k msgs/sec.

All of this while keeping the threading model, keeping your stacks intact for debugging, and possibly a hierarchy of threads where you can kill a whole branch and hot-reload it with new code. (which is a thing that's not easy to do with reactive programming / async await)

[1] http://highscalability.com/blog/2014/2/26/the-whatsapp-archi...

[2] https://web.archive.org/web/20221220020352/http://www.erlang...
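To make the "thread per task" style concrete, here is a sketch assuming JDK 21+: `Executors.newVirtualThreadPerTaskExecutor()` starts a new virtual thread for every submitted task, so ten thousand mostly idle "connections" can each block on their own thread. `handle` is a hypothetical stand-in for per-connection work, with a short sleep in place of waiting on I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch (JDK 21+): one virtual thread per task, even for thousands of mostly idle tasks.
final class ThreadPerTask {

    // Hypothetical per-connection handler: wait briefly (in place of I/O), then do trivial work.
    static int handle(int id) throws InterruptedException {
        Thread.sleep(20);
        return id % 2;
    }

    static int serve(int connections) throws InterruptedException, ExecutionException {
        List<Future<Integer>> results = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < connections; i++) {
                int id = i;
                results.add(executor.submit(() -> handle(id))); // a new virtual thread per task
            }
        } // close() waits for every submitted task to finish
        int sum = 0;
        for (Future<Integer> f : results) {
            sum += f.get();
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serve(10_000)); // prints 5000
    }
}
```

The same loop would run on a fixed pool too, but with a pool each sleeping task would hold one of a few OS threads for its whole duration.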

jeroenhd · 3 years ago

Virtual threads usually don't suffer the overhead of asking the OS to spawn a thread. If your workload consists of many small concurrent tasks, spawning threads can easily become costlier than processing the workload itself.

These "let's make our own threads" implementations all seem to stem from "I want threads, but I don't want to wait for the kernel to do its thing". This approach has some downsides for implementations (there's a reason the kernel takes a moment to spawn a thread and now you have to deal with the implications) but staying in userland also has some advantages in terms of pure performance.

Such tasks could of course be done faster using thread pools and a manual division of the workload (or adding locking to a dynamic work queue, etc.) but the threading model can be easier to visualise and reason about. It sits somewhere in the middle between the performance of a custom threading solution and the ease of use of single threaded code.

I imagine things like web servers, dealing with tons of different connections, will be able to use this mechanism quite effectively. If you're just batching through a dataset, I don't think you'll have much of an advantage using this model.
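For the batch case in the last paragraph, the usual approach is still a fixed pool sized to the core count, with the work divided manually. A sketch, assuming JDK 21+ (for `ExecutorService` being `AutoCloseable`); the `chunks` parameter is an arbitrary illustrative choice. Because these tasks never block, running them on virtual threads would not be expected to help.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: CPU-bound batch work divided manually across a fixed pool sized to the core count.
final class ChunkedSum {

    static long sum(long[] data, int chunks) throws InterruptedException, ExecutionException {
        int threads = Runtime.getRuntime().availableProcessors();
        int size = Math.max(1, (data.length + chunks - 1) / chunks); // ceiling division
        try (ExecutorService pool = Executors.newFixedThreadPool(threads)) {
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < data.length; start += size) {
                int from = start;
                int to = Math.min(start + size, data.length);
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = from; i < to; i++) {
                        s += data[i];
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data, 16)); // prints 500000500000
    }
}
```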

pdpi · 3 years ago

This sort of lightweight thread is at the core of things like Erlang or Go. You can just spin up processes/goroutines by the thousands without impacting performance too much. It just completely changes the way you write concurrency code.

intelVISA · 3 years ago

People are worried about GPT4 producing nonsense huh... articles like this prove humans still have that market on lock.

grimgrin · 3 years ago

There are a few issues and inaccuracies in this statement.

While it is true that platform threads can be expensive in terms of resources, the claim that the OS must allocate "megabytes" of memory for the stack is an exaggeration. The actual size of the stack depends on the operating system and the specific implementation, but typical default values range from a few dozen kilobytes to a few hundred kilobytes, not megabytes.

The statement implies that the entire stack is moved around when the scheduler preempts a thread from execution. This is not accurate. When a thread is preempted, the operating system saves the context of the thread, which is a relatively small amount of data, including the values of the CPU registers and the program counter. The stack itself is not moved around during this process.

It is not correct to say that the stack is "not resizable." While the default stack size is set by the operating system, many programming languages and operating systems allow you to specify the stack size when creating a new thread. However, it is true that once a thread has been created, its stack size typically cannot be changed.

Sent from OpenAI.

daveidol · 3 years ago

I would say this article is a far cry from "nonsense" - it's quite informative, even if there are a few small inaccuracies or naming issues.

C# .NET had async and await in 2012, for comparison. I've always loved Java but Microsoft deserves immense credit for raising the bar, and so quickly, too.

capableweb · 3 years ago

Java Virtual Threads seem to be a ("better", in their mind) alternative to async/await, so I'm not sure Microsoft should be credited for Java Virtual Threads?

> Also, the async/await approach, such as Kotlin coroutines, has its own problems. Even though it aims to model the one task per thread approach, it can’t rely on any native JVM construct. For example, Kotlin coroutines based the whole story on suspending functions, i.e., functions that can suspend a coroutine. However, the suspension is wholly based upon non-blocking IO, which we can achieve using libraries based on Netty, but not every task can be expressed in terms of non-blocking IO. Ultimately, we must divide our program into two parts: one based on non-blocking IO (suspending functions) and one that does not. This is a challenging task; it takes work to do it correctly. Moreover, we lose again the simplicity we want in our programs.

> The above are reasons why the JVM community is looking for a better way to write concurrent programs. Project Loom is one of the attempts to solve the problem. So, let’s introduce the first brick of the project: virtual threads.
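The quoted point is that virtual threads avoid splitting a program into suspending and non-suspending halves. A minimal sketch, assuming JDK 21+: two virtual threads hand values across a `SynchronousQueue` using the ordinary blocking `put` and `take` calls, with no separate async API. While a virtual thread waits in `take`, the JDK parks it and its carrier thread is free to run other virtual threads. Class and method names are illustrative.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Sketch (JDK 21+): plain blocking calls used directly inside virtual threads.
final class BlockingHandoff {

    static long transfer(int items) throws InterruptedException {
        BlockingQueue<Integer> queue = new SynchronousQueue<>();
        long[] received = new long[1];

        Thread consumer = Thread.ofVirtual().start(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    received[0] += queue.take(); // blocks until the producer hands a value over
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread producer = Thread.ofVirtual().start(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    queue.put(i); // blocks until the consumer takes it
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.join();
        consumer.join(); // join() makes received[0] visible to this thread
        return received[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(transfer(1_000)); // prints 500500
    }
}
```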

jayd16 · 3 years ago

It's not really an alternative to async/await. Implicit vs cooperative multithreading have pros and cons for each.

VagueMag · 3 years ago

At least from the Java language advocates' perspective, async/await is a worse solution to the problem of async than the structured concurrency approach that virtual threads will enable.

DeathArrow · 3 years ago

async/await uses tasks, which are the equivalent of virtual threads.

What I would like to see in C# is something akin to goroutines from Go or actors in Elixir.
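To make the tasks-versus-threads comparison concrete on the Java side, here is the same computation written two ways, assuming JDK 21+: once as a `CompletableFuture` pipeline (the closest standard-library analogue of a task-based async style) and once as plain blocking code submitted to a virtual-thread-per-task executor. `fetchPrice` and `fetchRate` are hypothetical stand-ins for I/O calls. Note that the second version performs the two lookups one after the other; running them concurrently would take two `submit` calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch (JDK 21+): the same computation as a future pipeline and as blocking code on a virtual thread.
final class FutureVsVirtual {

    // Hypothetical blocking lookups standing in for I/O calls.
    static int fetchPrice(String item) {
        pause(10);
        return item.length() * 10;
    }

    static int fetchRate(String currency) {
        pause(10);
        return currency.equals("EUR") ? 2 : 1;
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and carry on
        }
    }

    // Task style: composition goes through the CompletableFuture API.
    static int composed(String item, String currency) {
        return CompletableFuture.supplyAsync(() -> fetchPrice(item))
                .thenCombine(CompletableFuture.supplyAsync(() -> fetchRate(currency)), (p, r) -> p * r)
                .join();
    }

    // Thread style: straight-line blocking code, run on its own virtual thread.
    static int sequential(String item, String currency) throws InterruptedException, ExecutionException {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<Integer> result = executor.submit(() -> fetchPrice(item) * fetchRate(currency));
            return result.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(composed("apple", "EUR"));   // prints 100
        System.out.println(sequential("apple", "EUR")); // prints 100
    }
}
```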

andrekandre · 3 years ago

That's interesting, any good examples?

DeathArrow · 3 years ago

Not only async, but C# had LINQ, lambdas, records, pattern matching, pointers, hardware instructions via intrinsics, stackalloc etc. before similar constructs came into Java, if they ever did.

Probably there are examples where Java introduced something first, but I don't know because I'm not so well versed in Java.

While similar to and inspired by Java, I do prefer C# because it is less verbose, requires less boilerplate, generally has only one proper way to do things, and is a jack of all trades, in the sense that you can tackle any area of programming besides very low-level systems programming; and it could reach that point too, if there were a way to disable the GC and allow manual memory management.

Web backend - check, web frontend - check, mobile apps - check, desktop - check, multi platform - check, embedded - check, games - check, VM - check, native AOT - check.

It also looks great in benchmarks: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

I am biased but since I recognize it, maybe you shouldn't downvote me just for that. :)

And what's even better than C# is F#, but it's too bad nobody likes functional programming or hires programmers to use functional languages.

pron · 3 years ago

There's plenty of stuff Java did first (or only) -- an optimising JIT, two new generations of GC (G1, ZGC--a low latency GC with <1ms max pause time), low-overhead deep profiling (Java Flight Recorder) -- but they're all in the runtime. Java's strategy since the beginning has always been to innovate on the platform and be a last-mover on the language, keeping it conservative. .NET seems to follow an opposite strategy.

That's how we've been able to avoid properties and just have algebraic data types, avoid async/await and do virtual threads, avoid string interpolation and just have safe string templates. This also allows us to keep the number of features in the language relatively low -- as we, and most of our users, like it.

krzyk · 3 years ago

> Not only async, but C# had LINQ, lambdas, records, pattern matching, pointers, hardware instructions via intrinsics, stackalloc etc. before similar constructs came into Java, if they ever did.

Just a clarification regarding one statement which I know for sure is not true: Java team started working on records before C# started similar work.

I don't know about the rest.

vijucat · 3 years ago

Yes! It was distrust towards Microsoft that kept .NET from growing to be a universal language and kept Java in the game, to be honest. The language / framework, per se, is to be celebrated as a great leap forward borne out of the good kind of competition.

vips7L · 3 years ago

Have you looked at the source of some of the C# benchmarks? I don’t believe they’re representative of how one would actually write C#. They’re all extremely hand tuned using raw pointers and unsafe blocks. The regex benchmark actually just delegates to a C library over FFI.

xmcqdpt2 · 3 years ago

> And what's even better than C# is F#, but that's too bad nobody likes functional programming or hire programmers to use functional languages.

In JVM land there is Scala and people do hire for it, more so than any other typed FP language, AFAIK.

seabrookmx · 3 years ago

That head start has had a huge impact on the ecosystem too. Random libraries (ex: Google.Cloud.PubSub.V1) have first class, mature support for async and streams. Compare that to Python (and I'd have to assume Python is much more popular on Google Cloud) which only recently got async support and it's still kludgy. This really applies across the board for anything web related.

JavaScript/TypeScript is probably the only ecosystem with comparable async support, unless you count golang, which achieves a similar result with different ergonomics. I'm still on the fence about which I personally prefer. I can see the appeal of not having the async logic pollute the callstack, but at the same time the magic[1] way golang handles I/O seems antithetical to its philosophy of being simple and explicit (for example, with respect to error handling).

[1]: https://www.reddit.com/r/golang/comments/xiu4zg/comment/ip77...

thatnerdyguy · 3 years ago

And they are experimenting with green threads now. Would be... hilarious? If it landed in .NET before Java.

https://github.com/dotnet/runtimelab/tree/feature/green-thre...

geodel · 3 years ago

In Java it already landed as a preview feature last September and will become a final feature this September.

pkulak · 3 years ago

This is way better than async/await. This is Go/Erlang levels of green thread ease of use.

metaltyphoon · 3 years ago

I guess everyone is entitled to their opinion.

kitd · 3 years ago

Java had green threads in v1.0.

dopidopHN · 3 years ago

My understanding is that green threads are different on the implementation side as well. Basically, they are cheaper at the system level? The API doesn't even change much, last time I checked. It’s all in how the JVM spawns threads.
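As a small illustration of how little the API changes, the sketch below (assuming JDK 21+) starts the same kind of `Runnable` on a platform thread and on a virtual thread through the `Thread.Builder` API. Both are ordinary `java.lang.Thread` objects; only the builder differs, and `Thread.isVirtual()` reports which kind is running. The thread names are arbitrary.

```java
import java.util.Arrays;

// Sketch (JDK 21+): platform and virtual threads share the java.lang.Thread API.
final class SameApi {

    // Returns {isVirtual seen on the platform thread, isVirtual seen on the virtual thread}.
    static boolean[] runBoth() throws InterruptedException {
        boolean[] flags = new boolean[2];
        Thread platform = Thread.ofPlatform().name("platform-1")
                .start(() -> flags[0] = Thread.currentThread().isVirtual());
        Thread virtual = Thread.ofVirtual().name("virtual-1")
                .start(() -> flags[1] = Thread.currentThread().isVirtual());
        platform.join();
        virtual.join();
        return flags;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(runBoth())); // prints [false, true]
    }
}
```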

riku_iki · 3 years ago

> .NET had async and await in 2012, for comparison.

Java had ExecutorService and futures support for a while; async/await is just more syntactic sugar around the same solution, although Java's approach is much more powerful and flexible.