Readit News
wahern · 3 years ago
> The problem with platform threads is that they are expensive from a lot of points of view. First, they are costly to create. Whenever a platform thread is made, the OS must allocate a large amount of memory (megabytes) in the stack to store the thread context, native, and Java call stacks. This is due to the not resizable nature of the stack. Moreover, whenever the scheduler preempts a thread from execution, this enormous amount of memory must be moved around.

Scheduler pre-emption does not cause stack memory to be copied. Perhaps they're thinking of registers.

(As an aside, suspension and resumption of Java virtual threads does result in its stack being copied--saved and restored--as this was deemed less costly[1] than growable stacks, which is how Go works.)

> As we can imagine, this is a costly operation, in space and time. In fact, the massive size of the stack frame limits the number of threads that can be created. We can reach an OutOfMemoryError quite easily in Java, continually instantiating new platform threads till the OS runs out of memory:

Stack frame != stack.

The author seems confused about some concepts. I didn't read beyond this, so I don't know whether that confusion affected any of their conclusions or advice.

[1] EDIT: Whether less costly in terms of performance or development effort I'm not sure. A major reason JavaScript and many other languages don't implement stackful coroutines is that the virtual machines--interpreters, JITs, etc--are written in a way that in-language function calls and recursion directly or indirectly rely on the underlying native "C" stack. This correspondence is not something you can typically remedy without completely rewriting the implementation from scratch. Language implementations like Go and Lua were written from the beginning to avoid this correspondence. To accomplish stackful coroutines, languages like Java and, IIUC, OCaml really had no choice but to rely on some other tricks, though I think OCaml permitted some tricks not available to Java, because OCaml could do some transforms which Java couldn't given the nature of the JVM.

ameliaquining · 3 years ago
I think JavaScript doesn't want this because, semantically, the whole language is designed around the idea of a single thread of execution, that can't be suspended except explicitly with an await statement. So if you call a function, you know that it can't suspend and let some other thread take control and make arbitrary changes out from under you before control returns to you. Breaking this assumption would probably break too much existing code.
avianlyric · 3 years ago
What do stackful coroutines have to do with cooperative vs non-cooperative concurrency? They’re entirely orthogonal problems.

JavaScript absolutely does want stackful coroutines (even if they’re not called coroutines, but just async stacks). That's why Chrome has so much magic inside it to reassemble async stack traces for exceptions. But it has to do that via all manner of complex bookkeeping and jiggery-pokery, which is often broken by libraries doing clever things. Having async functionality built on top of cooperative coroutines, all sharing a single system thread, would make async stack traces trivial to produce accurately, and make it substantially easier to debug highly interwoven async code.

fulafel · 3 years ago
Also the space for the native thread stack ("C stack") is just allocated virtual memory (at least on *ix), not physical memory. When the program starts to touch stack pages, on first touch the user code will trap to the OS where the vm system will transparently fill in the needed physical pages as the usage grows.
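The amount of address space reserved per platform thread is tunable, either globally with the JVM's `-Xss` option or per thread through the four-argument `Thread` constructor. A minimal sketch of the latter follows; the class name, thread name, and 256 KiB figure are made up for illustration, and the Javadoc describes `stackSize` as a hint that the JVM may round or ignore on some platforms.

```java
public class StackSizeHint {
    // Plain recursive Fibonacci; shallow enough to fit in a small stack.
    static int fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    // Runs fib(20) on a platform thread created with a requested stack size in bytes.
    // stackSize is only a hint: the JVM may round it up or ignore it.
    static int runWithStack(long stackSizeBytes) {
        int[] result = new int[1];
        Thread worker = new Thread(null, () -> result[0] = fib(20), "small-stack", stackSizeBytes);
        worker.start();
        try {
            worker.join(); // join makes the worker's write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runWithStack(256 * 1024));
    }
}
```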

Virtual threads don't seem like a worthwhile complexity tradeoff unless you're trying to run lots of threads in 32-bit address space. I wonder if this got started in that era and just took time to mature.

BenoitP · 3 years ago
> unless you're trying to run lots of threads

Well this is exactly the design point. The authors want to promote a coding style where spawning a new thread is ultra-cheap, possibly at a ratio of very few IO calls per virtual thread.

The ideal application would be WhatsApp's use of Erlang [1][2]: 2.8M active connections per server (in 2012! 100GB RAM servers), each of them mostly idle with 200k msgs/sec.

All of this while keeping the threading model, keeping your stacks intact for debugging, and possibly a hierarchy of threads where you can kill a whole branch and hot-reload it with new code. (which is a thing that's not easy to do with reactive programming / async await)

[1] http://highscalability.com/blog/2014/2/26/the-whatsapp-archi...

[2] https://web.archive.org/web/20221220020352/http://www.erlang...

jeroenhd · 3 years ago
Virtual threads usually don't suffer the overhead of asking the OS to spawn a thread. If your workload consists of many small concurrent tasks, spawning threads can easily become costlier than processing the workload itself.

These "let's make our own threads" implementations all seem to stem from "I want threads, but I don't want to wait for the kernel to do its thing". This approach has some downsides for implementations (there's a reason the kernel takes a moment to spawn a thread and now you have to deal with the implications) but staying in userland also has some advantages in terms of pure performance.

Such tasks could of course be done faster using thread pools and a manual division of the workload (or adding locking to a dynamic work queue, etc.) but the threading model can be easier to visualise and reason about. It sits somewhere in the middle between the performance of a custom threading solution and the ease of use of single threaded code.
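For comparison, the "thread pool plus manual division of the workload" approach might look roughly like the sketch below. The class and method names are invented for the example, and summing an array stands in for whatever the real workload is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledSum {
    // Splits the array into one contiguous chunk per worker and sums the chunks in parallel.
    static long parallelSum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            List<CompletableFuture<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                parts.add(CompletableFuture.supplyAsync(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }, pool));
            }
            long total = 0;
            for (CompletableFuture<Long> part : parts) {
                total += part.join(); // wait for each partial sum
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 10;
        }
        System.out.println(parallelSum(data, 4));
    }
}
```

The chunking arithmetic is the part the caller has to get right by hand, which is the bookkeeping a thread-per-task model avoids.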

I imagine things like web servers, dealing with tons of different connections, will be able to use this mechanism quite effectively. If you're just batching through a dataset, I don't think you'll have much of an advantage using this model.

pdpi · 3 years ago
This sort of lightweight thread is at the core of things like Erlang or Go. You can just spin up processes/goroutines by the thousands without impacting performance too much. It just completely changes the way you write concurrency code.
intelVISA · 3 years ago
People are worried about GPT4 producing nonsense huh... articles like this prove humans still have that market on lock.
grimgrin · 3 years ago
> The problem with platform threads is that they are expensive from a lot of points of view. First, they are costly to create. Whenever a platform thread is made, the OS must allocate a large amount of memory (megabytes) in the stack to store the thread context, native, and Java call stacks. This is due to the not resizable nature of the stack. Moreover, whenever the scheduler preempts a thread from execution, this enormous amount of memory must be moved around.

There are a few issues and inaccuracies in this statement.

While it is true that platform threads can be expensive in terms of resources, the claim that the OS must allocate "megabytes" of memory for the stack is an exaggeration. The actual size of the stack depends on the operating system and the specific implementation, but typical default values range from a few dozen kilobytes to a few hundred kilobytes, not megabytes.

The statement implies that the entire stack is moved around when the scheduler preempts a thread from execution. This is not accurate. When a thread is preempted, the operating system saves the context of the thread, which is a relatively small amount of data, including the values of the CPU registers and the program counter. The stack itself is not moved around during this process.

It is not correct to say that the stack is "not resizable." While the default stack size is set by the operating system, many programming languages and operating systems allow you to specify the stack size when creating a new thread. However, it is true that once a thread has been created, its stack size typically cannot be changed.

Sent from OpenAI.

daveidol · 3 years ago
I would say this article is a far cry from "nonsense" - it's quite informative, even if there are a few small inaccuracies or naming issues.
vijucat · 3 years ago
C# .NET had async and await in 2012, for comparison. I've always loved Java but Microsoft deserves immense credit for raising the bar, and so quickly, too.
capableweb · 3 years ago
Java Virtual Threads seem to be an alternative ("better" in their mind) to async/await, so I'm not sure Microsoft should be credited for Java Virtual Threads?

> Also, the async/await approach, such as Kotlin coroutines, has its own problems. Even though it aims to model the one task per thread approach, it can’t rely on any native JVM construct. For example, Kotlin coroutines based the whole story on suspending functions, i.e., functions that can suspend a coroutine. However, the suspension is wholly based upon non-blocking IO, which we can achieve using libraries based on Netty, but not every task can be expressed in terms of non-blocking IO. Ultimately, we must divide our program into two parts: one based on non-blocking IO (suspending functions) and one that does not. This is a challenging task; it takes work to do it correctly. Moreover, we lose again the simplicity we want in our programs.

> The above are reasons why the JVM community is looking for a better way to write concurrent programs. Project Loom is one of the attempts to solve the problem. So, let’s introduce the first brick of the project: virtual threads.

jayd16 · 3 years ago
It's not really an alternative to async/await. Implicit and cooperative multithreading each have their own pros and cons.
VagueMag · 3 years ago
At least from the Java language advocates' perspective, async/await is a worse solution to the problem of async than the structured concurrency approach that virtual threads will enable.
DeathArrow · 3 years ago
async/await uses tasks, which are the equivalent of virtual threads.

What I would like to see in C# is something akin to goroutines from Go or actors in Elixir.

andrekandre · 3 years ago
That's interesting, any good examples?
DeathArrow · 3 years ago
Not only async, but C# had LINQ, lambdas, records, pattern matching, pointers, hardware instructions via intrinsics, stackalloc etc. before similar constructs came into Java, if they ever did.

Probably there are examples where Java introduced something first, but I don't know because I'm not so well versed in Java.

While it is similar to and inspired by Java, I do prefer C# because it is less verbose, requires less boilerplate, generally has only one proper way to do things, and is kind of a jack of all trades, in the sense that you can tackle any area of programming besides very low-level systems programming - and it could quite possibly reach that point too, if there is ever a way to disable the GC and allow manual memory management.

Web backend - check, web frontend - check, mobile apps - check, desktop - check, multi platform - check, embedded - check, games - check, VM - check, native AOT - check.

It also looks great in benchmarks: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

I am biased but since I recognize it, maybe you shouldn't downvote me just for that. :)

And what's even better than C# is F#, but it's too bad nobody likes functional programming or hires programmers to use functional languages.

pron · 3 years ago
There's plenty of stuff Java did first (or only) -- an optimising JIT, two new generations of GC (G1, ZGC--a low latency GC with <1ms max pause time), low-overhead deep profiling (Java Flight Recorder) -- but they're all in the runtime. Java's strategy since the beginning has always been to innovate on the platform and be a last-mover on the language, keeping it conservative. .NET seems to follow an opposite strategy.

That's how we've been able to avoid properties and just have algebraic data types, avoid async/await and do virtual threads, avoid string interpolation and just have safe string templates. This also allows us to keep the number of features in the language relatively low -- as we, and most of our users, like it.

krzyk · 3 years ago
> Not only async, but C# had LINQ, lambdas, records, pattern matching, pointers, hardware instructions via intrinsics, stackalloc etc. before similar constructs came into Java, if they ever did.

Just a clarification regarding one statement which I know for sure is not true: the Java team started working on records before C# started similar work.

I don't know about the rest.

vijucat · 3 years ago
Yes! It was distrust towards Microsoft that kept .NET from growing to be a universal language and kept Java in the game, to be honest. The language / framework, per se, is to be celebrated as a great leap forward borne out of the good kind of competition.
vips7L · 3 years ago
Have you looked at the source of some of the C# benchmarks? I don’t believe they’re representative of how one would actually write C#. They’re all extremely hand tuned using raw pointers and unsafe blocks. The regex benchmark actually just delegates to a C library over FFI.
xmcqdpt2 · 3 years ago
> And what's even better than C# is F#, but it's too bad nobody likes functional programming or hires programmers to use functional languages.

In JVM land there is Scala and people do hire for it, more so than any other typed FP language, AFAIK.

seabrookmx · 3 years ago
That head start has had a huge impact on the ecosystem too. Random libraries (ex: Google.Cloud.PubSub.V1) have first class, mature support for async and streams. Compare that to Python (and I'd have to assume Python is much more popular on Google Cloud) which only recently got async support and it's still kludgy. This really applies across the board for anything web related.

JavaScript/TypeScript is probably the only ecosystem with comparable async support, unless you count golang which achieves a similar result with different ergonomics. I'm still on the fence on which I personally prefer.. I can see the appeal of not having the async logic pollute the callstack, but at the same time the magic[1] way golang handles i/o seems antithetical to its philosophy of being simple and explicit (for example, with respect to error handling).

[1]: https://www.reddit.com/r/golang/comments/xiu4zg/comment/ip77...

thatnerdyguy · 3 years ago
And they are experimenting with green threads now. Would be... hilarious? If it landed in .NET before Java.

https://github.com/dotnet/runtimelab/tree/feature/green-thre...

geodel · 3 years ago
In Java it already landed as a preview feature last September and will be a final feature this September.
pkulak · 3 years ago
This is way better than async/await. This is Go/Erlang levels of green thread ease of use.
metaltyphoon · 3 years ago
I guess everyone is entitled to their opinion.
kitd · 3 years ago
Java had green threads in v1.0.
dopidopHN · 3 years ago
My understanding is that green threads are different on the implementation side as well. Basically they are cheaper at the system level? The API does not even change much, last time I checked. It’s all happening in how the JVM spawns threads.
riku_iki · 3 years ago
> .NET had async and await in 2012, for comparison.

Java has had ExecutorService and futures support for a while; async/await is just syntactic sugar around the same solution, although Java's approach is much more powerful and flexible.

recursivedoubts · 3 years ago
java green threads strike back
jjtheblunt · 3 years ago
That would be the best name by far!
rzzzt · 3 years ago
They were called green as well as "M:N" threads: https://en.wikipedia.org/wiki/Green_thread#Green_threads_in_...
throwaway2037 · 3 years ago
Sorry to ask a somewhat unrelated question: How did the author create those flow charts? That style is Just. So. Cool.
Kolja · 3 years ago
The style, and especially the splotchy background coloring, made Paper[1] (By WeTransfer it seems? When it was released they were called fiftythree.) my first guess.

[1]: https://apps.apple.com/us/app/paper-by-wetransfer/id50600381...

dgb23 · 3 years ago
Looks like they used excalidraw.

It’s OSS and a PWA. A lot of people seem to use it because it lets you draw without worrying about details.

FlyingSnake · 3 years ago
It could also be an Onyx or ReMarkable tablet.
jasmer · 3 years ago
These are great, but Loom discussions almost always mix details with issues of the API.

It's hard to fathom what that means for end developers.

Will anything at all change but performance?

What are the practical implications beyond that?

We just don't bother with Executors and live happily?

twic · 3 years ago
The way I think about the situation is:

1. Platform threads + blocking APIs = comfortable to use, does not scale to many connections

2. Platform threads + non-blocking APIs = painful to use, scales to many connections

3. Virtual threads + blocking APIs = comfortable to use, scales to many connections

So, at the moment, if you want to write software which scales to many connections (which not everybody needs to do, but some do), you have to suffer for it. With virtual threads, it will be a lot more pleasant.

As a concrete example, with non-blocking architecture, there is no way to have an InputStream or OutputStream which streams an arbitrarily large amount of data over the network without buffering it all. Because the contracts of those classes say that they block when there is no data or space available! If you look at the APIs of web servers based on non-blocking APIs (eg [1], [2]), they want you to read or write the whole body in one go, or maybe in chunks, which will be pushed to you if you're reading. You can only build a stream on top of that by buffering everything, or using another thread, which destroys the advantage of non-blocking architecture. There is loads of useful I/O machinery that is built on top of streams, like JSON parsers and generators, so your choice is either not using that, or accepting buffering or extra threads.

With virtual threads, you just use blocking I/O, servers and clients can expose streams for request and response bodies, and it's trivial to use any I/O machinery you like. It's also far easier to write your own.

[1] https://undertow.io/javadoc/1.3.x/io/undertow/io/Receiver.ht...

[2] https://undertow.io/javadoc/1.3.x/io/undertow/io/Sender.html
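A rough sketch of option 3, assuming Java 21 or later (where `Executors.newVirtualThreadPerTaskExecutor()` is final rather than preview). The class, the `handle` method, and the 50 ms sleep are hypothetical stand-ins for a real blocking read or query; the point is only that each task is plain blocking code on its own virtual thread.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingOnVirtualThreads {
    // Hypothetical handler: the sleep stands in for a blocking socket read or database call.
    static String handle(int requestId) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(50));
        return "response-" + requestId;
    }

    // Starts one virtual thread per request and collects the responses in order.
    static List<String> serve(int requests) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                Callable<String> task = () -> handle(id);
                pending.add(executor.submit(task));
            }
            List<String> responses = new ArrayList<>();
            for (Future<String> future : pending) {
                responses.add(future.get()); // blocks the caller, not a pooled OS thread
            }
            return responses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(serve(10_000).size());
    }
}
```

With a fixed pool of platform threads, the same code would serialize behind the pool size while each task sleeps; here every task gets its own thread, so the blocking style is kept.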

fulafel · 3 years ago
In these discussions it's good to quantify what you mean by "many connections".

E.g. the web is full of people posting Java OutOfMemory stack traces when they haven't increased the OS resource limits from the default and are imagining that the limit is 10k threads instead of 1M threads on their hw, falsely concluding that Java uses a lot of physical memory per thread stack.

jasmer · 3 years ago
So why can't I just use an OS regular thread to use a non-blocking API?

What is the advantage of a virtual thread in that case? You are saying that it's painful? How exactly? If you use an Executor that's set up with a proper Thread Pool it's mostly painless?

I mean, virtual threads are nicer but I see that as almost a syntax issue. A bit cleaner.

I guess I'm asking what you mean by 'painful to use?'.

xxs · 3 years ago
>2. Platform threads + non-blocking APIs = painful to use, scales to many connections

Wow, I have written a substantial amount of non-blocking IO since around java 1.4.2 (when it actually became stable). It was not much harder to use at all (compared to io streams). The issues w/ buffering/scalability and internal scheduling will be exactly the same with green threads.

fnordsensei · 3 years ago
Additionally, Loom is about structured concurrency, built on top of virtual threads.

Two different projects, albeit that one is making use of the output of the other.

mrkeen · 3 years ago
Weird for the article to compare OS threads to Loom threads, and skip over the last ten years of Futures.
gleenn · 3 years ago
But Futures still tie up a real OS thread. You can chain them together which alleviates some additional thread cost but it's not M:N scheduling, you're just being more clever with your OS threads. Virtual Threads separate those concepts.
riku_iki · 3 years ago
> but it's not M:N scheduling

Why do you think it is not M:N scheduling? You have M tasks and N OS threads in the pool?
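To make the "M tasks on N OS threads" picture concrete, here is a small sketch that runs many short tasks on a fixed pool and counts how many distinct threads executed them. The class and method names are invented for the example.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TasksOnPool {
    // Runs `tasks` short jobs on a pool of `poolSize` platform threads and
    // returns how many distinct threads actually executed them.
    static int distinctThreads(int tasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> seen = ConcurrentHashMap.newKeySet();
        try {
            CompletableFuture<?>[] all = new CompletableFuture<?>[tasks];
            for (int i = 0; i < tasks; i++) {
                all[i] = CompletableFuture.runAsync(
                        () -> seen.add(Thread.currentThread().getName()), pool);
            }
            CompletableFuture.allOf(all).join(); // wait for every task to finish
            return seen.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(distinctThreads(1_000, 4) <= 4);
    }
}
```

The multiplexing only holds while tasks never block: a task that makes a blocking call inside the pool keeps its OS thread until the call returns, whereas a blocked virtual thread is unmounted from its carrier.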

mrkeen · 3 years ago
That's what should have been in the article.
jjtheblunt · 3 years ago
But that's the topic of the article, going by its title.

I say that as someone who once had a Solaris Internals book (and worked at Sun in that area), and it would be pretty great for that book's explanation to be online for Java people who didn't use original green threads to see.

Bjartr · 3 years ago
I learned a new word from this article, "Omissis", which means "omissions and redactions". Equivalent in this context to "...snip..." or "some code omitted".
rr888 · 3 years ago
Can I forget about reactive programming now? It's been fashionable for the last 5+ years, and I hate it. Kinda cool to play around with, but it just seems to double the complexity of everything.