Too often people think the sync/async problem is a red/blue coloring within the programming language. The real problem is that every OS already has red/blue syscalls.
Everything else about “why can’t I mix these functions” is a direct result of that split exposed by the OS. As such I’m not sure there is a zero-cost abstraction (stackful coroutines come close) that any programming language can build to improve the situation.
Meanwhile, the trend is clear: every OS has adopted (and keeps adding) non-blocking syscalls because they are badly needed. The only benefit blocking syscalls offer is sugar that improves syscall ergonomics.
I think if more people talked about it this way, it would become clear that adding a “blocking” tag to those syscalls and bubbling that tag up the call stack is the right next step toward deprecating those legacy OS APIs. I don’t mean to say we should accept poor ergonomics, but adding the blocking tag is a great first step to 1. reminding people of the problem, and 2. identifying areas where research is needed to improve ergonomics and eventually replace the “blocking” tag with minimal downsides.
Boats, if you're still reading this: I don't have anything of technical substance to add to the discussion, but thanks for repeatedly explaining the rationale behind async Rust. And thanks for doing what you believed you had to do to make Rust a successful language, and especially, as you explained in an earlier post, for implementing the big feature that the companies who were most willing to fund Rust development wanted. I hope this can at least somewhat counterbalance all of the rants, complaints, and uninformed but overly confident comments.
Edit to add: I haven't personally done much work with async Rust yet, but the thing that appeals to me about async Rust is intra-task concurrency. So I'm not just trying to counterbalance the complainers and make you feel better; I really believe that Rust's futures and async/await syntax were not a mistake.
What I like about this post is that it mounts a good attack on the function-coloring framing, which has always rubbed me the wrong way. Function coloring essentially implies that there's no good reason for async and sync to be different, and it gets used as an argument that we should sweep the distinction between the two under the rug as much as possible. That's just not a good idea.
I also like the point that maybe(async) isn't terribly compelling. Recently, I've been looking at building an async parser, and the sync version of the interface is pretty simple: fn parse(r: &mut dyn Read) -> Result<ParsedObj> [1]. In theory, I could slap an async on it, change Read to AsyncRead, and now I have an async version. Except it's not usefully async: the consumer of the data can't do anything until everything has been fully read in. So what you want in the asynchronous version of the API is something that looks like a stream of parse events. But that API is way too much faff for anyone trying to use the API synchronously. Or is it? If you have a high-level method that distills the stream of output events into a simple, easy-to-use answer, then you end up with something that looks maybe(async)-able [2].
Where I think the author gets it wrong, though, is in asserting that it's statefulness that causes maybe(async) to break down. State per se isn't the problem: our underlying parser above is statefully turning a stream of input data into a stream of parse events, and this could sometimes work well with maybe(async). The problem arises when you're multiplexing many streams at once. In synchronous code, there's just no way to do that kind of multiplexing (as pointed out earlier in the blog post), while it's a pretty key design feature of async code. The trouble really comes in when you have a system whose outer interface is a 1:1 stream interface, but which internally needs to go through a 1:N piece and an N:1 piece. It looks like it could be maybe(async)-able from the outside, but its implementation is hopelessly not maybe(async)-able.
[1] Honestly, a lot of parsers go a step further and just read in the input buffer as a &[u8] and rely on zero-copy, but that interface turns out to be even less useful in an async context.
[2] But the faff remains if you can't resort to one of the pre-canned methods. How much of an issue that ends up being is left to the reader to decide.
I've had to build something like an explicit futures system for a common case that doesn't look like an async pseudo-thread. There are a large number of requests for various assets. When an asset arrives from the network, all the objects waiting for it need to be informed so they can update. Individual objects may have multiple updates pending. Updates need not arrive in order. It's a many-to-many relationship, not a one-to-one relationship.
This is not uncommon in big-world games; I hit it in my metaverse client. It's even a standard Windows feature: Windows file systems have "FindFirstChangeNotificationA", which lets a process monitor a collection of files for modifications. That has scaling issues for large numbers of files, but it's the same problem.
Similar ideas exist in the database world. You'd like to be able to ask "tell me when the results of this SELECT change". That's been played with as a concept, but didn't go mainstream.
In this area you can turn a manageable special case into a really hard generic problem. This is useful if you need a thesis topic, less useful if you need a working system.
I've worked with similar systems in gamedev. To be exact, we downloaded 2D images and then managed image atlases with them, so we needed to update renderers not only when the image they required was downloaded and included in an atlas, but also when an atlas containing their image was rebuilt.
To be honest, the system we designed wasn't that dissimilar from promises. It was a dynamic collection of callbacks (using built-in C# events), not unlike then/catch delegates. I worked on it about 10-12 years ago, exactly when the JS world was debating which flavour of promises was best, and that debate was one of our main sources of inspiration.
* Object wants some asset.
* Asset might be in memory in a cache. If so, satisfy request from cache.
* If asset is not in cache, queue up request, possibly on a priority queue that gets re-prioritized. Merge requests for same asset.
* When request reaches front of queue, see if the things that wanted it still want it. Remove dead requests. Discard entire request if all requestors no longer need it.
* Lock against two asset fetches for same asset being performed simultaneously.
* Fetch and preprocess the asset.
* Check again that the asset is still wanted.
* Deliver the asset.
* Unlock so another fetch for the same asset can proceed.
This is almost generic, except that once level of detail becomes involved, it gets less generic.
When I read this, I have to think of a take I heard a few times, namely that "Rust would be better off if the language had opted for a proper effect system from the get go, instead of grafting specialized forms of it onto the language now."
Does anyone have some thoughts on this? I just lack the knowledge to really evaluate that sort of take. There's a lot of discussion about whether async Rust is "good" or "bad", or the right abstraction or not, but it's pretty clear that (at this point) we're more or less stuck with it (for better or worse). That's not what I am asking about, though.
What I am asking is whether a proper effect system would have genuinely, and legitimately improved or significantly simplified the whole state of Rust async or not, and how. A lot of the async Rust problems sound like "hard" problems, and I'm a bit skeptical that a feature like that would really reduce the "heavy lifting" you have to do.
> Rust would be better off if the language had opted for a proper effect system from the get go
It does sound very appealing, but I'm not sure there's enough prior art to steal from, even now, and especially not when Rust was hatching. It would probably have been too big a gamble. I'm checking out Koka in that space, but it's early days.
> If you were a language designer of some renown, you might convince a large and wealthy technology company to fund your work on a new language which isn’t so beholden to C runtime, especially if you had a sterling reputation as a systems engineer with a deep knowledge of C and UNIX and could leverage that (and the reputation of the company) to get rapid adoption of your language. Having achieved such an influential position, you might introduce a new paradigm, like stackful coroutines or effect handlers, liberating programmers from the false choice between threads and futures. If Liebniz is right that we live in the best of all possible worlds, surely this is what you would do with that once in a generation opportunity.
This seems... a bit specific? Was it a reference to a specific new language or was that more of a poignant wish?
"Well, my dear Pangloss," Candide said to him, "when you were hanged, dissected, beaten black and blue, and made to row in the galleys, did you still think that everything was for the best in this world?" "I still hold my first opinion," replied Pangloss, "for after all I am a philosopher; it would not become me to retract, since Leibniz cannot be wrong, and since pre-established harmony is, besides, the finest thing in the world, along with the plenum and subtle matter." (Voltaire, Candide)
If you're a language/platform like Swift or Rust (see also Erlang or OCaml), you're gonna end up with some kind of userspace task scheduler, because OSes don't give you enough control over threads. If you're a systems language, you're gonna use stackless coroutines, because of speed/memory and C interop. In Rust, you use its task scheduler via async/await, but sadly the ergonomics aren't wonderful, and there are now weirdo problems like which reactor/engine/whatever to standardize on. The interesting and good stuff will be built on top of async/await; they probably shouldn't have touted it as the interface.
My hypothesis here is that something like concurrency/parallelism exists on two levels. You've gotta get the underlying implementation right, and you have to expose a mental model and affordances for using it. The problem is that languages that are good at the first thing are usually real bad at the second thing. More broadly I think this is a trilemma--pick any 2 for your language:
- small
- can implement performant concurrency/parallelism
- can express a high-level mental model of concurrency/parallelism
No one should be surprised that Rust will throw "small" overboard here. I think the only question is how many models will they support? Will they do CSP AND actors (they're different!)? Will there be some kind of OpenMP type thing? Will there be new keywords or--shudder--new SYNTAX?
---
I feel like I have to say Go's solution to this is bold and elegant. Giving up straightforward C interop was pretty gutsy, and the affordances for working with its task scheduling system are familiar and intuitive. I want to stress this (because TFA goes way out of its way to be a little shitty w/ Go): this worked absolutely great. Tons of good software is written in Go. It is super effective. It spent its innovation tokens on experience and tooling and that was really successful (i.e. type systems aren't the only way to fight data races). There were exactly zero debates about stuff like await syntax or what engine to use/standardize. In the time it took Rust to ship async/await, probably 7 billion actually useful Go programs were shipped. Just because someone else's values aren't yours doesn't mean they're the wrong values. I think it's cool that people like Boats are being really thoughtful about how to do this and sharing their thoughts and processes out in the open. I just think there's room in this crazy world for at least two points of view, and that we could do with a little less sneering.
> Will they do CSP AND actors (they're different!)? Will there be some kind of OpenMP type thing?
Rust builds support for all of these models on top of its basic underlying "shared ^ mutable" and "Send/Sync" safety guarantees. It should be quite clear from these whether you're sharing access to some read-only data, staying within a single-threaded context with mutability, or relying on some sort of synchronization primitive (be it atomic data, locks, r/w mutexes, etc.). The fact that we understand some uses of these facilities as involving patterns such as "CSP" or "actors" is quite beside the point.
I don't super understand your point. I understood TFA to be making the argument that dealing with the low-level business of stuff like mutexes and async/await isn't what most people want to do:
> We should aspire not to simplify the system by hiding the differences between futures and threads, but instead to find the right set of APIs and language features that build on the affordances of futures to make more kinds of engineering achievable than before.
The stuff you listed, along with async/await, makes this possible. Isn't that what TFA is saying?
This might be a silly question, but what would be the downside of a single-threaded language similar to JavaScript, but where every function is just 'async capable' and can always be non-blocking, with no special syntax like async/await?
So every time you call foo(), you anticipate that it might resolve immediately or might take some time, but it won't block any other function (unless it actually does some CPU-bound work).
And any time foo() does something asynchronous, all of its callers, and their callers in turn, become implicitly asynchronous too.
Of course you would have to have some primitive for when you actually want to do several things concurrently within the scope of a single function rather than blocking that function, but that doesn't sound too bad and in JavaScript you effectively need to use Promise.all most of the time anyway.
I'm sure there's some major downside I'm missing -- but what is it?
You can do that; Java did recently with virtual threads. You need full control of all the blocking primitives so they can yield the thread.
Where this breaks is with FFI: if you cannot intercept blocking calls in foreign functions, you block useful threads or even deadlock. This is what the quote in the article is about:
> The cost for the native compatibility with the C/system runtime is the “function coloring” problem.
Is stackfulness required for what the grandparent describes? It seems possible to do this stacklessly: all functions implicitly compile down to a state machine, but futures are never visible.
I just want a language where await is the default behavior and I have to specify when I want to capture a future as a value and do something with it later. Most of the time I want an emulation of a synchronous thread of execution even if it's actually async. It's crazy to have to constantly specify the common case and to have the possibility of surprising behavior at best if I accidentally forget to await something somewhere.
"T" and "impl Future<Output = T>" are very different types so unless you're doing very silly things I don't think you can be "surprised" in Rust. The fact that await points are explicitly marked and visible in the code is a feature, not a bug. It's similar to the case for "?" vs. hidden exceptions.
Generally speaking, async code in a language that isn't designed around it from the ground up is far slower, because the language ends up using some relatively expensive mechanism like mutexes. In most languages, making 90% of your code 90% slower so that "everything is async" isn't a worthwhile general trade-off.
Even if your language runtime is fully async, it's really hard to do anything useful without touching C/C++/the kernel/system libraries, and as soon as you do that, all those guarantees go out the window, because you cannot rewrite the world in $newlang.
Discussed most recently @ https://www.abubalay.com/blog/2024/01/14/rust-effect-lowerin... / https://news.ycombinator.com/item?id=39005780 It's a viable approach if you can take the time to get it right. Async Rust started out with more of a limited, MVP approach, where they first did the simplest thing that would surely work, and then went on from there.
> You might go do that, in a less than optimal world.
Not very new, seeing as Erlang was open sourced by Ericsson in 1998. (The language itself is from 1986 but was proprietary up to that point.)