>STM is an optimistic concurrency system. This means that threads never block waiting for locks. Instead, each concurrent operation proceeds, possibly in parallel, on their own independent transaction log. Each transaction tracks which pieces of data it has accessed or mutated and if at commit time it is detected that some other transaction has been committed and altered data which this transaction also accessed, then the latter transaction is rolled back and is simply retried.
I already foresaw (and it gets mentioned later) the problem that if you have many small, frequent operations, they will prevent a big, long operation from ever completing: they keep changing the state and causing conflicts before the big one can finish. You can easily code yourself an app that will softlock forever.
The post doesn't offer a good solution (it describes one that involves a tradeoff, but you don't need a tradeoff for this).
The way this gets fixed is to make the lock acquisition (or in this case, the priority of the merge?) preemptive (Wound-Wait).
All transactions have a globally incrementing number attached to them, their ID, which we call "seniority". If you try to take a lock, and the lock is held by a transaction with lower seniority (i.e. a higher ID), you kill that transaction and take the lock; once you're done, the transaction you killed is allowed to retry.
In the meantime, if a transaction with lower seniority tries to take the lock, it blocks.
This ensures that your program will always finish.
In the case of "lots of frequent small transactions" + "one big, long report", the report will get killed a few times, until your report inevitably becomes the most senior transaction to ask for this resource, and is allowed to complete.
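A minimal sketch of that Wound-Wait decision rule, with all names my own illustration rather than any particular STM's API:

```haskell
-- Seniority is a globally increasing transaction ID:
-- lower ID = older = more senior.
type TxId = Int

data Decision
  = Wound  -- kill the current holder and take the lock
  | Wait   -- block until the holder finishes
  deriving (Eq, Show)

-- What a requester does on finding the lock already held.
resolve :: TxId -> TxId -> Decision
resolve requester holder
  | requester < holder = Wound  -- requester is senior
  | otherwise          = Wait   -- requester is junior

main :: IO ()
main = do
  print (resolve 3 7)  -- the older transaction wounds the younger holder
  print (resolve 7 3)  -- the younger transaction waits
```

The key detail is that a wounded transaction retries with its original ID, so the long report keeps its seniority across restarts and eventually outranks every newcomer.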
databases also solve this problem by allowing the read only transaction to see a consistent snapshot of the data at some point in time. so the reader can do its work without needing a logical mutex.
With default settings they let other processes' commits cut into the middle of your read (see READ COMMITTED). I was flabbergasted when I read this.
When you turn MSSQL up to a higher consistency level you can get it to grind to a halt and throw Deadlock exceptions on toy code. (Moving money between accounts will do it).
I used to advertise STM as like ACID-databases for shared-memory systems, but I guess it's even more consistent than that.
Yeah, STMs can use MVCC in the same way, but that doesn't solve the long-transaction problem for bulk update transactions. The first example in Gray and Reuter is adding monthly interest to every savings account, exactly once.
I think STM looks pretty cool in toy examples, but in practice it's pretty bad. It's very difficult for me to make a strong logical argument about this; it's just based on how it feels.
In Clojure, there are first-class STM primitives with retriable transactions in the standard lib, but every time I tried to use them, the code was slow, difficult to reason about, full of "colored" functions, and non-composable.
Also, unintuitively, the code starts immediately smelling slightly "wrong", as if I tried to put into code my first childish thoughts instead of actually thinking about how the system _should_ work. Like the notorious bank accounts example: yeah, cool, two accounts' balances are TVars — but what if I have 10M accounts? Is the list of accounts also a TVar since sometimes new accounts are created? Are indexes on the list also TVars? How do you persist the lists, how do you synchronize slow persistence and in-memory transactions? How do you synchronize all the deposits with a backup/failover machine? As you continue trying to untangle all this with STM, you start drowning in complexity, and encountering huge contention among transactions touching many parts of your systems, and spewing countless retry logs.
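For what it's worth, the usual first answer in Haskell to that contention is one TVar per balance inside a shared map, rather than one big TVar holding everything. A sketch using the stm and containers libraries (the Bank type and transfer are my own illustration, not from the article):

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map

type AccountId = Int

-- One TVar per balance, held in a shared map.
type Bank = TVar (Map.Map AccountId (TVar Int))

-- A transfer writes only the two balance TVars it touches, so
-- transfers between unrelated accounts don't conflict. Note the
-- catch the comment above points at: we still *read* the outer
-- map's TVar, so creating a new account (which writes that TVar)
-- conflicts with every in-flight transfer.
transfer :: Bank -> AccountId -> AccountId -> Int -> STM ()
transfer bank from to amount = do
  m <- readTVar bank
  case (Map.lookup from m, Map.lookup to m) of
    (Just a, Just b) -> do
      modifyTVar' a (subtract amount)
      modifyTVar' b (+ amount)
    _ -> throwSTM (userError "unknown account")
```

This helps with pairwise transfers but does nothing for the persistence, indexing, and failover questions, which is rather the point being made.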
It's not only my own experience — I believe it's widely accepted in the Clojure community that nobody actually uses STM, and instead uses simple atomic updates or queues.
I have a similar experience, though it's not necessarily related to shortcomings in the STM implementation (which I always felt was quite competent in the space). To expand, I think a few things are true:
1. I've come to feel that actually managing concurrency is a "below the fold" competency for most engineers. I'm not saying it should be, and people need to understand what a data race is.
2. As you pointed out, most state in services that I work on is not in need of coordinated updates, so atoms are where you want to be, or single-writer queues. 100% agreed
3. Structured concurrency approaches, fibers, and other abstracted runtimes seem to be the best way to get concurrent and parallel work, given points 1 and 2 about how coordinated state really isn't that common for most work on my team
A lot of times when we're updating state, we're really having to think about it in the database, employing either a last-write-wins strategy or, when coordination is needed, some form of pessimistic or optimistic locking.
I chalk this up to problems with Clojure's STM API specifically rather than STM generally, e.g. Haskell's STM implementation is considerably more useful.
I don't have any experience with STMs in other languages, but as far as I can see, the ideas are very similar in the Haskell and Kotlin implementations, for example, so I guess the downsides are the same as well.
I found that a lot of the problems I had been having with mutexes stem from the fact that traditionally the mutex and the data it protects are separate.
Bolting them together, like Rust's Mutex<T> does, solves a lot of these problems. It lets you write normal, synchronous code and leave the locking up to the caller, but without making it a nightmare. You can't even access the data without locking the mutex.
This isn't an attack on the (very well written) article though. Just wanted to add my two cents.
Mutexes suffer from a host of problems, and imo are not a very good concurrency primitive - they were designed to turn single-threaded code into multi-threaded. With today's 8+ cores in most systems, a single point of contention usually becomes a problem quickly.
They're liable to deadlocks/livelocks, and sometimes not only with other explicitly Mutex-like things (it might happen some library you use has a lock hidden deep inside).
They're also often backed by OS primitives (with big overheads) with inconsistent behaviors between platforms (spinlocks, waiting, etc.). We've run into an issue with .NET: their version of Mutex didn't wake up the blocked thread on Linux as fast as on Windows, meaning we needed about 100x the time to serve a request, as the thread was sleeping too long.
There are questions like when to spin and when to go to sleep and wait, which unfortunately the developer has to answer.
Not assigning blame here, just pointing out that threading primitives and behaviors don't translate perfectly between OSes.
Multi-threading is hard; other solutions like queues suffer from issues like backpressure.
That's why I'm skeptical about Rust's fearless concurrency promise - none of these bugs are solved by just figuring out data races - which are a huge issue, but not the only one.
Your view on mutex performance and overhead is outdated, at least for the major platforms: The Rust standard library mutex only requires 5 bytes, doesn't allocate, and only does a syscall on contention. The mutex implementation in the parking_lot library requires just 1 byte per mutex (and doesn't allocate and only does a syscall on contention). This enables very fine-grained, efficient locking and low contention.
Traditionally, monitors were declared together with the data they contained, and the compiler enforced that the data was not accessed outside the monitor. Per Brinch Hansen wrote a rather bitter broadside against Java's concurrency model when it came out.
> You can't even access the data without locking the mutex.
It's even nicer than that: you can actually access data without locking the mutex, because while you hold a mutable borrow to the mutex, Rust statically guarantees that no one else can acquire locks on the mutex.
Given a data item of a non-thread-safe type (i.e. not Mutex<T> etc.), the borrow checker checks that there's only ever one mutable reference to it. This doesn't so much solve concurrent access as prevent it: multiple threads never even have the ability to access that data.
Mutex is for where you have that ability, and ensures at runtime that accesses get serialized.
I find it better to model that as an Actor than a mutex, but I guess it's inherently the same thing, except the actor also allows asynchronous operations.
No. It’s not a property of the type so you can have multiple items under a mutex and you’re not at the mercy of whoever wrote it, it works fine with POD types, it does not force a lock / unlock on each method call (instead the compiler essentially ensures you hold the lock before you can access the data), and the borrow checker is there to ensure you can not leak any sort of sub-states, even though you can call all sorts of helpers which have no requirement to be aware of the locking.
It’s what synchronized classes wish they had been, maybe.
This felt like a blast from the past. At a few times reading this article, I had to go back and check that, yes, it's actually a new article from the year 2025 on STM in Haskell and it's even using the old bank account example.
I remember, 15 or 20 years ago (has it been that long?), when the Haskell people like dons were banging on about: 1) Moore's law being dead, 2) future CPUs will have tons of cores, and 3) good luck wrangling them in your stone age language! Check out the cool stuff we've got going on over in Haskell!
Yeah, remember when we used to care about making better programming languages that would perform faster and avoid errors, instead of just slapping blockchains or AI on everything to get VC money. Good times.
And it's all still true, although I would offer the usual argument that concurrency != parallelism, and if you reach for threads & STM to try to speed something up, you'll probably have a bad time. With the overhead of GC, STM retries, false sharing, pointer chasing, etc., you might have a better time rewriting it single-threaded in C/Rust.
STM shines in a concurrent setting, where you know you'll have multiple threads accessing your system and you want to keep everything correct. And nothing else comes close.
To be maximally fair to the Haskell people, they have been enormously influential. Haskell is like Canada: you grow up nicely there and then travel the world to bring your energy to it.
Disabled on most CPUs, plagued by security issues. I haven't used it but I assume debugging would be extremely painful, since any debug event would abort the transaction.
I've been excited about STMs since I read "Composable Memory Transactions" back in 02005, shortly before it was published, and I still believe in the glorious transactional future, but it's difficult to adopt an STM piecemeal; it kind of wants to own your entire program, the same way that garbage collection or nonblocking I/O do, and more so than multithreading with locks. You kind of have to commit entirely to an STM. The efforts in C# to adopt an STM, which ended around 02010, were a disaster as a result.
The article says a couple of things about STMs that are not true of STMs in general, just true of the Haskell STM the author is familiar with, like a small Brazilian child confidently telling you that almost everyone speaks Portuguese.
One of these is, "STM is an optimistic concurrency system." The possibility of making your concurrency 100% lock-free is one of the most appealing things about STMs, and I think it could be a key to solving the UI latency problem, which just keeps getting worse and worse. Actors and CSP don't normally help here; an Actor is just as "blocking" as a lock. But you can implement an STM with partly pessimistic concurrency, or purely pessimistic, and it might even be a good idea.
Another is, "One last benefit of STM which we haven't yet discussed is that it supports intelligent transaction retries based on conditions of the synchronized data itself." This was an innovation introduced by "Composable Memory Transactions", and many STMs do not support it, including Keir Fraser's awesomely fast version. I am even less certain that it is the correct tradeoff for all uses than I am about purely optimistic synchronization.
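For contrast, here is what that innovation looks like in the Haskell stm library: `retry` aborts the current transaction and blocks it until one of the TVars it has read changes, at which point the runtime re-runs it. A sketch (the `pop` helper is my own):

```haskell
import Control.Concurrent.STM

-- Blocks (via retry) until the queue is non-empty; the runtime
-- wakes the transaction only when a TVar it read is written to,
-- so there is no busy-waiting or polling in user code.
pop :: TVar [a] -> STM a
pop q = do
  xs <- readTVar q
  case xs of
    []       -> retry          -- nothing to take: sleep until q changes
    (x:rest) -> do
      writeTVar q rest
      return x
```

An STM without this facility can still give you atomic updates, but condition-based blocking like the above then has to be built outside the transaction system.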
But all of this is why I'm rereading Gray and Reuter's Transaction Processing right now after 25 years. With the benefit of 35 years of hindsight, it's a frustrating mix of inspiring long-range vision and myopic boneheadedness. But it shares a lot of hard-won wisdom about problems like long transactions that pop up in a new guise in STMs.
I think writing concurrent programs will always be a hard problem, relative to the difficulty of writing non-concurrent programs, and the only "solution" is to isolate, minimize, and regulate contention. The implementation details of TM, locks, monitors, semaphores, actors, message queues, transactions, etc., are at best "distractions", at worst hindrances. I think a good model of a concurrent program, one that lends itself to writing the program simply, will be applicable across many different implementations. Anything that obscures the development of such a model is harmful. Worst of all is the sheer prevalence of shared resources (especially shared memory). Sharing brings contention, so control sharing.
I don't agree that whether you're using TM, shared-memory monitors, or actors with message queues is an implementation detail or that there is a better programming model that hides the differences between them. You can implement any of them on top of any of the others, but you're still programming to whatever model you put on top.
On the problems with STM in C#, see https://joeduffyblog.com/2010/01/03/a-brief-retrospective-on... (I can't believe nobody else has posted this link yet). As with the Chris Penner article, there are a lot of things described as features of STMs in general which are actually just properties of the STM he worked on, which explains some of the things that sound like nonsense if you've only worked with Haskell's STM or Clojure's. (Duffy is much better about delineating the boundaries of the systems he's talking about, though, because he knows there are alternatives.)
interesting, the notation also implies 2005 and 2010 AD and not BC — or maybe the notation is specifically about AD? either way, an interesting choice if it was intentional. we say "year 515" without disambiguation, right
> A data race occurs any time two threads access the same memory location concurrently and non-deterministically when at least one of the accesses is a write.
From what I understand of the C++ memory model (shared by C and Rust), this is not the definition of data race – a data race occurs when two or more threads access memory concurrently where at least one access is a write and the accesses are unsynchronized. However, synchronized accesses may not have a deterministic ordering, in which case a race condition occurs.
(Confusing as it may be, I believe this is standard terminology.)
Optimistic locking works great when what is excluded is effectively a calculation. The annoyance, though, is that you have basically acknowledged that you can use a compare-and-swap at the end of your calculation to know that things worked.
This is not at all always the case, though. Sometimes, what you need to use mutual exclusion for is actively working with something that is, itself, active work. That is, sometimes you have to stop something even starting.
Think about how you would model a laundromat. It isn't enough to say you could use optimistic locks for access to the machines. You can't use the machine until you can use the machine.
This is not unheard of in computing, either. Sometimes, you can't process a buffer until you know it has been finished by the code before you.
> Think about how you would model a laundromat. It isn't enough to say you could use optimistic locks for access to the machines.
One machine? Too easy, I want multiple machines!
I want to reserve this machine and that machine. And this is happening while someone else wants to grab that machine and this machine. It costs money to grab a machine, so I'll insert a coin into this machine, and then insert a coin into that machine. The other guy grabbed that machine before me, so am I locked out? It turns out he only had one coin, not two: not only did this mean he untook that machine, but the coin he spent flew out of the machine and back into his pocket. While that was going on, I took a nap - instead of busily waiting for those machines - and the laundromat operator woke me up when it was time to try to grab both machines again.
    -- A sketch: Wallets, Machines, Machine, reserveMachine, decrement,
    -- and getBalance are hypothetical helpers.
    myActions :: Wallets -> Machines -> STM ()
    myActions wallets machines = do
        bookMachine "this machine"
        bookMachine "that machine"
      where
        bookMachine :: Machine -> STM ()
        bookMachine m = do
            reserveMachine machines m
            spendCoin

        spendCoin :: STM ()
        spendCoin = do
            decrement wallets
            newBalance <- getBalance wallets
            when (newBalance < 0)
                 (throwSTM (userError "Ran out of money"))
Right, but my point is that you are still guarding a calculation, there. I suppose it is fair to say that this code just describes the paying for it side of things. My point is that I often want the code to describe everything I can actively do while the machine is in use by someone else.
To that end, if I were to "script out" a trip to the laundromat, it would not be as simple as "reserve both; when successful, pay." It would be "Confirm the dryers are working; find an empty washer to fill; if found, fill it; if not, check the times and plan to come back later; if the wait is above a threshold, start over when I'm back; otherwise pay, and come back later when it is done, ..."
> That ensures a function can't run forever if it's already clear it can't finish without a conflict.
That just makes them bail out quicker; it doesn't help them ever succeed.
They could, but are typically configured not to.
I go into more detail here: https://news.ycombinator.com/item?id=35805138 and https://news.ycombinator.com/item?id=32759886
> they were designed to turn single-threaded code into multi-threaded
not really
> usually a single point of contention quickly becomes a problem.
not generally, no
> They're liable to deadlocks/livelocks,
deadlocks/livelocks are orthogonal to any specific primitive
> They're also often backed by OS primitives (with big overheads) with inconsistent behaviors between platforms (spinlocks, waiting etc).
the mutex as a primitive is orthogonal to any specific implementation...
etc. etc.
http://brinch-hansen.net/papers/1999b.pdf
https://doc.rust-lang.org/std/sync/struct.Mutex.html#method....
See also https://www.infoq.com/news/2010/05/STM-Dropped/.
Are you planning for this comment to be relevant for long enough that the leading zeros will be helpful for disambiguation?