Must be this tall to write multi-threaded code (2015)

There was a paper long ago that showed duality between semaphore/locking code and message-queuing code. So folks figured they were the same.

Not so! semaphore/locking is very hard if more than one semaphore exists. Look at the 'dining philosophers problem' and so on.

But queuing! Done right, that can be proved correct through static analysis. Do threads that consume one queue, block on another queue? Draw the graph of queue-blocking - does it loop anywhere? Then you could have a problem.

I.e. if your message-consumers don't block at all, then you cannot have a problem with deadlock.

You CAN have a problem with messages stalling however - languishing on a list waiting for something to complete that might never complete. But at runtime this can be debugged fairly easily - if all your queues are visible to the debugger.

In the semaphore implementation, the locking situation depends on the state of threads and their stacks, which all have to be frisked and back-executed to find out who holds what lock etc. Not always debugger-friendly to do.

I favor a multi-threading environment of threads dedicated to queues, and all the queues visible to the debugger. That kind of setup has never done me dirty.

anonymousDan · 4 years ago

One trick I used to use when writing multithreaded java code for debugging purposes was to never use the built in synchronized/implicit object lock, since that made it quite difficult to understand from a stack trace what object was being locked. Instead, we would define an explicit inner lock and lock on that explicitly. The class name would then show up in stack traces making it much easier to track down.

hinkley · 4 years ago

In cases where not all operations on the objects necessitate the same lock semantics, this is also handy because you can split or nest locks to reduce write-blocks-read and read-blocks-write situations.

But if you're finding yourself using this more than a couple of times in a project, you're probably uncovering violations of the Single Responsibility Principle. Especially if you're splitting locks.

itsmemattchung · 4 years ago

One of my _favorite_ threading papers published in 1982, written by Andrew Birrell:

https://s3.amazonaws.com/content.udacity-data.com/courses/ud...

Reading about that paper in graduate school cleared up a lot of misconceptions I had about threads

rstuart4133 · 4 years ago

But ... semaphores and queues are indeed very similar, and everything you said about a queue is also true for a semaphore. A queue has two fundamental operations:

    trait Queue<T> {
        fn push(t: T)
        fn pop() -> T    // waits until the queue isn't empty, then does the pop()
    }

A semaphore is is just a Queue where T is a constant (typically the unit type, which has zero size). Since it's always the the same thing there is no need to store the actual items as pop() can just create copies as needed, which also means push() can just discard it's argument. That means you can get away with just storing a count of how many items are in the queue, not the items themselves. Now rename:

    push --> signal
    pop --> wait

And bingo, we have a semaphore. Which also means if you are having difficulty reasoning about semaphores but find queues easy to think about, just reverse the name transformation above and semaphores become just as easy.

Rust exploits a similar equivalence with a mapping and a set: a set is just a mapping that maps every key to the unit type.

JoeAltmaier · 4 years ago

The other half of it is, thread blocking on a single queue. And system message queues being a debugger friendly concept (not just a declared local). Code discipline is 2/3 of the solution.

morelisp · 4 years ago

I mean that's a nice theoretical argument, but the empirical results are not so clear.

https://dl.acm.org/doi/10.1145/3297858.3304069

Surprisingly, our study shows that it is as easy to make concurrency bugs with message passing as with shared memory, sometimes even more. For example, around 58% of blocking bugs are caused by message passing.

...

Our study found that message passing does not necessarily make multithreaded programs less error-prone than shared memory. In fact, message passing is the main cause of blocking bugs. To make it worse, when combined with traditional synchronization primitives or with other new language features and libraries, message passing can cause blocking bugs that are very hard to detect.

I'd also add that in the modern world we've "solved" many multi-threading problems by splitting the queues and data stores out into multiple services. Now we don't have any threading problems - only "distributed system" problems, and if there's a bug there that's the architect's fault, not the programmer's!

toast0 · 4 years ago

The results of concurrency bugs with message passing tend to be less bad than the results of concurrency bugs with semaphore/locking code; although this is often because message passing tends to not be shared memory semaphore/locking tends to be shared memory.

Blocking bugs are a service issue, but incorrect locking can lead to deadlock or unprotected writes that corrupt data.

Depending on what they mean by 'detect', I've found it's actually quite easy to detect blocking issues in a message passing environment (Erlang, specifically), if a process has a blocking issue, it will end up stuck waiting for a message that will never arrive; however its mailbox will be growing as new requests are added. Scanning for large mailboxes, as well as growing mailboxes is easily automated and detects the issue. Of course, that's runtime detection; it's hard to detect these with code analysis (because it's not exactly a code problem. It's a system problem, and the system is an emergent property of the running code, it's not explicitly described anywhere, at least in Erlang; this is simultaneously amazingly powerful and limiting)

JoeAltmaier · 4 years ago

Yeah so many services APIs are blocking, it's hard to keep threads non-blocking. Often there is a way, but you have to dig for it.

dekhn · 4 years ago

the only model I use with threads is either pre-static (start a function on each thread with unique parameters, result is returned to a driving function at the end) or message-queue (each thread is blocking, waiting for a message). For the former, I use shared memory, but it's read only and guaranteed to live longer than any of the worker threads that use it (passing a consted ref_ptr). For the latter, I don't share any memory at all; the message itself contains the parameters, fully materialized.

I can ensure all my queues are loop-free (directed acyclic graph of workers reading from their parent nodes and writing to their children nodes). IIRC one of the complaints about the petri net model is that it was unprovable that any problem could finish, even trivial ones.

This has worked pretty well for me and I usually don't have to debug any truly nasty thread problems. My biggest problem tends to be thread pools where one of the workers gets wedged, and there's no way to cleanly kill the worker thread and steal the work onto another thread.

JoeAltmaier · 4 years ago

Particularly network activity! At the lowest levels the creaky old TCP/IP stack not amenable to non-blocking.

throwawaymaths · 4 years ago

> I favor a multi-threading environment of threads dedicated to queues, and all the queues visible to the debugger. That kind of setup has never done me dirty.

This is a basic description of Erlang, and by extension elixir, gleam, caramel, etc.

Plus at least some of them get default access to (useful for 80% of use cases) implementations of backpressure, deadlock timeouts, etc.

Etheryte · 4 years ago

This sounds interesting and I've implemented simple ideas similar to the pattern you describe before, however I haven't read about its use in depth. Do you happen know of an article/book/resource that describes this along with real world experiences? If not, would you mind writing a blog post or article on it please?

JoeAltmaier · 4 years ago

I had the benefit of cutting my teeth at my first job, working on a message-passing OS. CTOS, based on RMX/86 used messages for everything. It was a very early networked computing system. Diskless workstations where file messages simply got forwarded to a server etc. And all the messages and queues were visible in the debugger!

So I learned good thread hygiene right out of school.

bluGill · 4 years ago

I favor that too, but lock free algorithms can in some cases perform perform enough better in the real world.

I've also had to implement my own message queue based on lock free to get data out of an implement handler.

syrrim · 4 years ago

Dining philosophers is solved by always acquiring locks in the same order.

chrisseaton · 4 years ago

...if it's possible to do that for your application! It isn't always, if what lock you need next depends on the value of a previously locked value.

Deleted Comment

btilly · 4 years ago

I favor a multi-threading environment of threads dedicated to queues, and all the queues visible to the debugger. That kind of setup has never done me dirty.

You mean like Go does with channels? (Not sure how good their visibility is to the debugger though.)

lostcolony · 4 years ago

Probably more like Elixir/Erlang, if we're talking programming language. Though even there, BEAM processes are the unit of execution, and are multiplexed on threads. Parent references OS development elsewhere.

Go has a couple of deviations; channels aren't queues unless you are careful to size them > the number of things you'll ever put in them (else the thing trying to put something on a channel deadlocks; you can of course timeout on it, but it presents tight coupling in that case, the goroutine sending has to care about the state of the thing receiving), goroutines aren't dedicated to channels (that is, there is an m-n relationship between channels and goroutines which can lead to a lot of incidental complexity), and, you can see what is on a channel if you have a breakpoint, but that assumes you can get the code to break inside of something containing a reference to the channel.

fmajid · 4 years ago

It's based on Tony Hoare's Communicating Sequential Processes, and is far more comprehensible and possible to reason about than primitives like mutexes and semaphores that are too close to the underlying hardware implementation.

http://www.usingcsp.com

convolvatron · 4 years ago

its a _little_ more costly depending, but otherwise I agree with you. it also clearly delineates what data is shared and everything else can be assumed private...so it helps make the overall architecture more explicit

Low-level thread programming is a minefield. I've been doing IRQ/concurrent programming since the 1980s, and still hate it. Thread bugs are a nightmare.

Concurrency needs to be "baked into" higher-level languages, development APIs, and standard libraries. Basically, covered in bubble-wrap. That's starting to happen. It won't be as efficient as low-level coding, but it is likely to still have a significant impact on the performance of most code.

And folks that are good at low-level threading would be well-served to work to support this.

hinkley · 4 years ago

I studied distributed computing in college, and I spent a lot of time not quite internalizing the fact that since this was an elective at one of the highest ranked state schools in the nation, probably most other people didn't have the same information I did.

I ended up doing a lot of concurrency work early on because I was good at it, but over time the evidence just kept piling up that nobody else could really follow it, and so I've used it less and less over time, and in particular try very hard to keep it away from the main flow of the application. It's more something I pull out for special occasions, or to explain bugs.

Where a lot of frameworks fail is that while re-entrance and concurrency are different problems, they share a lot of qualities, both computationally and with the limits of human reasoning. Recursive code with side effects looks and acts a lot like concurrent code, because here I am looking at some piece of data and the moment I look away some other asshole changed it out from under me. Most frameworks end up being a bad study in recursion, usually in the name of reducing code duplication.

Pure functional people love recursive code, but it's the pure part that avoids the problems with trying to interlace multiple activities at once. Without that, you're trying to eat your dessert without eating your vegetables first.

Deleted Comment

skohan · 4 years ago

Idk I think it is not as difficult as it is sometimes framed. I think the key is to design your program/separation of concerns in such a way as to make it easy to reason about concurrency, and minimize the need for synchronization points.

I think a lot of the problems arise when you just try to write normal synchronous code, or to parallelize code which was originally synchronous, and don't realize the implicit constraints you had been relying on which no longer hold when concurrency is introduced.

hinkley · 4 years ago

Based on a non-scientific study, I think the spatial thinkers do great with concurrency, the visual thinkers do okay, and everyone else is in real trouble. Which reminds me, I need to interview my developer friend with aphantasia about how he feels about concurrency.

Concurrency should be the sizzle and not the steak, otherwise you're reducing the diversity of your team rather substantially. Good people are hard enough to find. Driving the people you have away doesn't generally end well.

ChrisMarshallNY · 4 years ago

FP is good for adapting to threading, but it has difficulties, when applied to things like GUI programming, async communications, or device control (places that need threading).

A lot of languages are getting things like async/await/actor, etc., but even that is still too low-level for a lot of programmers. It can easily turn into a lot of single-threaded code.

It needs to be completely under the surface. Swift does that with generics (I suspect other languages do it, as well). For example, you can say Array<Int>, or [Int]. They mean the same thing, but one does not have the generic syntax.

If we can do similar stuff with things like mutexes and syncing, then it will go a long way towards highly performant, safe, code.

nyanpasu64 · 4 years ago

In audio code, I'd rather use a properly written wait-free SPSC queue, than a least-common-denominator messaging mechanism provided by the standard library like postMessage() (where both the Win32 and JavaScript version suffer from contention and cause audio stuttering, see https://github.com/Dn-Programming-Core-Management/Dn-FamiTra... and https://blog.paul.cx/post/a-wait-free-spsc-ringbuffer-for-th...), though I'm not sure if generic channel/queue objects are as bad in practice. And message-passing (with anything other than primitive types) is a pattern for sharing memory that, if properly implemented and utilized (you don't send a pointer through a channel and access from both threads/goroutines), ensures no more than 1 thread accesses the object in a message at a time.

I think most but not all code can be constructed using primitives like (regular or wait-free) message queues and (RwLock or triple buffer) value cells, but I think all concurrent code which communicates with other threads of execution needs concurrent reasoning to design and write correctly. In my experience, Rust marking data as exclusive or shared is quite helpful for concurrent design and reasoning, whereas prohibiting shared memory altogether reduces performance drastically but is no better at correctness. I think message-passing merely shifts data race conditions into messaging race conditions (but perhaps Go is easier to reason about in practice than I expect). In fact, building programs heavily reliant on message passing between separate OS processes per service (like PipeWire) doesn't suffer from multithreading but rather multiprocessing and distributed systems, making it harder to establish consistent execution states or data snapshots at any point in time, or reason about invariants.

And even code not intended to communicate between threads needs to take care that no state is shared and mutated by another thread on accident (I concede this is easier with JS Web Workers or Rust which restrict shared mutability, than C++, Java, or Go which don't).

samsquire · 4 years ago

Go has a number of problems with regard to data races

https://eng.uber.com/data-race-patterns-in-go/

I think you need shared memory concurrency for performance. There are papers that argue for optimistic concurrency control and blocking concurrency control.

Concurrency Control Performance Modelling Alternatives and Implications

nick_ · 4 years ago

Pony has concurrency baked in via high-perf actor implementation. It's really nice. I believe Go has this as well, but it also has low-level concurrency API as well?

nlewycky · 4 years ago

I don't have a photo but we used to have the same sign at Google, with an added note "unless accompanied by ThreadSanitizer".

At runtime: https://clang.llvm.org/docs/ThreadSanitizer.html

At compile-time: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html

The article points out that that:

  The resulting invariants end up being documented in comments:
  
  // These variables are protected by monitor X:

but you can write those in C++ code and have the compiler check them. That's what thread safety annotations are for.

GuB-42 · 4 years ago

I use Valgrind's helgrind.

In fact, Valgrind is generally my favorite debugging tool: memcheck (default) for memory errors and leaks, callgrind for profiling, massif for memory profiling and finding "stealth" leaks, helgrind for multi-threading and a few others that I use less often.

The great thing about Valgrind is that it doesn't require any instrumentation at build time (though debug symbols are recommended), just run "valgrind your_program". Behind the scenes, it is actually a VM, that's how it can work with binaries directly. In theory it works with any language, and no need to do anything funny with libraries, kernel stuff, etc...

The biggest problem is that the performance penalty is huge, 10x is typical but 100x is not uncommon. ThreadSanitizer is slow but not that slow, I don't know which one is the best at finding issues, I think they are on par, but when you are an a particularly hairy problem, it is good to have both.

wallstprog · 4 years ago

valgrind is great, but be aware that valgrind runs single-threaded, which definitely changes the way multi-threaded apps behave: https://valgrind.org/docs/manual/manual-core.html#manual-cor...

In practice what this means is that cpu-bound threads can monopolize the single thread, and we've had thread starvation issues as a result in some tests. Setting the "-fair-sched" option resolved those issues, but made valgrind run even slower ...

kubanczyk · 4 years ago

TIL that, so predictably, "[Go's] race detector is based on the C/C++ ThreadSanitizer runtime library". So, I can definitely confirm its usefulness. https://go.dev/blog/race-detector

macintux · 4 years ago

> In this approach, threads own their data, and communicate with message-passing. This is easier said than done, because the language constructs, primitives, and design patterns for building system software this way are still in their infancy.

“Infancy”? Erlang was invented in the 80s.

tonnydourado · 4 years ago

OP does say system software, and although Erlang can have amazing uptime, I'm not so sure about it's raw performance story (on the other hand, I do remember seeing some insane benchmarks from a elixir framework, so, maybe I'm wrong).

Also, other than Erlang and Elixir, which other reasonably mainstream language has this as first class features? Even Rust doesn't really put queues and message passing front and center, it just makes it easier to get away with doing traditional thread programming.

Things can be old in years since discovery/invention, but still in it's infancy in usability and adoption.

Go has channels (`chan`) which it considers more native than e.g. Mutex. The latter is in a stdlib, the former is a feature of the language proper.

Alas, the Go ecosystem could use more channels. I'd say that the adoption is still at the infancy stage. I wonder whether there are any practical shortcomings or is it just a readability tradeoff (the Mutex tends to be super-readable, where the channels not so).

Lammy · 4 years ago

> Also, other than Erlang and Elixir, which other reasonably mainstream language has this as first class features?

Ruby 3.0+ with Ractors: https://docs.ruby-lang.org/en/master/ractor_md.html#label-Co...

bjourne · 4 years ago

Truth be told, it is trivially easy to create deadlocks even in a pure message-passing environment such as Erlang. Message-passing is way oversold and doesn't solve as many problems as people think.

dxhdr · 4 years ago

> Message-passing is way oversold and doesn't solve as many problems as people think.

My experience with Erlang has been that it solves a great deal of concurrency problems and is way undersold.

mbrodersen · 4 years ago

It is trivially easy to make any technology not work correctly. That doesn’t mean that those technologies are not worth using.

whalesalad · 4 years ago

Message passing is oversold in many environments that aren’t Erlang but I think it’s safe to say they’re the original, gold standard.

moonchild · 4 years ago

And the CSP paper dates to 1978.

kerblang · 4 years ago

Most of us work in information systems, which is to say, doing database stuff all day, usually with an RDBMS like MySQL, Postgres etc. And what are we doing? Grabbing a bunch of transactional locks. Against what? Shared mutable state, and not in-memory state, but database state counting into the thousands, millions, billions or more records. And if we do it wrong? Deadlock, often deadlock across multiple processes. And what do we do about that? We deal with it, because what else can you do?

You could of course throw transactions away and declare them more trouble than they're worth, but that probably won't go over well. I've found it helps to have a "consistent first lock", so for example in a consumer-based system you'd lock the customer record first, because most transactional operations happen in that context. If you always have that consistent first lock then deadlock can't happen.

My point is that if I assert "multithreading is unacceptable!", most of business information systems goes out the window on the same principle, because the locking is even more dastardly - multi-process and persistent state instead of in-process in-memory state. I don't think you could throw actors/schmactors/coroutines/goroutines/etc at such usefully either. And if you said "Well only the best programmers need apply," well BIS is not exactly hotshot heaven, by and large. It's a bunch of grunts.

So I agree multithreading is hard when you are locking many things at once, which I try to avoid. But I just don't get this thing of trying to banish it or make it the exclusive domain of geniuses.

kaba0 · 4 years ago

Databases (and I would argue runtimes, compilers, etc.) are written in a much much higher quality than your regular CRUD webapp, by usually much more experienced developers, with proper architectural plans and they are sometimes even model checked by TLA+ or the like.

That's beside the point, as I'm not talking about RDBMS bugs. I can corrupt database data when the RDBMS is working exactly as it's supposed to. Strong types and various constraints like foreign key, unique, etc. will mitigate, but there's infinite ways to make a mess beyond that.

xigency · 4 years ago

I don’t really support this view.

While doing anything creative with threading is in fact challenging, in Java it’s easy to safely boost performance with threading using parallel streams and concurrent data structures.

Even in other languages like C/C++, it’s mostly worth it to design something up-front that is naturally parallelizable and find a safe and correct way to do it.

And the incentive to parallelize is only increasing with the increasing thread counts of newer hardware. Sure, it’s scary, but there’s really no excuse to avoid running with threading, multiple processes, or multiple services.

jayd16 · 4 years ago

In my experience I saw a bunch of novice devs using parallel streams for N<=100 lists and such, for much worse performance. Its certainly not foolproof either or it would just be the default.

marginalia_nu · 4 years ago

To be fair, whether parallel streams is useful isn't a function of N alone, but the time each operation takes. If it's a nontrivial calculation, then by all means.

Although parallel streams are seldom the best option in terms of performance even for large N. They may often be faster than sequential processing for large N, but a more thought out approach is almost always faster still, often significantly so.

eftychis · 4 years ago

The problem with concurrency and parallelism is ownership — mainly that is write access than coherence. I have seen again and again the bad habit of people to bypass well defined set ups of data ownership because they feel they are smarter, it is unnecessary, we will figure it out down the road, and my favourite: it doesn’t solve deadlocks so why use anything as some bug families remain.

The issue is not of frameworks or type systems but of companies and people. Engineers know this stuff by training, but either are new and commit hubris or management makes them (or maybe some of them are simply untrained — and somehow never went to a bank (queues)).

P.S. For fun instance, I have seen unsafe used to bypass Rust’s type and ownership system to keep track of db snapshots in a “nice setup.” Rust was not wrong.

I don't know, maybe it's the sadist in me but I enjoy the challenge of writing fast and correct multi-threaded code. It's a valuable skill if you can master it.

layer8 · 4 years ago

You mean masochist? ;)

I’m similar, though I wish programming languages would gives us more facilities to statically double-check the correctness reasoning.

oriolid · 4 years ago

Masochist? Think about the poor programmer who has to maintain GP's code :)

hardlianotion · 4 years ago

Nah. Parent is thinking about users.

SilasX · 4 years ago

Skill? It can't be reduced to following easily checkable heuristics and rules?

astrange · 4 years ago

There are unsafe primitives out there like condition variables and anything lockless (like audio programming).

I don't claim to have mastered it!