> What is ThreadSanitizer? ThreadSanitizer (TSan) is compile-time instrumentation to detect data races according to the C/C++ memory model on Linux. It is important to note that these data races are considered undefined behavior within the C/C++ specification.
TSan, ASan, UBSan - if you are writing code and your toolchain supports these, you should use them. Over the 6-7+ years I've used them, I have never seen a false positive (I'm sure some have occasionally existed, but they seem rare in practice).
> However, developers claimed on various occasions that a particular report must be a false positive. In all of these cases, it turned out that TSan was indeed right and the problem was just very subtle and hard to understand.
Address Sanitizer -- never seen a false positive. Ever.
UBSan -- again never had a false positive.
Thread Sanitizer does have false positives in some circumstances. They're hard to verify as false positives, and in many ways it's better to refactor to eliminate the complaint, because you'll end up with better, cleaner code. For example, Qt uses those atomic fences that the article describes [0] [1].
[0]: TSan does not produce false positive data race reports when properly deployed, which includes instrumenting all code that is loaded into the process and avoiding primitives that TSan doesn’t understand (such as atomic fences).
[1]: https://lists.qt-project.org/pipermail/interest/2014-March/0...
In Firefox, media threads use a reduced stack size (128 kB) due to, historically, 32-bit address-space concerns with lots of media elements on a page. Opus, in its default configuration, requires a fair amount of working stack memory (a few dozen kB in some cases). "A few dozen" is much smaller than 128 kB, but enabling ASAN apparently increases stack usage quite a bit. So we got crashes under ASAN, from overflowing the stack, that would not have been possible without ASAN enabled.
https://bugzilla.mozilla.org/show_bug.cgi?id=750231
I've gotten false positives in ASan when trying to instrument Win32 application code compiled with MSVC's then-experimental ASan support, using the wrong combination from the maze of ASan libraries (static ASan library combined with dynamic MFC). And I also get startup crashes when running MSVC ASan binaries in Wine. But those are obscure beta use cases, and some may be fixed by now.
It would probably be fairly easy to change Qt's Mutex implementation to be TSan-compatible and only do so for TSan builds (by swapping out the fences for atomics when building with TSan). This is what we did in Firefox.
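To illustrate the kind of swap meant here (a minimal sketch, in Rust rather than Qt's C++, and not Qt's actual code; the `tsan` cfg is a hypothetical flag you would set only for TSan builds, e.g. via RUSTFLAGS="--cfg tsan"):

    use std::sync::atomic::{fence, AtomicBool, Ordering};

    // Normal build: standalone release fence plus relaxed store. Correct at
    // runtime, but TSan does not model standalone fences and may report a race.
    #[cfg(not(tsan))]
    pub fn publish(ready: &AtomicBool) {
        fence(Ordering::Release);
        ready.store(true, Ordering::Relaxed);
    }

    // TSan build: the same ordering expressed directly on the atomic store,
    // which TSan does understand, so the report disappears without weakening
    // the synchronization.
    #[cfg(tsan)]
    pub fn publish(ready: &AtomicBool) {
        ready.store(true, Ordering::Release);
    }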
UBSan has "false" positives, in that unsigned integer overflow is not C/C++ UB. They've decided to offer a check for it anyway (it's opt-in, separate from -fsanitize=undefined), and sometimes it really is a bug, so that isn't totally unreasonable.
In my opinion, the main limitation of TSan is that it cannot find all possible races. (An impossible task, so I think it speaks well of TSan that this is its main limitation.)
TSan is a dynamic tool: if it observes a race during execution, the race is very likely to be real. But it is impossible to "observe" execution for every part of a large code base, and so real races are likely still hiding in some edge case that TSan did not execute.
Static analysis tools are interesting in that they have the opposite problem. Static tools can detect races across the entire code base, but without runtime information it is difficult to know whether the races are real. This leads to more real races being detected, but also comes with more false positives.
There is a lot of interesting work being done on data race detection and on how we can bring these two sides closer together to find more real races without overwhelming developers with false positives.
This is absolutely true, and hence we run not only our tests under TSan but also fuzzing, to explore even more corner cases.
On the static vs. dynamic side, I would always opt for the dynamic tool when it can guarantee me no false positives, even if the results are incomplete. It is pretty much impossible to deploy a tool that produces lots of false positives, because developers will usually reject it at some point and question every result.
> In my opinion, the main limitation of TSan is that it cannot find all possible races.
In my experience TSan finds all possible races if it's built into the unit tests and the unit tests provide enough coverage. If unit tests don't provide enough coverage, then there are likely other bugs lurking beyond just data races.
I've had this happen to me personally. Looking again and again at the TSan error and the code it pointed to, I was absolutely sure it was a false positive. Thankfully I got a friend to look at it, and after a good night's sleep he was able to explain to me how exactly the code was wrong.
Tl;dr: It is extremely hard to write correct threaded code in C/C++, and it is quite hard to recognize errors even when they are pointed out to you.
Two pieces of information that I found interesting:
* Rust had poor support for sanitizers ("Rust, which has much less mature support for sanitizers"). Work was done to improve this, but it's not clear how well it works now - it would be worth a blog post in itself IMO (a sketch of the current nightly invocation follows at the end of this comment). This was surprising, because I at least assumed it would work flawlessly thanks to clang.
* Rust / C++ bridges are predictably problematic and weaken Rust's guarantees ("races in C++ code being obfuscated by passing through Rust").
This is the problem to solve for most companies working in a traditionally C- and C++-oriented domain, as the C* code will dwarf the Rust code. Because of the shared memory space, any problems in that code will be just as critical as if the entire code base were unsafe. It would be quite awesome if Rust were able to isolate unsafe code not just at compile time but also at runtime, by e.g. offering the option of executing the unsafe code in a separate address space. This would require some manual work for moving the data back and forth, but the language could offer tools to support such a work mode, e.g. unsafe blocks could take parameters.
I wish Mozilla would be more transparent about the absolutely normal and typical difficulties they must be encountering in a mixed code base. I have the feeling that there's plenty of them, but they don't want to emphasise them in order not to discourage Rust adoption.
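For reference on the first bullet: Rust sanitizer support today is nightly-only; the rustc documentation describes an invocation roughly like the following (a sketch, assuming a Linux target; -Zbuild-std rebuilds the standard library so that it is instrumented too):

    RUSTFLAGS="-Zsanitizer=thread" cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu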
In terms of Rust guarantees with C* code: the Rust parts of the code will be fine, which means that if you have an issue, you can easily isolate it to the C* code (which probably doesn't help a lot). You can probably isolate things if you use IPC (I believe there are a few crates for that), but it wouldn't prevent the result from being wrong, only unsafe behaviour from stomping the address space. You could likely avoid copies if you relied on SHM for larger data patterns.
On the other hand, integrating Rust and C* isn't terribly hard; there are automated tools that generate bindings for C/Rust, so you don't really have to think about it much other than unpacking the unsafe data from C*.
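A minimal sketch of what that boundary looks like (the extern declaration is the kind of thing a generator like bindgen emits; decode_frame is a hypothetical C function):

    use std::os::raw::c_int;

    extern "C" {
        // Hypothetical C function; in practice a tool like bindgen
        // generates these declarations from the C headers.
        fn decode_frame(data: *const u8, len: usize) -> c_int;
    }

    fn decode(buf: &[u8]) -> c_int {
        // The unsafe block marks exactly where Rust's guarantees stop: the
        // C side is trusted to honor the pointer/length contract.
        unsafe { decode_frame(buf.as_ptr(), buf.len()) }
    }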
I worked on a mixed C++ and C# codebase where we switched from in-process calls to IPC for that very reason, taking the efficiency hit. Having the .NET runtime running in the same address space confused any sanitizer or leak detector. You rely on those tools to manage a large C++ codebase in practice.
Also, because the .NET GC churns through a lot of memory, statistically almost every problem on the C++ side turned up on the C# side first, sometimes with very confusing effects. For example, .NET has a compacting GC, so an object can get corrupted and then randomly moved somewhere else. We didn't have that kind of problem often, but when we did, it was pure hell to debug.
> Overall Rust appears to be fulfilling one of its original design goals: allowing us to write more concurrent code safely. Both WebRender and Stylo are very large and pervasively multi-threaded, but have had minimal threading issues. What issues we did find were mistakes in the implementations of low-level and explicitly unsafe multithreading abstractions — and those mistakes were simple to fix.
> This is in contrast to many of our C++ races, which often involved things being randomly accessed on different threads with unclear semantics, necessitating non-trivial refactorings of the code.
And when it comes to finding bugs in your unsafe concurrency primitives in Rust programs, there is a tool called Loom that helps with that, as a powerful complement to what ThreadSanitizer provides: https://github.com/tokio-rs/loom . You can think of Loom sort of like QuickCheck, except the permutations are all the possible thread states of your program based on the memory orderings that you have specified.
Here's a video that briefly talks about what Loom can do https://youtu.be/rMGWeSjctlY?t=7519 (it's part of a larger video about memory orderings, and is a great introduction to the topic).
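To make that concrete, a Loom test looks roughly like this (a minimal sketch; loom::model reruns the closure under every legal interleaving of the loom-provided thread and atomic types):

    use loom::sync::atomic::{AtomicUsize, Ordering};
    use loom::sync::Arc;
    use loom::thread;

    #[test]
    fn two_increments_always_sum_to_two() {
        loom::model(|| {
            let n = Arc::new(AtomicUsize::new(0));
            let n2 = Arc::clone(&n);
            let t = thread::spawn(move || {
                n2.fetch_add(1, Ordering::SeqCst);
            });
            n.fetch_add(1, Ordering::SeqCst);
            t.join().unwrap();
            // This assertion is checked under every explored interleaving.
            assert_eq!(n.load(Ordering::SeqCst), 2);
        });
    }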
Loom is really awesome, though it is focused on exhaustive testing, so not suitable for code that has a lot of possible interleavings (e.g. due to a ton of threads, or a large body of code).
There is a new project out of AWS called Shuttle [1] which is like Loom, but it does random exploration instead of exhaustive exploration, which enables massively distributed testing of really complicated stuff.
[1]: https://github.com/awslabs/shuttle
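Shuttle's entry point mirrors Loom's, except you ask for some number of random schedules rather than an exhaustive sweep. If I remember its README correctly, a test looks something like this (an unverified sketch):

    use shuttle::sync::Mutex;
    use shuttle::thread;
    use std::sync::Arc;

    #[test]
    fn explores_random_schedules() {
        // Explore 1000 randomly chosen thread schedules.
        shuttle::check_random(|| {
            let v = Arc::new(Mutex::new(0u32));
            let v2 = Arc::clone(&v);
            let t = thread::spawn(move || {
                *v2.lock().unwrap() += 1;
            });
            *v.lock().unwrap() += 1;
            t.join().unwrap();
            assert_eq!(*v.lock().unwrap(), 2);
        }, 1000);
    }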
Loom's style of exploration (and that of AWS Shuttle, mentioned above) can be quite effective. Coyote is the equivalent library in the .NET world (formerly known as P#) and comes with a bunch of exploration "schedulers" (from random exploration to probabilistic ones and even an experimental reinforcement-learning-based one). They released an animation-rich video today (see https://news.ycombinator.com/item?id=26718273). Worth watching even if you're using Rust, as Loom and AWS Shuttle are conceptually similar.
Yeah, that was the part that stood out to me as well. I thought it was pretty intuitively obvious that having your data races confined to small segments is better than allowing them to be spread over the entire codebase, but a lot of people act like it negates the advantage of working in a safer language entirely. It's nice to have empirical reports that back up the "obvious" though. Sometimes "obvious" things turn out to be wrong. It's always good to check them.
> but a lot of people act like it negates the advantage of working in a safer language entirely.
I think the underlying assumption is more along the lines of: the easy cases are made easier, and the hard cases are made impossible — or rather the concern being that rust hasn’t made the problem easier, it just moved it around.
Which is often the case with claims of solving long-lasting problems
D takes a different approach. D has a type constructor `shared`. Only types marked as `shared` (or `immutable`) can be accessed by multiple threads. If you've got concurrency problems, the scope of the problems are immediately reduced to the subset of the code typed as `shared`.
In addition, `shared` types cannot be accessed directly. They can only be accessed via library routines, such as D's atomic library.
Having an `immutable` data type also helps, as immutable data can be accessed by multiple threads without need for synchronization.
> D takes a different approach. D has a type constructor `shared`. Only types marked as `shared` (or `immutable`) can be accessed by multiple threads.
This is called Sync in Rust. There is also a Send trait for types which can be moved (but not necessarily shared), as there are data items which cannot leave their origin thread, e.g. thread locals or non-atomically refcounted instances.
That description seems very similar to how Rust's type system enforces thread safety. You have two kinds of references: shared and exclusive. The shared references will typically only provide immutable access, whereas if you have an exclusive reference, you can mutate it. Beyond this, there are some special types that can be mutated through shared references, e.g. the AtomicI32 type. Similarly, the Mutex type lets you lock it given only a shared reference and provides you with an exclusive reference when locked.
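A small sketch of those two escape hatches; note that the function takes only shared references:

    use std::sync::atomic::{AtomicI32, Ordering};
    use std::sync::Mutex;

    fn bump(counter: &AtomicI32, guarded: &Mutex<Vec<i32>>) {
        // AtomicI32 permits mutation through a shared reference.
        counter.fetch_add(1, Ordering::Relaxed);
        // Mutex::lock also takes &self and hands back exclusive access
        // to the protected data for as long as the guard is alive.
        guarded.lock().unwrap().push(42);
    }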
Did D ever get around to specifying a formal memory model? Because when I tried to adopt D a couple years ago I felt there was a lot of ambiguity surrounding the semantics of `shared`, particularly when interfacing with C and C++ code.
I ended up just casting to and from `shared`, and that seemed to work, but it was pretty verbose for highly parallel code and I was never quite sure under what circumstances doing that might violate the compiler's invariants.
Also such casting appeared to eliminate most of the benefits, since what appeared to be local data might have been cast and had a pointer sent across to another thread elsewhere. In the end `shared`, @nogc antics (the language internals weren't fully annotated), closures requiring the GC (compare to C++), and the friction of interfacing with highly templated C++ code such as the Eigen library caused me to abandon the attempt. I sure learned a lot in the process though!
It's interesting, I read this and had the opposite conclusion -- people are still writing racy code using Rust.
I mean, it's great that you can write threaded code with more confidence, but it seems the conclusion here is that Rust gives you more confidence, not absolute confidence, and you still need the normal stable of runtime detection tooling. The second fix [1] in particular resembles the bad pattern of "sprinkle atomics on it until it works" that you see in non-Rust languages.
(If this comment raises your hackles, please read the "no true Scotsman" [2] fallacy before responding.)
[1]: https://hg.mozilla.org/integration/autoland/rev/044f8b594c71 [2]: https://en.wikipedia.org/wiki/No_true_Scotsman
> What issues we did find were mistakes in the implementations of low-level and explicitly unsafe multithreading abstractions
(emphasis mine)
The important thing is that the races only happened in unsafe { } blocks. It's well-established that these blocks should basically be treated like C++ in terms of scrutiny, but (I believe) you can still have roughly "absolute confidence" in non-unsafe code.
It's true and interesting that unsafe { } blocks deserve sanitization analysis like C++ code does - and maybe this hasn't been discussed enough - but I don't think it's fair to suggest that the enormous language benefits are outweighed by some false sense of confidence. The proportions in that idea just seem way out of whack.
You are correct. The "no data races" slogan of Rust is accurate, but you need to mention the way it needs to be understood. First, data races are only a subset of all race conditions. Second, like the other safety guarantees, the statement only applies to safe Rust. Once you use unsafe Rust or any non-Rust language (and both usually happen in any non-trivial program), this guarantee stops being a guarantee. But it doesn't mean it vanishes into nothingness. It instead becomes a quantitative statement: it's much easier to avoid data races in Rust than in unsafe alternatives.
I think when you build Rust projects that require reliability, it's foolish to lean back, believe that marketing slogan, and not look for concurrency bugs. However, once you find them, it's way easier to fix them, at least when your unsafety is well encapsulated, as is recommended.
I think the big challenge for Rust in this context is to improve its tooling so that using it is comparable to or even easier than the C++ tooling. This blog post is proof that such tools should be run; otherwise you are missing out on important bugs.
> you still need the normal stable of runtime detection tooling
I hate that you're getting so many downvotes, and I wish people wouldn't do that...but I still want to disagree slightly :) I think it's interesting to note that the meaning of "you" changes between the two cases. If I write safe code on top of parking_lot, you could argue that I don't really need to do runtime race detection. Rather, someone in the ecosystem needs to do it, but once it's been done, the shared code is improved for everyone. If my code is all safe, I can benefit from these global improvements over time, and I can be confident that I haven't introduced any new races myself.
They literally found tons of new race conditions in C++ and 2 in the Rust code with their tool, and you want to act as if that little factoid makes Rust useless somehow. Anyone who thinks a language is flawless is not thinking correctly, and that includes Rust. However, it does a much better job at this than C/C++, yet somehow you twisted that into declaring Rust useless.
"No true Scotsman" is a fallacy because "true Scotsman" doesn't have a clear definition and can mean anything. But "unsafe" has a very precise meaning in Rust. Safe Rust didn't have any data races, which is exactly what we would expect.
Is that "sprinkle atomics" stuff in bindings/interfaces for non-Rust external code? I'm not particularly surprised that you would have to wrap atomics around externalities and not doing that would be a source of errors if the external code has race conditions.
I think the mistaken idea that Rust allows "bug free" multithreaded programs will lead to a new generation of difficult-to-maintain multithreaded software... The lesson of multithreading should be that you need to use less of it, and only when necessary, not that you should write more. It is similar to what we got when Java was the darling against C++: you now have lots of Java programs that don't have the same problems as the C++ version but are still difficult to understand, leading to the well-known FactoryFactory syndrome.
I'd go further and say that the ROI is not only "insufficient", it is negligible. Unless your legacy code is written in a language/framework where available experts are disappearing (which is not the case here), there is practically zero value on rewriting code that works as intended.
> However, developers claimed on various occasions that a particular report must be a false positive. In all of these cases, it turned out that TSan was indeed right and the problem was just very subtle and hard to understand.
Yes, I have seen this phenomenon!