> What is ThreadSanitizer? ThreadSanitizer (TSan) is compile-time instrumentation to detect data races according to the C/C++ memory model on Linux. It is important to note that these data races are considered undefined behavior within the C/C++ specification.
TSan, ASan, UBSan - if you are writing code and your toolchain supports these, you should use them. Over the 6-7+ years I've used them, I have never seen a false positive (I'm sure some have occasionally existed, but they seem rare in practice).
> However, developers claimed on various occasions that a particular report must be a false positive. In all of these cases, it turned out that TSan was indeed right and the problem was just very subtle and hard to understand.
Address Sanitizer -- never seen a false positive. Ever.
UBSan -- again never had a false positive.
Thread Sanitizer does have false positives in some circumstances. They're hard to verify as false positives, and in many ways it's better to refactor to eliminate the complaint, because you'll end up with better, cleaner code. For example, Qt uses those atomic fences that the article describes [0] [1].
[0]: TSan does not produce false positive data race reports when properly deployed, which includes instrumenting all code that is loaded into the process and avoiding primitives that TSan doesn’t understand (such as atomic fences).
[1]: https://lists.qt-project.org/pipermail/interest/2014-March/0...
In Firefox, media threads use a reduced stack size (128 kB) due to, historically, 32-bit address-space concerns with lots of media elements on a page. Opus, in its default configuration, requires a fair amount of working stack memory (a few dozen kB in some cases). "A few dozen" is much smaller than 128 kB, but enabling ASAN apparently increases stack usage quite a bit. So we got crashes under ASAN, from overflowing the stack, that would not have been possible without ASAN enabled.
https://bugzilla.mozilla.org/show_bug.cgi?id=750231
I've gotten false positives in ASan when trying to instrument Win32 application code compiled with MSVC's then-experimental ASan support, using the wrong combination from the maze of ASan libraries (static ASan library combined with dynamic MFC). And I also get startup crashes when running MSVC ASan binaries in Wine. But those are obscure beta use cases, and some may be fixed by now.
It would probably be fairly easy to change Qt's Mutex implementation to be TSan-compatible and only do so for TSan builds (by swapping out the fences for atomics when building with TSan). This is what we did in Firefox.
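To illustrate the kind of swap meant here (a minimal sketch, in Rust rather than Qt's C++, and not Qt's actual code; the `tsan` cfg is a hypothetical flag you would set only for TSan builds, e.g. via RUSTFLAGS="--cfg tsan"):

    use std::sync::atomic::{fence, AtomicBool, Ordering};

    // Normal build: standalone release fence plus relaxed store. Correct at
    // runtime, but TSan does not model standalone fences and may report a race.
    #[cfg(not(tsan))]
    pub fn publish(ready: &AtomicBool) {
        fence(Ordering::Release);
        ready.store(true, Ordering::Relaxed);
    }

    // TSan build: the same ordering expressed directly on the atomic store,
    // which TSan does understand, so the report disappears without weakening
    // the synchronization.
    #[cfg(tsan)]
    pub fn publish(ready: &AtomicBool) {
        ready.store(true, Ordering::Release);
    }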
UBSan has "false" positives, in that unsigned integer overflow is not C/C++ UB. They've decided to offer a check for it anyway (it's opt-in, separate from -fsanitize=undefined), and sometimes it really is a bug, so that isn't totally unreasonable.
In my opinion, the main limitation of TSan is that it cannot find all possible races. (An impossible task, so I think it speaks well of TSan that this is its main limitation.)
TSan is a dynamic tool: if it observes a race during execution, the race is very likely to be real. But it is impossible to "observe" execution for every part of a large code base, and so real races are likely still hiding in some edge case that TSan did not execute.
Static analysis tools are interesting in that they have the opposite problem. Static tools can detect races across the entire code base, but without runtime information it is difficult to know whether the races are real. This leads to more real races being detected, but also comes with more false positives.
There is a lot of interesting work being done on data race detection and on how we can bring these two sides closer together to find more real races without overwhelming developers with false positives.
This is absolutely true, and hence we run not only our tests under TSan but also fuzzing, to explore even more corner cases.
On the static vs. dynamic side, I would always opt for the dynamic tool when it can guarantee me no false positives, even if the results are incomplete. It is pretty much impossible to deploy a tool that produces lots of false positives, because developers will usually reject it at some point and question every result.
> In my opinion, the main limitation of TSan is that it cannot find all possible races.
In my experience TSan finds all possible races if it's built into the unit tests and the unit tests provide enough coverage. If unit tests don't provide enough coverage, then there are likely other bugs lurking beyond just data races.
I've had this happen to me personally. Looking again and again at the TSan error and the code it pointed to, I was absolutely sure it was a false positive. Thankfully I got a friend to look at it, and after a good night's sleep he was able to explain to me how exactly the code was wrong.
Tl;dr: It is extremely hard to write correct threaded code in C/C++, and it is quite hard to recognize errors even when they are pointed out to you.
Two pieces of information that I found interesting:
* Rust had poor support for sanitizers ("Rust, which has much less mature support for sanitizers"). Work was done to improve this, but it's not clear how well it works now - it would be worth a blog post in itself IMO (a sketch of the current nightly invocation follows at the end of this comment). This was surprising, because I at least assumed it would work flawlessly thanks to clang.
* Rust / C++ bridges are predictably problematic and weaken Rust's guarantees ("races in C++ code being obfuscated by passing through Rust").
This is the problem to solve for most companies working in a traditionally C- and C++-oriented domain, as the C* code will dwarf the Rust code. Because of the shared memory space, any problems in that code will be just as critical as if the entire code base were unsafe. It would be quite awesome if Rust were able to isolate unsafe code not just at compile time but also at runtime, by e.g. offering the option of executing the unsafe code in a separate address space. This would require some manual work for moving the data back and forth, but the language could offer tools to support such a work mode, e.g. unsafe blocks could take parameters.
I wish Mozilla would be more transparent about the absolutely normal and typical difficulties they must be encountering in a mixed code base. I have the feeling that there's plenty of them, but they don't want to emphasise them in order not to discourage Rust adoption.
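For reference on the first bullet: Rust sanitizer support today is nightly-only; the rustc documentation describes an invocation roughly like the following (a sketch, assuming a Linux target; -Zbuild-std rebuilds the standard library so that it is instrumented too):

    RUSTFLAGS="-Zsanitizer=thread" cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu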
In terms of Rust guarantees with C* code: the Rust parts of the code will be fine, which means that if you have an issue, you can easily isolate it to the C* code (which probably doesn't help a lot). You can probably isolate things if you use IPC (I believe there are a few crates for that), but it wouldn't prevent the result from being wrong, only unsafe behaviour from stomping the address space. You could likely avoid copies if you relied on SHM for larger data patterns.
On the other hand, integrating Rust and C* isn't terribly hard; there are automated tools that generate bindings for C/Rust, so you don't really have to think about it much other than unpacking the unsafe data from C*.
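A minimal sketch of what that boundary looks like (the extern declaration is the kind of thing a generator like bindgen emits; decode_frame is a hypothetical C function):

    use std::os::raw::c_int;

    extern "C" {
        // Hypothetical C function; in practice a tool like bindgen
        // generates these declarations from the C headers.
        fn decode_frame(data: *const u8, len: usize) -> c_int;
    }

    fn decode(buf: &[u8]) -> c_int {
        // The unsafe block marks exactly where Rust's guarantees stop: the
        // C side is trusted to honor the pointer/length contract.
        unsafe { decode_frame(buf.as_ptr(), buf.len()) }
    }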
I worked on a mixed C++ and C# codebase where we switched from in-process calls to IPC for that very reason, taking the efficiency hit. Having the .NET runtime running in the same address space confused any sanitizer or leak detector. You rely on those tools to manage a large C++ codebase in practice.
Also, because the .NET GC churns through a lot of memory, statistically almost every problem on the C++ side turned up on the C# side first, sometimes with very confusing effects. For example, .NET has a compacting GC, so an object can get corrupted and then randomly moved somewhere else. We didn't have that kind of problem often, but when we did, it was pure hell to debug.
> Overall Rust appears to be fulfilling one of its original design goals: allowing us to write more concurrent code safely. Both WebRender and Stylo are very large and pervasively multi-threaded, but have had minimal threading issues. What issues we did find were mistakes in the implementations of low-level and explicitly unsafe multithreading abstractions — and those mistakes were simple to fix.
> This is in contrast to many of our C++ races, which often involved things being randomly accessed on different threads with unclear semantics, necessitating non-trivial refactorings of the code.
And when it comes to finding bugs in your unsafe concurrency primitives in Rust programs, there is a tool called Loom that helps with that, as a powerful complement to what ThreadSanitizer provides: https://github.com/tokio-rs/loom . You can think of Loom sort of like QuickCheck, except the permutations are all the possible thread states of your program based on the memory orderings that you have specified.
Here's a video that briefly talks about what Loom can do https://youtu.be/rMGWeSjctlY?t=7519 (it's part of a larger video about memory orderings, and is a great introduction to the topic).
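To make that concrete, a Loom test looks roughly like this (a minimal sketch; loom::model reruns the closure under every legal interleaving of the loom-provided thread and atomic types):

    use loom::sync::atomic::{AtomicUsize, Ordering};
    use loom::sync::Arc;
    use loom::thread;

    #[test]
    fn two_increments_always_sum_to_two() {
        loom::model(|| {
            let n = Arc::new(AtomicUsize::new(0));
            let n2 = Arc::clone(&n);
            let t = thread::spawn(move || {
                n2.fetch_add(1, Ordering::SeqCst);
            });
            n.fetch_add(1, Ordering::SeqCst);
            t.join().unwrap();
            // This assertion is checked under every explored interleaving.
            assert_eq!(n.load(Ordering::SeqCst), 2);
        });
    }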
Loom is really awesome, though it is focused on exhaustive testing, so not suitable for code that has a lot of possible interleavings (e.g. due to a ton of threads, or a large body of code).
There is a new project out of AWS called Shuttle [1] which is like Loom, but it does random exploration instead of exhaustive exploration, which enables massively distributed testing of really complicated stuff.
[1]: https://github.com/awslabs/shuttle
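Shuttle's entry point mirrors Loom's, except you ask for some number of random schedules rather than an exhaustive sweep. If I remember its README correctly, a test looks something like this (an unverified sketch):

    use shuttle::sync::Mutex;
    use shuttle::thread;
    use std::sync::Arc;

    #[test]
    fn explores_random_schedules() {
        // Explore 1000 randomly chosen thread schedules.
        shuttle::check_random(|| {
            let v = Arc::new(Mutex::new(0u32));
            let v2 = Arc::clone(&v);
            let t = thread::spawn(move || {
                *v2.lock().unwrap() += 1;
            });
            *v.lock().unwrap() += 1;
            t.join().unwrap();
            assert_eq!(*v.lock().unwrap(), 2);
        }, 1000);
    }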
Loom's style of exploration (and that of AWS Shuttle, mentioned above) can be quite effective. Coyote is the equivalent library in the .NET world (formerly known as P#) and comes with a bunch of exploration "schedulers" (from random exploration to probabilistic ones and even an experimental reinforcement-learning-based one). They released an animation-rich video today (see https://news.ycombinator.com/item?id=26718273). Worth watching even if you're using Rust, as Loom and AWS Shuttle are conceptually similar.
Yeah, that was the part that stood out to me as well. I thought it was pretty intuitively obvious that having your data races confined to small segments is better than allowing them to be spread over the entire codebase, but a lot of people act like it negates the advantage of working in a safer language entirely. It's nice to have empirical reports that back up the "obvious" though. Sometimes "obvious" things turn out to be wrong. It's always good to check them.
> but a lot of people act like it negates the advantage of working in a safer language entirely.
I think the underlying assumption is more along the lines of: the easy cases are made easier, and the hard cases are made impossible — or rather the concern being that rust hasn’t made the problem easier, it just moved it around.
Which is often the case with claims of solving long-lasting problems
D takes a different approach. D has a type constructor `shared`. Only types marked as `shared` (or `immutable`) can be accessed by multiple threads. If you've got concurrency problems, the scope of the problems are immediately reduced to the subset of the code typed as `shared`.
In addition, `shared` types cannot be accessed directly. They can only be accessed via library routines, such as D's atomic library.
Having an `immutable` data type also helps, as immutable data can be accessed by multiple threads without need for synchronization.
> D takes a different approach. D has a type constructor `shared`. Only types marked as `shared` (or `immutable`) can be accessed by multiple threads.
This is called Sync in Rust. There is also a Send trait for types which can be moved (but not necessarily shared), as there are data items which cannot leave their origin thread, e.g. thread locals or non-atomically refcounted instances.
That description seems very similar to how Rust's type system enforces thread safety. You have two kinds of references: shared and exclusive. The shared references will typically only provide immutable access, whereas if you have an exclusive reference, you can mutate it. Beyond this, there are some special types that can be mutated through shared references, e.g. the AtomicI32 type. Similarly, the Mutex type lets you lock it given only a shared reference and provides you with an exclusive reference when locked.
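A small sketch of those two escape hatches; note that the function takes only shared references:

    use std::sync::atomic::{AtomicI32, Ordering};
    use std::sync::Mutex;

    fn bump(counter: &AtomicI32, guarded: &Mutex<Vec<i32>>) {
        // AtomicI32 permits mutation through a shared reference.
        counter.fetch_add(1, Ordering::Relaxed);
        // Mutex::lock also takes &self and hands back exclusive access
        // to the protected data for as long as the guard is alive.
        guarded.lock().unwrap().push(42);
    }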
Did D ever get around to specifying a formal memory model? Because when I tried to adopt D a couple years ago I felt there was a lot of ambiguity surrounding the semantics of `shared`, particularly when interfacing with C and C++ code.
I ended up just casting to and from `shared`, and that seemed to work, but it was pretty verbose for highly parallel code and I was never quite sure under what circumstances doing that might violate the compiler's invariants.
Also such casting appeared to eliminate most of the benefits, since what appeared to be local data might have been cast and had a pointer sent across to another thread elsewhere. In the end `shared`, @nogc antics (the language internals weren't fully annotated), closures requiring the GC (compare to C++), and the friction of interfacing with highly templated C++ code such as the Eigen library caused me to abandon the attempt. I sure learned a lot in the process though!
It's interesting, I read this and had the opposite conclusion -- people are still writing racy code using Rust.
I mean, it's great that you can write threaded code with more confidence, but it seems the conclusion here is that Rust gives you more confidence, not absolute confidence, and you still need the normal stable of runtime detection tooling. The second fix [1] in particular resembles the bad pattern of "sprinkle atomics on it until it works" that you see in non-Rust languages.
(If this comment raises your hackles, please read the "no true Scotsman" [2] fallacy before responding.)
[1]: https://hg.mozilla.org/integration/autoland/rev/044f8b594c71 [2]: https://en.wikipedia.org/wiki/No_true_Scotsman
> What issues we did find were mistakes in the implementations of low-level and explicitly unsafe multithreading abstractions
(emphasis mine)
The important thing is that the races only happened in unsafe { } blocks. It's well-established that these blocks should basically be treated like C++ in terms of scrutiny, but (I believe) you can still have roughly "absolute confidence" in non-unsafe code.
It's true and interesting that unsafe { } blocks deserve sanitization analysis like C++ code does - and maybe this hasn't been discussed enough - but I don't think it's fair to suggest that the enormous language benefits are outweighed by some false sense of confidence. The proportions in that idea just seem way out of whack.
You are correct. The "no data races" slogan of Rust is accurate, but you need to mention the way it needs to be understood. First, data races are only a subset of all race conditions. Second, like the other safety guarantees, the statement only applies to safe Rust. Once you use unsafe Rust or any non-Rust language (and both usually happen in any non-trivial program), this guarantee stops being a guarantee. But it doesn't mean it vanishes into nothingness. It instead becomes a quantitative statement: it's much easier to avoid data races in Rust than in unsafe alternatives.
I think when you build Rust projects that require reliability, it's foolish to lean back, believe that marketing slogan, and not look for concurrency bugs. However, once you find them, it's way easier to fix them, at least when your unsafety is well encapsulated, as is recommended.
I think the big challenge for Rust in this context is to improve its tooling so that using it is comparable to or even easier than the C++ tooling. This blog post is proof that such tools should be run; otherwise you are missing out on important bugs.
> you still need the normal stable of runtime detection tooling
I hate that you're getting so many downvotes, and I wish people wouldn't do that...but I still want to disagree slightly :) I think it's interesting to note that the meaning of "you" changes between the two cases. If I write safe code on top of parking_lot, you could argue that I don't really need to do runtime race detection. Rather, someone in the ecosystem needs to do it, but once it's been done, the shared code is improved for everyone. If my code is all safe, I can benefit from these global improvements over time, and I can be confident that I haven't introduced any new races myself.
They literally found tons of new race conditions in C++ and 2 in the Rust code with their tool, and you want to act as if that little factoid makes Rust useless somehow. Anyone who thinks a language is flawless is not thinking correctly, and that includes Rust. However, it does a much better job at this than C/C++, yet somehow you twisted that into declaring Rust useless.
"No true Scotsman" is a fallacy because "true Scotsman" doesn't have a clear definition and can mean anything. But "unsafe" has a very precise meaning in Rust. Safe Rust didn't have any data races, which is exactly what we would expect.
Is that "sprinkle atomics" stuff in bindings/interfaces for non-Rust external code? I'm not particularly surprised that you would have to wrap atomics around externalities and not doing that would be a source of errors if the external code has race conditions.
I think the mistaken idea that Rust allows "bug free" multithreaded programs will lead to a new generation of difficult-to-maintain multithreaded software... The lesson of multithreading should be that you need to use less of it, and only when necessary, not that you should write more. It is similar to what we got when Java was the darling against C++: you now have lots of Java programs that don't have the same problems as the C++ version but are still difficult to understand, leading to the well-known FactoryFactory syndrome.
I'd go further and say that the ROI is not only "insufficient", it is negligible. Unless your legacy code is written in a language/framework where available experts are disappearing (which is not the case here), there is practically zero value on rewriting code that works as intended.
> However, developers claimed on various occasions that a particular report must be a false positive. In all of these cases, it turned out that TSan was indeed right and the problem was just very subtle and hard to understand.
Yes, I have seen this phenomenon!