10% of Firefox crashes are caused by bitflips

I've told this story before on HN, but my biz partner at ArenaNet, Mike O'Brien (creator of battle.net) wrote a system in Guild Wars circa 2004 that detected bitflips as part of our bug triage process, because we'd regularly get bug reports from game clients that made no sense.

Every frame (i.e. ~60FPS) Guild Wars would allocate random memory, run math-heavy computations, and compare the results with a table of known values. Around 1 out of 1000 computers would fail this test!

We'd save the test result to the registry and include the result in automated bug reports.

The common causes we discovered for the problem were:

- overclocked CPU

- bad memory wait-state configuration

- underpowered power supply

- overheating due to under-specced cooling fans or dusty intakes

These problems occurred because Guild Wars was rendering outdoor terrain, and so pushed a lot of polygons compared to many other 3d games of that era (which can clip extensively using binary-space partitioning, portals, etc. that don't work so well for outdoor stuff). So the game caused computers to run hot.

Several years later I learned that Dell computers had larger-than-reasonable analog component problems because Dell sourced the absolute cheapest stuff for their computers; I expect that was also a cause.

And then a few more years on I learned about RowHammer attacks on memory, which was likely another cause -- the math computations we used were designed to hit a memory row quite frequently.

Sometimes I'm amazed that computers even work at all!

Incidentally, my contribution to all this was to write code to launch the browser upon test-failure, and load up a web page telling players to clean out their dusty computer fan-intakes.

PunchyHamster · 7 days ago

> Several years later I learned that Dell computers had larger-than-reasonable analog component problems because Dell sourced the absolute cheapest stuff for their computers; I expect that was also a cause.

Case in point: I was getting memory errors on my gaming machine, that persisted even after replacing the sticks. It caused windows bluesreen maybe once a month so I kinda lived with it as I couldn't afford to replace whole setup (I theoretized something on motherboard is wrong)

Then my power supply finally died (it was cheap-ish, not cheap-est but it had few years already). I replaced it, lo and behold, memory errors were gone

versteegen · 7 days ago

I'm surprised "faulty PSU" is not on GP's list of common problems. Almost every unstable computer I've ever experienced has been due to either a dying PSU (not an under-specced one) or dying power conversion capacitors on the motherboard.

dvngnt_ · 8 days ago

GW1 was my childhood. The MMO with no monthly fees appealed to my Mom and I met friends for years. The 8 skill build system was genius, as was the cut scenes featuring your player character. If there's ever a 3rd game I would love to see something allowing for more expression through build creation though I could see how that's hard to balance.

alexchantavy · 7 days ago

The PvP was so deep too. You would go 4v4 or 8v8 and coordinate a “3, 2, 1 spike” on a target so that all your damage would arrive at the same time regardless of spell windup times and be too much for the other team’s healer to respond to.

Could also fake spike to force the other team’s healer to waste their good heal on the wrong player while you downed the real target. Good times.

ndesaulniers · 8 days ago

I still remember summoning flesh golems as a necromancer! Too much of my life sunk into GW1. Beat all 4(?) expansions. Logged in years later after I finally put it down to find someone had guessed my weak password, stole everything, then deleted all my characters. C'est la vie.

jiggunjer · 8 days ago

Didn't they launch a remake of gw1 recently. Maybe I can get my kids hooked on that instead of this Roblox crap.

dpe82 · 8 days ago

As a mobile dev at YouTube I'd periodically scroll through crash reports associated with code I owned and the long tail/non-clustered stuff usually just made absolutely no sense and I always assumed at least some of it was random bit flips, dodgy hardware, etc.

Cthulhu_ · 7 days ago

I heard the same thing from a colleague who worked on a Dutch banking app, they were quite diligent in fixing logic bugs but said that once you fix all of those, the rest is space rays.

As an aside, Apple and Google's phone home crash reports is a really good system and it's one factor that makes mobile app development fun / interesting.

grishka · 7 days ago

For the Mastodon Android app, I also sometimes see crashes that make no sense. For example, how about native crashes, on a thread that is created and run by the system, that only contains system libraries in its stack trace, and that never ran any of my code because the app doesn't contain any native libraries to begin with?

Unfortunately I've never looked at crashes this way when I worked at VKontakte because there were just too many crashes overall. That app had tens of millions of users so it crashed a lot in absolute numbers no matter what I did.

Analemma_ · 8 days ago

There's a famous Raymond Chen post about how a non-trivial percentage of the blue screen of death reports they were getting appeared to be caused by overclocking, sometimes from users who didn't realize they had been ripped off by the person who sold them the computer: https://devblogs.microsoft.com/oldnewthing/20050412-47/?p=35.... Must've been really frustrating.

jnellis · 8 days ago

This was a design choice by AMD at the time for their Athlon Slot A cpus. Use the same slot A board which you could set the cpu speed by bridging a connections. Since the Slot A came in a package, you couldn't see the actual cpu etching. So shady cpu sellers would pull the cover off high speed cpus, and put them on slow speed cpus after overclocking them to unstable levels.

projektfu · 8 days ago

E.g., running a Pentium 75, at 75MHz.

Helmut10001 · 8 days ago

I don't understand why ECC memory is not the norm these days. It is only slightly more expensive, but solves all these problems. Some consumer mainboards even support it already.

Agingcoder · 7 days ago

No it doesn’t :-)

I’ve had plenty of servers with faulty ecc dimms that didn’t trigger , and would only show faults when actual memory testing. I had a hard time convincing some of our admins the first time ( ‘no ecc faults you can’t be right ‘ ) but I won the bet.

Edit: very old paper by google on these topics. My issues were 6-7 years ago probably.

https://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

hurfdurf · 7 days ago

Why? Intel making and keeping it workstation/Xeon-exclusive for a premium for too long. And AMD is still playing along not forcing the issue with their weird "yeah, Zen supports it, but your mainboard may or may not, no idea, don't care, do your own research" stance. These days it's a chicken and egg problem re: price and availability and demand. See also https://news.ycombinator.com/item?id=29838403

colechristensen · 8 days ago

Bit flips do not only happen inside RAM

Also, in a game, there is a tremendously large chance that any particular bit flip will have exactly 0 effect on anything. Sure you can detect them, but one pixel being wrong for 1/60th of a second isn't exactly ... concerning.

The chance for a bit flip to affect a critical path that is noticeable by the player is very low, and quite a bit lower if you design your game to react gracefully. There's a whole practice of writing code for radiation hardened environments that largely consists of strategies for recovering from an impossible to reach state.

PunchyHamster · 7 days ago

In case of Intel it's mostly coz they want to sell it as enterprise/workstation feature and make people pay extra.

AMD has been better on it but BIOS/mobo vendors not so much

Dylan16807 · 7 days ago

Well for DDR5 that's 25% more chips which isn't great even if you don't get ripped off by market segmentation.

It's possible DDR6 will help. If it gets the ability to do ECC over an entire memory access like LPDDR, that could be implemented with as little as 3% extra chip space.

epx · 7 days ago

And checksummed filesystems.

sznio · 7 days ago

What I'm wondering, even without ECC, afaik standard ram still has a parity bit, so a single flip should be detected. With ECC it would be fixed, without ECC it would crash the system. For it to get through and cause an app to malfunction you need two bit flips at least.

bell-cot · 7 days ago

Talk to someone in consumer sales about customer priorities. A bit-cheaper computer? Or one which which is, in theory, more resilient against some rare random sort of problem which customers do not see as affecting them.

jodrellblank · 7 days ago

This is getting off-topic but I’m amazed by this ability to reach out to computers around the world as a sensor array and infer things we can’t easily find out in other ways. It’s in popular culture and HN comments most often as spyware and mass surveillance of people, and that’s a bit of a shame.

GPS location and movement data is what gives Google maps its near-real-time view of traffic on all roads, and busy-ness of all shops.

I think they collect location data from people riding public transport so they can tell you how long people wait on average at bus stops before getting on a bus.

Does Google collect atmospheric pressure readings from phone altimeters and use it for weather models? Could they?

Kindle collects details on books people read, how far they read, where they stop, which sections they highlight and quote, which words they look up in dictionaries.

I wonder if anyone’s curated a list of things like this which do happen or have been tried, excluding the “gathers user data for advertising” category which would become the biggest one, drowning out everything else.

I think current phones use accelerometer data to detect possible car crashes and call emergency services. Google could use that in aggregate to identify accident blackspots but I don’t know if they do. But that would be less useful because the police already know everywhere a big accident happens because people call the police. So that’s data easily found a different way.

seanw444 · 7 days ago

> It’s in popular culture and HN comments most often as spyware and mass surveillance of people, and that’s a bit of a shame.

I don't know whether you mean it's a shame that people consider it spyware, or if you meant that it's a shame that it manifests as spyware typically. I agree with the latter, not the former. It usually is spyware. If companies went for simple opt-in popups with a brief description of the reasoning, I'd be all for that. I sometimes opt-in to these requests myself, despite being a fairly privacy-conscious person, because I understand the benefit they have to the people collecting the data for good purposes. But when surveillance is opt-out (or no choice given), it's just spyware.

MBCook · 7 days ago

Doesn’t Google also use the phone accelerometer to try and spot earthquakes?

mobilio · 8 days ago

Yup!

I've read this decade ago... https://www.codeofhonor.com/blog/whose-bug-is-this-anyway

john_strinlai · 8 days ago

for people that dont know, www.codeofhonor.com is netcoyotes (the gp comment) blog, and there is some good reading to be had there

Modified3019 · 8 days ago

Thanks to asrock motherboards for AMD’s threadripper 1950x working with ECC memory, that’s what I learned to overclock on.

I eventually discovered with some timings I could pass all the usual tests for days, but would still end up seeing a few corrected errors a month, meaning I had to back off if I wanted true stability. Without ECC, I might never have known, attributing rare crashes to software.

From then on I considered people who think you shouldn’t overlock ECC memory to be a bit confused. It’s the only memory you should be overlocking, because it’s the only memory you can prove you don’t have errors.

I found that DDR3 and DDR4 memory (on AMD systems at least) had quite a bit of extra “performance” available over the standard JEDEC timings. (Performance being a relative thing, in practice the performance gained is more a curiosity than a significant real life benefit for most things. It should also be noted that higher stated timings can result in worse performance when things are on the edge of stability.)

What I’ve noticed with DDR5, is that it’s much harder to achieve true stability. Often even cpu mounting pressure being too high or low can result in intermittent issues and errors. I would never overclock non-ECC DDR5, I could never trust it, and the headroom available is way less than previous generations. It’s also much more sensitive to heat, it can start having trouble between 50-60 degrees C and basically needs dedicated airflow when overclocking. Note, I am not talking about the on chip ECC, that’s important but different in practice from full fat classic ECC with an extra chip.

I hate to think of how much effort will be spent debugging software in vain because of memory errors.

monster_truck · 8 days ago

DDR4 and 5 both have similar heat sensitivity curves which call for increased refresh timings past 45C.

Some of the (legitimately) extreme overclockers have been testing what amounts to massive hunks of metal in place of the original mounting plates because of the boards bending from mounting pressure, with good enough results.

On top of all of this, it really does not help that we are also at the mercy of IMC and motherboard quality too. To hit the world records they do and also build 'bulletproof', highest performance, cost is no object rigs, they are ordering 20, 50 motherboards, processors, GPUs, etc and sitting there trying them all, then returning the shit ones. We shouldn't have to do this.

I had a lot of fun doing all of this myself and hold a couple very specific #1/top 10/100 results, but it's IMHO no longer worth the time or effort and I have resigned to simply buying as much ram as the platform will hold and leaving it at JEDEC.

golem14 · 8 days ago

Hmm, I wonder if we see, now since we are in a RAM availability crisis, more borderline to bad RAMs creep into the supply chain.

If we had a time series graph of this data, it might be revealing.

bpye · 7 days ago

Similar experience. I played with overclocking the DDR5 ECC memory I have on my system, it would appear to be stable and for quite a while it would be. But after a few days I'd notice a handful of correctable errors.

I now just run at the standard 5600MHz timing, I really don't find the potential stability trade off worth it. We already have enough bugs.

kmeisthax · 8 days ago

> From then on I considered people who think you shouldn’t overlock ECC memory to be a bit confused. It’s the only memory you should be overlocking, because it’s the only memory you can prove you don’t have errors.

This attitude is entirely corporate-serving cope from Intel to serve market segmentation. They wanted to trifurcate the market between consumers, business, and enthusiast segments. Critically, lots of business tasks demand ECC for reliability, and business has huge pockets, so that became a business feature. And while Intel was willing to sell product to overclockers[0], they absolutely needed to keep that feature quarantined from consumer and business product lines lest it destroy all their other segmentation.

I suspect they figured a "pro overclocker" SKU with ECC and unlocked multipliers would be about as marketable as Windows Vista Ultimate, i.e. not at all, so like all good marketing drones they played the "Nobody Wants What We Aren't Selling" card and decided to make people think that ECC and overclocking were diametrically supposed.

[0] In practice, if they didn't, they'd all just flock to AMD.

jug · 8 days ago

As a community alpha tester of GW1, this was a fun read! Such an educational journey and what a well organized and fruitful one too. We could see the game taking shape before our eyes! As a European, I 100% relied on being young and single with those American time zones. :D Tests could end in my group at like 3 am, lol.

netcoyote · 8 days ago

Oh yeah, those were some good times. It was great getting early feedback from you & the other alpha testers, which really changed the course of our efforts.

I remember in the earlier builds we only had a “heal area” spell, which would also heal monsters, and no “resurrect” spell, so it was always a challenge to take down a boss and not accidentally heal it when trying to prevent a player from dying.

aiiane · 7 days ago

I remember one of the first impressions I had in GW1 during test events was the sense of scale in the world that still managed to avoid excessive harsh geometry angles for the most part. Not surprised to hear it was pushing more polygons than average.

P.S. GW1 remains one of my favorite games and the source of many good memories from both PvP and PvE. From fun stories of holding the Hall of Heroes to some unforgettable GvG matches, y'all made a great game.

pndy · 8 days ago

I didn't expect to read bits of GW story here from one of the founders - thanks!

arprocter · 8 days ago

>Sometimes I'm amazed that computers even work at all!

Funny you say this, because for a good while I was running OC'd RAM

I didn't see any instability, but Event Viewer was a bloodbath - reducing the speed a few notches stopped the entries (iirc 3800MHz down to 3600)

cookiengineer · 8 days ago

I kind of wanted to confirm that. At that time I was still using a Compaq business laptop on which I played Guild Wars.

The Turion64 chipset was the worst CPU I've ever bought. Even 10 years old games had rendering artefacts all over the place, triangle strips being "disconnected" and leading to big triangles appearing everywhere. It was such a weird behavior, because it happened always around 10 minutes after I started playing. It didn't matter _what_ I was playing. Every game had rendering artefacts, one way or the other.

The most obvious ones were 3d games like CS1.6, Guild Wars, NFSU(2), and CC Generals (though CCG running better/longer for whatever reason).

The funny part behind the VRAM(?) bitflips was that the triangles then connected to the next triangle strip, so you had e.g. large surfaces in between houses or other things, and the connections were always in the same z distance from the camera because game engines presorted it before uploading/executing the functional GL calls.

After that laptop I never bought these types of low budget business laptops again because the experience with the Turion64 was just so ridiculously bad.

samiv · 7 days ago

Plot twist. The memory bit flip checking code was actually buggy and contained UB.

No, seriously did you actually verify the code for correctness before relying on it's results?

monster_truck · 8 days ago

Every interesting bug report I've read about Guild Wars is Dwarf Fortress tier. A very hardcore, longtime player who was recounting some of the better ones to me shared a most excellent one wrt spirits or ghosts, some sort of player summoned thing that were sticking around endlessly and causing OOM errors?

Dylan16807 · 7 days ago

> And then a few more years on I learned about RowHammer attacks on memory, which was likely another cause -- the math computations we used were designed to hit a memory row quite frequently.

For that one I'd guess no, because under normal circumstances hot locations like that will stay in cache.

taneq · 8 days ago

Wow, that’s really interesting! I always suspected bit flips happened undetected way more than we thought, so it’s great to get some real life war stories about it. Also thanks for Guild Wars, many happy hours spent in GW2. :)

nxobject · 7 days ago

Oh god yes… Dell OptiPlexes and bad caps went together in those days. I’m half convinced Valve put the gray towers in Counter-Strike so IT employees wasting time could shoot them up for therapy.

Agentlien · 8 days ago

That's a really cool anecdote. The overclock makes sense. When we released Need For Speed (2015) I spent some time in our "war room", monitoring incoming crash reports and doing emergency patches for the worst issues.

The vast majority of crashes came from two buckets:

1. PCs running below our minimum specs

2. Bugs in MSI Afterburner.

kasabali · 7 days ago

> Bugs in MSI Afterburner.

Do you mean the OSD?

andrepd · 7 days ago

Amazing story! Reminds me of old gamasutra posts like these https://web.archive.org/web/20170522151205/http://www.gamasu...

PaulHoule · 7 days ago

Back in the 90's I had an overclocked AMD486 machine which seemed OK most of the time but had segfaults compiling the Linux kernel. I sent in a bug report and Alan Cox closed it saying it was the fault of my machine being overclocked.

I dialed the machine back to the rated speed but it failed completely within 6 months.

benatkin · 7 days ago

Yikes. Dude, you're getting a Packard Bell.

jiggawatts · 8 days ago

Some multiplayer real-time strategy (RTS) games used deterministic fixed-point maths and incremental updates to keep the players in sync. Despite this, there would be the occasional random de-sync kicking someone out of a game, more than likely because of bit flips.

netcoyote · 8 days ago

For RTS games I wish we could blame bit flips, but more typically it is uninitialized memory, incorrectly-not-reinitialized static variables, memory overwrites, use-after-free, non-deterministic functions (eg time), and pointer comparisons.

God I love C/C++. It’s like job security for engineers who fix bugs.

sidewndr46 · 7 days ago

Well wow I wasn't expecting to see yet another story from Patrick Wyatt here in the comments! Much appreciated, I've enjoyed reading everything you've written over the years.

fennecbutt · 7 days ago

That's awesome. But also guild waaars, GW2 I played from beta for years, but it just got boring. Endless expansions with weird story.

We need GW3 already but my fear is mmo as a genre is dying.

uncSoft · 7 days ago

They just need to call it GW Classic apparently and it will sell

SunnyNeon · 7 days ago

How did you determine which of the causes it was?

danielEM · 7 days ago

> problems because Dell sourced the absolute cheapest stuff for their computers;

Price itself has nothing to cause problems, it is either bad design or false or incomplete data on datasheets or all of it. Please STOP spreading this narrative, the right thing is to make ads, datasheets, marketing materials etc, etc to tell you the truth that is necessary for you to make proper decision as client/consumer.

hsbauauvhabzb · 8 days ago

Did you/he ever consider redundant allocation for high value content and hash checks for low value assets that are still important?

I imagine the largest volume of game memory consumption is media assets which if corrupted would really matter, and the storage requirement for important content would be reasonably negligible?

nomel · 8 days ago

I think the most reasonable take would be to just tell the users hardware is borked, they're going to have a bad outside the game too, and point them to one of the many guides around this topic.

I don't think engineering effort should ever be put into handling literal bad hardware. But, the user would probably love you for letting them know how to fix all the crashing they have while they use their broken computer!

To counter that, we're LONG overdue for ECC in all consumer systems.

andai · 8 days ago

That's an interesting idea. How might you implement that? Like RAID but on the level of variables? Maybe the one valid use case for getters/setters? :)

Deleted Comment

just_testing · 8 days ago

I loved reading your comment and got curious: how he detected the bitflips?

mayama · 8 days ago

It looks like computing math heavy process with known answer, like 301st prime, and comparing the result.

General memory testing programs like memtest86 or memtester sets random bits into memory and verify it.

Salgat · 8 days ago

Mike is such a legend.

yownie · 7 days ago

this exactly the type of stories I come to HN to read, thanks!

rurban · 7 days ago

I hate HW soo much. To revise the biggest problems in computing, beside out of tokens: HW bugs

> In other words up to 10% of all the crashes Firefox users see are not software bugs, they're caused by hardware defects!

Bold claim. From my gut feeling this must be incorrect; I don't seem to get the same amount of crashes using chromium-based browsers such as thorium.

WhatsTheBigIdea · 8 days ago

Your gut may be leading you astray?

I also find that firefox crashes much more than chrome based browsers, but it is likely that chrome's superior stability is better handing of the other 90% of crashes.

If 50% of chrome crashes were due to bit flips, and bit flips effect the two browsers at basically the same rate, that would indicate that chrome experiences 1/5th the total crashes of firefox... even though the bit flip crashes happen at the same rate on both browsers.

It would have been better news for firefox if the number of crashes due to faulty hardware were actually much higher! These numbers indicate the vast majority of firefox crashes are actually from buggy software : (

chrismorgan · 7 days ago

I run Firefox Nightly, and occasionally a little Chromium stable. Both are running under Wayland, which I believe is still not considered stable in either. In the last year of Firefox, I had one full crash (the first in maybe three years), and about four tab crashes. Plus duplicates from deliberately reproducing issues. All but one (which I’m not certain about) were Nightly-only, fixed long before reaching stable. Were I running stable, I suspect I would not have had more than three crashes of any kind in the past five years.

I can’t say the same for Chromium. Despite barely using it, I had at least one tab or iframe crash last year, and there’s a moderate chance (I’ll suggest 15%) on any given day of leaving it open that it will just spontaneously die while I’m not paying attention to it (my wild guess, based on observations about Inkscape if it’s executing something CPU-bound for too long: it’s not responding in a timely fashion to the compositor, and is either getting killed or killing itself, not sure which that would be).

Frankly, from a crashing perspective, both are very reliable these days. Chromium is still far more prone to misrendering and other misbehaviour—they prefer to ship half-baked implementations and fix them later; Firefox, on the other hand, moves slower but has fewer issues in what they do ship.

LM358 · 8 days ago

10% of crashes does not imply 10% of your crashes.

michaelcampbell · 5 days ago

Doesn't mean 10% of any crashes either; the OP narrowed it down to roughly 1 in 20, then pulled a factor of 2 out of, well, thin air to get 10%.

BeetleB · 8 days ago

Are people getting so many FF crashes? Mine rarely does. I leave it running, opening and closing tabs, for weeks on end.

tbossanova · 8 days ago

Same, been using it for over 20 years and probably only a handful of crashes in that time. But I mostly look at dead simple web stuff (like hn) and run aggressive ad blocking so I might not be representative of the average user

mft_ · 8 days ago

I run FF on Mac laptop, Windows/Linux laptop, and Windows desktop and can’t remember it crashing in years.

zuminator · 8 days ago

Naively, the more stable a piece of software is, the more likely that its failures can be attributed to hardware error.

AngryData · 8 days ago

Its pretty stable for me, except it has some memory leaks. Generally I gotta leave heavy pages open for days at a time to notice, but if I don't close it entirely for over a week or two it will start to chug and crash.

samus · 7 days ago

It really depends on what you're doing with your hardware. Overclocking, overheating, unstable power supply, and things like that increase the likelihood of memory bitflips.

magicalhippo · 8 days ago

Slack caused frequent FF crashes, until I realized Slack has (had?) a live leak. Added an extension which force-reloads the Slack page every 15 minutes and that stopped the crashing.

Macha · 8 days ago

The only browser I’ve crashed in the last decade is mobile safari, and that’s probably because it runs out of memory

intrasight · 8 days ago

Months in my case. But I have ECC. Every five years I build a new development workstation and I always have ECC.

socalgal2 · 8 days ago

Does "Weeks on end" = 4? Or do you not take the latest update every 4 weeks?

shakna · 8 days ago

How many DRM-heavy websites do you use? Widevine is a buggy thing.

endemic · 8 days ago

macOS crashes more than Firefox for me.

fooker · 8 days ago

Yes

bsder · 8 days ago

> Bold claim. From my gut feeling this must be incorrect

RAM flips are common. This kind of thing is old and has likely gotten worse.

IBM had data on this. DEC had data on this. Amazon/Google/Microsoft almost certainly had data on this. Anybody who runs a fleet of computers gets data on this, and it is always eye opening how common it is.

ZFS is really good at spotting RAM flips.

bichiliad · 8 days ago

I think they claim that if your computer has bad hardware, you're probably sending a lot of _additional_ crashes to their telemetry system. Your hardware might be working just fine, but the guy next to you might be sending 30% more crashes.

saati · 8 days ago

I haven't seen a single firefox or chrome crash in months now, you should really stress-test your hardware.

galangalalgol · 8 days ago

I can't recall a single Firefox crash in at least a decade. What are people doing? I run ublock origin, nothing else. I do sometimes have Firefox mobile misbehave where it stops loading new pages and I jave to restart it, but open pages work normally as do all other operations, so not a crash exactly. Happens maybe once a month

Edit: more context, I power cycle at least once a week on desktop and the version is typically a bit behind new. I also don't have more tabs open than will fit in the row. All these habits seem likely to decrease crashes.

guenthert · 6 days ago

And there's an app for that, aptly named stressapptest (originally developed by google). In the (now distant) past, I found it to be much more efficient (in terms of runtime until fault detected) and effective in finding memory related (RAM chips or memory controller) defects than memtest.

p-t · 8 days ago

firefox crashes... decently often for me, but it's usually pretty clear what the cause is [having a bunch of other programs open]. every time i can recall my computer bluescreening [in the last year~, since that's how long ive had it] it was because of firefox tho.

this may have something to do with the fact that my laptop is from 2017, however.

nimih · 8 days ago

> Bold claim.

I agree. Good thing he doesn't back up his claim with any sort of evidence or reasoned argument, or you'd look like a huge moron!

crazygringo · 8 days ago

To be fair, he doesn't really:

> And because it's a conservative heuristic we're underestimating the real number, it's probably going to be at least twice as much.

The actual measurement is 5%. The 10% figure is entirely made up, with zero evidence or reasoned argument except a hand-wavy "conservative".

Edit: actually, the claim is even less supported:

> out of these ~25000 crashes have been detected as having a potential bit-flip. That's one crash every twenty potentially caused by bad/flaky memory

"Potential" is a weasel word here. We don't see any of the actual methodology. For all we know, the real value could be 0.1% or 0.01%.

shakna · 8 days ago

Chromium has better handling for bitflip errors. Mostly due to the Discardable buffers they make such extensive use of.

The hardware bugs are there. They're just handled.

saagarjha · 7 days ago

By what?

hedora · 8 days ago

I've had zero crashes in safari, ff or chrome in recent memory (except maybe OOMs). (Though I don't use Windows, so maybe that's part of the reason stuff just works?)

Perhaps you're part of the group driving hardware crashes up to 10% and need to fix your machine.

sgt · 7 days ago

I think most of it is just bad hardware, not specifically the RAM. Been using non-ECC desktop and laptop hardware for decades and I can't remember the machine crashing for .. I don't know, but a LONG time.

sfink · 7 days ago

There's a very good chance your system also does not have flaky memory. Most don't. You're not contradicting the post.

Zambyte · 8 days ago

What do you mean "the same amount"? If your browser never crashes, 10% of zero is zero.

pizza234 · 8 days ago

>> In other words up to 10% of all the crashes Firefox users see are not software bugs, they're caused by hardware defects!

> Bold claim. From my gut feeling this must be incorrect; I don't seem to get the same amount of crashes using chromium-based browsers such as thorium.

That's a misinterpretation. The finding refers to the composition of crashes, not the overall crash rate (which is not reported by the post). Brought to the extreme, there may have been 10 (reported) crashes in history of Firefox, and 1 due to faulty hardware, and the statement would still be correct.

estimator7292 · 8 days ago

He addresses this in the thread.

phyzome · 8 days ago

...normally browsers don't crash at all. Something's wrong with your computer.

maxerickson · 8 days ago

I mean, I've had quite some number of crashes that I can't correlate to anything.

Hardware problems are just as good a potential explanation for those as anything else.

cellular · 8 days ago

Maybe if Firefox tabs weren't such a memory hog it would be only 0.005% !

KennyBlanken · 8 days ago

"Software engineer thinks everyone's hardware is broken, couldn't possibly be bugs in his code" sums it up about right.