For a few months now I've seen a huge improvement on Linux in Firefox's memory management. Previously I had to run Firefox in a separate cgroup to limit its memory usage, because it could easily deplete my whole RAM, and even when I closed most of my tabs it did not release the memory back to the system. Now I never actually hit the limit I've set, and with the Auto Tab Discard extension it is well managed. So kudos to the team for such improvements.
In a nutshell we're directing the OOM killer towards less interesting processes within Firefox so that when the system is low on memory they'll get killed first. This not only makes it more stable overall but it plays nicer with other applications too. Web pages that leak memory in the background are particularly likely to be killed by this mechanism and that alone is a huge improvement in overall stability.
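For anyone curious what "directing the OOM killer" looks like mechanically on Linux: each process has an `oom_score_adj` knob under /proc, ranging from -1000 to 1000, where higher values make the process a preferred victim. Below is a minimal sketch, not Firefox's actual code, of a process volunteering itself as an early target; the value 300 is an arbitrary illustration.

```c
/* Minimal sketch: raise a process's oom_score_adj so the Linux OOM
 * killer picks it before more important processes. Error handling is
 * simplified; the adjustment value 300 is arbitrary. */
#include <stdio.h>
#include <unistd.h>

static int set_oom_score_adj(pid_t pid, int adj)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", (int)pid);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;          /* lowering the score needs privileges;
                               raising it, as here, does not */
    fprintf(f, "%d", adj);
    return fclose(f);
}

int main(void)
{
    /* Mark this process as an early OOM-killer target, as a browser
     * might do for a content process hosting a background tab. */
    return set_oom_score_adj(getpid(), 300) == 0 ? 0 : 1;
}
```

A parent process with the right permissions can write the same file for its children, which is how a browser can rank content processes below the main process.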
I was very excited about this improvement (it was covered on Phoronix IIRC), but my system still ends up thrashing when Firefox grows too big, unfortunately.
I usually keep an eye on the memory usage displayed in my taskbar, and have sysrq activated just in case. I tried multiple things, including avoiding swapping to disk (only using zram), and Zen kernels. I'll have to see whether MGLRU helps.
Does this OOM killer direction respect cgroup memory limits? Back when I was using Firefox on a low memory system, I ran it in a limited cgroup, but features like `browser.tabs.unloadOnLowMemory` wouldn't unload tabs on low cgroup memory, but only on low total system memory.
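For context on that distinction: with cgroup v2, each group exposes its own memory.current and memory.max files, which can show pressure even while /proc/meminfo's MemAvailable looks healthy system-wide. A rough sketch of reading both, assuming cgroup v2 is mounted at /sys/fs/cgroup and using a hypothetical browser.slice group:

```c
/* Sketch: a cgroup can be near its memory limit while the system as a
 * whole has plenty free, which is the mismatch described above.
 * Assumes cgroup v2 at /sys/fs/cgroup; "browser.slice" is a
 * placeholder for whatever group the browser actually runs in. */
#include <stdio.h>

static long read_long(const char *path)
{
    long v = -1;
    FILE *f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;         /* e.g. memory.max may contain "max" */
        fclose(f);
    }
    return v;
}

int main(void)
{
    long cur = read_long("/sys/fs/cgroup/browser.slice/memory.current");
    long max = read_long("/sys/fs/cgroup/browser.slice/memory.max");

    if (cur >= 0 && max > 0)
        printf("cgroup usage: %ld of %ld bytes\n", cur, max);
    return 0;
}
```

A "low memory" heuristic that only consults system-wide counters never sees these per-group numbers, which would explain the behavior described above.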
I had abandoned Firefox entirely on macOS until at some point I decided to try again, and it was no longer attempting to claim an entire 32 GB of memory for itself.
So, I am back as a happy user :)
> and also with Auto Tab Discard extension it is well managed
Have you compared it with the default Hibernation (Tab Unloading) recently? I don't use any extensions apart from uBlock Origin, since I lost trust in extensions after seeing a recommended FF extension promoting a scam.
When compared to Vivaldi's default hibernation (where I have to manually trigger background hibernation every now and then), FF's hibernation seems to do its thing (about:unloads) quite well.
Firefox stability is funny ... I was at Mozilla for 10+ years and used Nightly as my daily driver on my work Mac. I don't think I got more than one or two dozen crashes in total. A crash was a special occasion, and I would walk to someone's desk to show it off. It barely ever happened. On Nightly. So much love for all the stability work that is happening.
In personal use, Chrome and Firefox are both pretty stable for me. But I also write and maintain intranet web sites, and the fleet of devices that use them. In that fleet, without a doubt, Chrome's auto updates have broken more things more regularly than Firefox's. Actually, I can't recall Mozilla releasing a version of Firefox stable with a bug that affected us. Chrome has had many buggy releases that have sent us scrambling.
I've also never seen the memory issues described here. But then I don't leave a lot of tabs open: it's rare that my tabs don't fit in the tab bar. I think that pattern is true for most people: almost all our fleet had very few tabs open. So perhaps the fix, while welcome, doesn't affect most people.
Personal experiences are funny :) I've been using Firefox on Windows, Linux and Mac for as long as I can remember (my first version was maybe Firefox 3, the one that introduced resumable downloads, I think), and the Mac version has always been the most unstable and worst-performing one for me, while the Linux one has been the most stable and performant. I can't even remember the last time the Linux version crashed, while the Mac version crashed on me last week. The Windows one seems stable, but not as performant as the Linux version.
Comparing to games isn't particularly great since games are ultimately low stakes and companies have learnt that gamers will tolerate pretty much any bug that doesn't literally stop the game from working. See: Bethesda games.
I still remember, a few years ago, I had a buggy network driver that on rare occasions entered a rapid memory leak, eating all my RAM. Most programs crashed, except Firefox. GG.
macOS has awesome memory compression and overcommit -- it's no wonder Firefox never crashed for you. I can't speak from my own experience because I rarely ever used Firefox on my Mac, but the TL;DR is that even if an application has a pretty egregious memory leak, you will probably never even notice, unless that memory actually gets filled up with data.
I can't remember the last time Firefox crashed and I've used it daily on Windows since ... the beginning. Are most issues related to stability more common to Linux/MacOS?
My current instability with Firefox isn't crashes; it's that it will just randomly "lock up". It will still respond to a click, but will take 10 to 20 seconds to do so. I have to kill and restart Firefox to get it working again, but on startup it will usually take on the order of a minute or two to start actually working.
Also, more sites just don't care about having their stuff work in Firefox. I use Chromium for Roll20, because there are just a lot of things that are broken on Firefox.
I see the exact same thing. It’s frustrating because CPU, disk, memory usage, etc are all super low. The machine is idle and there I sit waiting for the first web page to load.
I use it on a Mac. The only times I restart the browser are to install a browser update (I'm on the Beta channel) or when my laptop restarts. Crashes are very rare for me.
Here's the thing: they're not! The reason those users were crashing was that something, somewhere (possibly in the graphics stack) was reserving a ton of space without using it. We had crashes on file with 20+ GiB of free physical memory, which is why I started looking into how a machine with so much free memory could suffer a crash.
I hope I described how this whole thing works, because Windows memory management is not well known and some things about it are counter-intuitive, especially if you're coming from Linux.
For me, Firefox slows down dramatically as it uses more memory (e.g. more tabs and windows opened), and reaches a less-than-usable state well before it has exhausted the available memory. Tab discarding - automatic or by literally closing the tabs - does seem to recover the memory used, but does not recover much of the performance (if any). My AMD 5800 and tens of gigs of memory runs into the same crushing performance blockers as my old 8GB FX-8300 machine, with virtually the same "workload" and usage profile.
Kinda the opposite of what I'd want; I usually have over 16GB more that Firefox could use if it needed, and that's once it has reached critical mass with maybe hundreds of tabs.
Its memory measurements usually look sane, so I feel like there's some data structure or algorithm that is doing something insane in the background - which is already confirmed to be the case with the History menu, particularly if you select and delete thousands of items at a time.
You don't need a lot to eventually get a crash in Firefox. All you really need is to hibernate/sleep instead of restarting, so Firefox never restarts, and to have some Twitch stream open; the memory leaks will eventually use up all your memory, in my experience at least.
Firefox has been solid for me since the dawn of time, and I have or currently run it on Windows, Mac, Linux, and FreeBSD.
For a while it used to be kind of slow on Mac and Linux, but I think that was slow graphics calls, which points to a possible issue with the graphics driver I was using. But the last time I checked (many months ago), it was much better.
I recall a particularly strange Firefox bug on Windows where the browser would die with an out of memory error even though there were upwards of 16 GB free to use. As it turns out something was consuming large amounts of swap on the system, and this was where the out of memory condition was happening.
I had to switch from FF to Brave on macOS because FF constantly over-utilized the CPU, leading to bad battery life and a warmer MBP. This happened on both Intel and M1 chips.
Over the last year FF has gotten a lot better about not keeping my MacBook awake because of content on some random background tab that the browser thinks is multimedia content.
Circa 2021, if I wanted my MacBook to go to sleep I had to shut down FF first.
I never had it crash on Linux. The only issue is some sites redirect you to a page saying “you must use chrome to proceed,” which is ridiculously lazy of them
This is also a good example of the benefit of telemetry: that they have crash numbers coming back from the field lets them tell that this really did work in practice and get a sense of how much of the problem they've solved.
I would like Linux distributions to ship a system-wide telemetry service that can be enabled or disabled at installation time or any time later on.
This service would be guaranteed to be unidirectional, would store data publicly on non-profit-run servers and domains, and would fully comply with GDPR (by not storing any PII and anonymising or pseudonymising everything).
Developers would connect to this service over dbus and consume the uploaded data in daily batches.
Hosting and hardware fees would come from donations by distributions and other organizations distributing money to the FLOSS ecosystem.
Data also needs to not be shipped to a third-party (e.g. Google) to be correlated with other activity outside the app sending the telemetry. There's likely lots of data going to Google Analytics that the software/service owner never looks at, but Google uses for their own purposes.
Couldn't crash reports be separated from other telemetry data, possibly with a dialog letting the user choose whether to send a crash report or not? IIRC, such a dialog actually existed in older Firefox versions. I find the amount of data they collect[1] to be borderline creepy.
The crash reports at https://crash-stats.mozilla.org are a separate, opt-in bit of telemetry: a dialog is shown when Firefox crashes. You can opt into sending them automatically by setting browser.crashReports.unsubmittedCheck.autoSubmit2 to true; it can also become true if you accept the dialog about submitting previously unsubmitted crash reports.
I get the saying. But in this case, Mozilla gets most of its money from search royalties, primarily from Google. We are Google's product, not Firefox's.
Purely technical telemetry like this is indeed useful. The problem comes when telemetry is used to justify deleting useful features such as compact mode.
The "make it hard to find" to "nobody uses it" to "let's delete it" pipeline is very real. Reminds me of the "defund it" -> "it does not work" -> "let's privatize it" pipeline in right-wing governments.
My personal objective with most situations is to discourage other people from enabling telemetry and then enabling it myself.
As a larger piece of the visible audience, I then hope that more attention is given to me. This is especially important for open source projects. And I don't care that much about what the company is getting from me.
But listen: they collect all sorts of stuff, and you should disable it unless you understand it. Ideally, privacy laws would expand to the point where you need to email a signature saying you understand before you opt in to telemetry. Informed consent is required for any reasonable study.
Telemetry just means any data about operations that is sent back to the mothership. Crash logs are just as much telemetry as a click-event log. Equating all telemetry with spying is a knee-jerk reaction to tech companies abusing telemetry.
> This is also a good example of the benefit of telemetry:
The benefit can be claimed only if the user consented to their private information being shared with the browser vendor in the first place. With most browser telemetry that is not the case, and the browser is simply not respecting users' privacy. The right to privacy, as a human right, trumps the 'right' to have the product 'improved'.
Otherwise we can find "benefit" in everything. One of the benefits of hell, for example, is that it is never cold.
If Firefox was selling a physical product in a retail store, they would be able to watch you walk around the store on CCTV, see you avoided an aisle because there is a polar bear lurking, and then remove the polar bear.
But since the product is digital they just have to give it away blind? Never knowing if people even use the features or not?
The author left off the part where it was invented by a mother and how dentists hate it! /s
I suppose it technically improves stability, but the cause seems like a flaw in the Windows operating system, if I'm understanding correctly.
> Stalling the main process led to a smaller increase in tab crashes – which are also unpleasant for the user even if not nearly as annoying as a full browser crash – so we’re cutting those down too.
I wonder if this has anything to do with my recent experience with Firefox Nightly on macOS. In the last few weeks I started to experience spontaneous tab crashes that couldn't be explained by anything from what I could tell. I could open a new tab with the exact same page and it would manage to not crash. Then I noticed Firefox would sometimes prevent me from viewing tabs until I restarted to update the software. Haven't seen it in the last few days, but it was incredibly frustrating. IMO, Firefox should make a best effort to render a page and not have its core functionality stop working entirely until updating.
> I suppose it technically improves stability, but the cause seems like a flaw in the Windows operating system, if I'm understanding correctly.
It's not a flaw at all, when you understand what is going on. Part of the issue is that in 2022 so many developers come from Linux backgrounds that they assume that the Linux way of doing things is the "normal" or "correct" way.
The NT kernel does not overcommit, and thus does not have an OOM killer. If the kernel cannot commit pages, the system call fails. That's it. No process terminations.
Firefox would crash because (even on Windows) it uses a customized version of jemalloc that is configured to be infallible by default. If jemalloc requests pages and those pages cannot be committed, the heap allocation will fail, and thus Firefox will self-terminate. That's simply a policy decision on Firefox's part.
Going back to Windows: suppose that the commit request failed because the swap file was full. Assuming that Windows was configured to automatically resize the swap file, the OS will then grow the swap file. That's why pausing for a bit and then retrying the VM allocation works: the swap file was grown, and now the pages can be committed.
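A minimal sketch of that pause-and-retry idea, under the assumptions described above (the retry count and delay are made-up illustrations, not Firefox's actual tuning):

```c
/* Sketch of "stall and retry" on Windows: if committing pages fails
 * because the pagefile is momentarily full, the OS may grow the
 * pagefile, so a later retry can succeed. Retry count and delay are
 * arbitrary for illustration. */
#include <windows.h>
#include <stddef.h>

void *commit_with_retry(size_t size)
{
    for (int attempt = 0; attempt < 5; attempt++) {
        void *p = VirtualAlloc(NULL, size,
                               MEM_RESERVE | MEM_COMMIT,
                               PAGE_READWRITE);
        if (p)
            return p;
        /* Commit failed: give the OS a moment to expand the
         * pagefile, then try again. */
        Sleep(100);
    }
    return NULL;            /* the caller decides if this is fatal */
}
```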
The Linux design of allowing overcommit is weird to me. Why would you not want to be up-front about the fact that you don't have enough memory available to hand over to a process? The process requested a certain amount of memory to be allocated, so surely it expects to use it, no?
Operating in a low memory environment is inherently about tradeoffs.
Where is the blame?
Maybe Firefox is too bloated or memory-inefficient. Maybe Mozilla didn't understand Windows's memory management strategy until now.
Or maybe Windows is too bloated and memory-inefficient. Or maybe the memory management tradeoffs were suboptimal.
Or maybe nobody is to blame, and they are taking advantage of something in a novel way that allows them to squeeze more juice out of the same fruit than others.
> Maybe Mozilla didn't understand Windows's memory management strategy until now.
That is part of it. A lot of FLOSS engineers come from a Linux background and tend to assume that the Linux way of doing things is the "normal" way. While I was there I had to explain to more than one developer that Windows doesn't have an OOM killer because the NT kernel doesn't overcommit.
Not sure, but sound stopped working in VLC about a year ago and was restored this month, at the same time that dropping files stopped working. Every time right after a Windows update, and with the same VLC version.
I don't think Windows is too bloated or memory-inefficient, but I do believe they're abusing the AV and "monitoring" stuff. It's a cat-and-mouse game, with me trying to disable crap and them re-enabling it with every update, or simply making it impossible to disable annoyances.
Also I suspect they're trying to "fix" drivers that worked before the fixes.
If the swap were allocated to the whole HD that wasn't used for actual files, then this hack wouldn't work.
If the swap were leaner than it already is, this hack would be necessary in every program.
If I had to point a finger at who is to blame, it's the Windows swap allocation team. Some combination of predictive analytics or even just a saner ratio of swap to free HD for an incoming file would fix this problem for most users most of the time.
But computers are hard and people want to keep them running for days on end. I get that memory just slowly, slowly gets eaten up by all the zombie procs out there.
That update behavior, where FF is still running and existing tabs mostly keep working but new tabs don't, has been a thing for years, and everyone hates it, not just you or me.
I thought it was possibly related to FF on Ubuntu switching to being a snap by default (even though I thought I had forced my system to have no snaps and no snapd, and added a special PPA for FF) and said something in a comment on HN, and several people clued me in that it's way older than that and I'm not the only one who hates it.
It's like FF devs don't actually use browsers, which is crazy of course. But are they really OK with always having to blow away everything you have going at any random time in the middle of the day? (It's always the middle of someone's day or a stretch of involved work.)
They never have tabs open with partially filled forms or search results or web apps that "restore tabs" won't restore the way they were? Or this just doesn't bother them?
It feels like a case of "you're holding it wrong", as in the user should shape their usage pattern around FF's update strategy: always do an apt upgrade before sitting down, and never after starting to work, and if you leave tabs and work open overnight, well, I guess just don't do that?
> But are they really OK with always having to blow away everything you have going at any random time in the middle of the day?
Y'all don't get your tabs restored when you restart your browser?
For me, the restart experience pre-snap was very easy - close, re-open, and you're right back. Most 'serious' webapps will happily save your drafts if, for whatever reason, you don't want to finish and send that half-composed slack message before restarting.
The weird update behaviour happens because files got replaced while Firefox was running. On Windows and macOS the update is applied on the next start (whenever you choose that to be), so it's not an issue; on Linux updates are handled by your system package manager, so they couldn't line it up as nicely.
Of course that also means you could end up having queued but not applied updates for a long time on Windows and macOS…
You can avoid this issue (which pretty much only ever happens on Linux, mind you) by installing the package directly from their website instead of from your distro's package manager. If you'd like to help improve stability or try features before they're fully stable, try the beta, dev, or nightly channels.
I run Arch and upgrade ~weekly, but I am very rarely inconvenienced by this restart behavior. Restore Tabs works pretty well on the modern web, and since old tabs still work, you can always complete whatever outstanding forms you have first.
I've been a Linux/OSX user for 20+ years, so I'm not familiar with Windows memory management. It'd be interesting to know why MS chose this approach and whether it has any benefits. Why not just let userspace request whatever it wants?
Well, over-committing is an arguable choice. "Here's the memory you requested! I don't know if I have it available, but anyway. Here it is!"
It turns out it probably works well/better in many cases, because apps don't actually use all the memory they request (up front), and for other reasons, but it's not the obvious choice to make. If I didn't know better, I would intuitively expect my OS to fail malloc when it does not have enough memory available.
I would expect an OS capable of expanding its swap file to try doing that before failing my malloc call, though.
Yeah, I was pretty incredulous when I first discovered over-commit in Linux: I asked for memory and you gave it to me without an error, and now that I am deep in the guts of processing and can't easily recover, you decide to tell me you don't really have it!
But once you know about over-commit there are workarounds in languages where you control memory allocation, like touching every page right after you allocate it, but before using it. And in a garbage collected language you don't have any control or insight into when OOM exceptions will occur in either approach. So the ability for the OS to not trust lazy and greedy software that asks for memory but doesn't use it seems like a reasonable trade-off.
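A sketch of that page-touching workaround (assuming a 4 KiB page size for brevity; sysconf(_SC_PAGESIZE) gives the real value):

```c
/* Sketch: write one byte per page right after allocating, so the
 * kernel must back the memory now rather than at some arbitrary
 * later access. Assumes 4 KiB pages for brevity. */
#include <stdlib.h>

void *alloc_and_touch(size_t size)
{
    unsigned char *p = malloc(size);
    if (!p)
        return NULL;
    for (size_t off = 0; off < size; off += 4096)
        p[off] = 0;         /* fault each page in immediately */
    return p;
}
```

Worth noting: under Linux overcommit this doesn't turn the failure into a clean error code; it only moves any OOM kill to a predictable point, right after allocation, instead of deep inside later processing.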
The Windows approach lets you fail at a predictable place: at the time of memory allocation. The overcommit approach causes OOM crashes at random places, depending on how your program touches memory.
But then we end up with the OOM killer, which is awful: randomly killing the biggest process because "reasons". It would be better if the OS could say no.
One thing it is worth noting is that the user experience on Windows of running out of memory is a lot better than on a Linux desktop environment. While things usually slow down due to a lot of swapping, the main UI continues to be functional enough to allow you to carry on using it and close things.
Work is being done these days to improve the situation on Linux, but the default experience can be pretty painful. I was using Android Studio on a Fedora machine with only 8 GB of RAM, and sometimes the whole system would completely freeze for tens of seconds at a time. This is not fun.
OTOH the mere existence of committed memory makes it hellish as a user when *something* leaks committed memory. You start getting out of memory errors while half of your RAM is empty, just because some half-assed driver somewhere leaks. To add insult to injury, task manager / resource monitor is always unable to show the exact process/driver that leaks; I had to randomly kill things to find the culprit.
I'll take the linux behavior any time when dealing with poorly written software (which is most software).
If you're over-committing memory then you don't find out your program ran out of memory until you try to access memory you've previously "allocated". If you're not over-committing, then you'll find out your program ran out of memory on an allocation where you might be better prepared to handle it.
Windows is strictly better here because 1) it will never pretend that it has more memory than it actually does, and yet 2) it allows processes to reserve as much address space as they need without using them (which is the sole justification for overcommit), by providing the APIs to control this in a fine grained way.
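A sketch of that fine-grained control, under the Windows virtual memory API described above: reserve a large address range up front (which consumes address space but no commit charge), then commit pages only as they are actually needed.

```c
/* Sketch: reserve-then-commit on Windows. Reserving costs no
 * physical memory or pagefile space; only committed pages count
 * against the system commit limit. Sizes are arbitrary. */
#include <windows.h>

int main(void)
{
    /* Reserve 1 GiB of address space, inaccessible for now. */
    char *base = VirtualAlloc(NULL, 1u << 30,
                              MEM_RESERVE, PAGE_NOACCESS);
    if (!base)
        return 1;

    /* Commit just the first 64 KiB when it's actually needed. */
    if (!VirtualAlloc(base, 64 * 1024, MEM_COMMIT, PAGE_READWRITE))
        return 1;

    base[0] = 42;           /* committed memory is usable */
    VirtualFree(base, 0, MEM_RELEASE);
    return 0;
}
```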
The Windows approach is better from a "perfect system" point of view: in theory an application knows all the code that it is loading and has a grasp on its own memory usage. You still have virtual address space because it is used for other things too (like memory mapped files), but you "commit" your current upper limit. You can be sure that if needed you'll actually be able to malloc (well HeapAlloc) that much memory. It might be slow due to swapping but it won't fail.
The Unix approach is better from a "realistic" point of view: most processes have a lot of library code they don't control and don't have time to audit. Usage patterns vary. And most processes end up reserving more memory than they ever actually touch. Note what they mentioned in the article - 3rd party graphics drivers run code in every process on Windows and that code allocates whatever it wants that counts against your commit limit. That isn't under your control at all and worse most of the time the memory is never touched.
Having lived under both systems I think I prefer the Unix view. In practice almost no Windows software does anything useful with commit limits so it just creates extra complexity and failure modes for little benefit.
For me, there's been some kind of weird instability or resource leak that makes Windows Firefox slower all around and less stable the longer the application runs. It's been present, and 100% reproducible (over an active session of a few days), for a couple of years.
The general problem used to feature some sort of bug where some window processes would completely fail to render/paint UI components, instead rendering them as pure black. The rendering problem is gone, same with a correlated memory leak, but the overall performance slowdown that accompanied it is still there.
One day I'll submit a bug report or profiler trace or something, but I find it odd that every time I see a post about a stability or performance fix, it never happens to be the big one that I run into, regardless of the Windows device or extensions.
It makes me wonder if some users just have browsing habits that most others don't, so they hit obscure bugs more frequently. But since everyone has their own obscure habits, and thus bugs, there's a theoretical endless deluge of problems with no critical mass to justify prioritization or investigation.
One thing I can say is that it is extremely rare for Firefox to actually crash on me. The instabilities are in the behaviors of the browser, its UI and the tabs/pages themselves. It can slow to a crawl, and even hourglass on me in an extreme case (I usually get fed up and just restart the browser and all the tabs at once to fix the issue before it gets that bad), but it manages to keep itself alive, somehow.
That said, I'll poke around in there next time anyways and see if anything stands out. Thanks!
Check out `about:memory` and `about:unloads`; the former has memory minimization options and the latter lets you unload background tabs from memory.
But in all seriousness, I can't recall the last time FF crashed on me. My OS hard locks more often than FF crashes.
We are now in an age that treats software bugs as the norm. It's respectable that Mozilla still keeps the standard as high as in the past.
Same here. I will pay for an alternative to Roll20 that works on FF. Bonus points for open source with a hosted version.
So, you know, anyone who opens a story on Ars Technica to read later and then forgets about the tab.
I've seen YT do similar things in the past as well.
I was pretty much forced to install the Auto Tab Discard extension, which I'm guessing was built into Opera 12?
FWIW, it wasn't any better on Windows.
For what it's worth, I have no issues with telemetry as long as it is opt-in and there is transparency about exactly what is collected.
It's having to opt out (or not being able to opt out at all), and vague explanations of what is collected and why, that I take issue with.
Crash logs are a different beast.
[1] https://data.firefox.com/dashboard/user-activity
https://probes.telemetry.mozilla.org/?search=crash shows automatic telemetry probes. The main bit of data in that set is FX_CONTENT_CRASH_*, and you can see the back and forth between the data steward and the engineer adding the probe: https://bugzilla.mozilla.org/show_bug.cgi?id=1269961#c8
What in that report is creepy? Surely knowing the percentage of people on 32- vs 64-bit isn't problematic. Maybe add-ons? I'm genuinely curious.
* telemetry is evil
* if the product is free (Firefox), you are the product
The "make it hard to find" to "nobody uses it" to "let's delete it" pipeline is very real. Reminds me of the "defund it" -> "it does not work" -> "let's privatize it" pipeline in right-wing governments.
As a larger piece of the visible audience, I then hope that more attention is given me. This is especially important for open source projects. And I don't care that much about what the company is getting from me.
But listen, they collect all sorts of stuff and you should disable it unless you understand it. Ideally, privacy laws expand to the point where you need to email a signature saying you understand before you opt in to telemetry. Informed consent is required for any reasonable study.
Dead Comment
The benefit can be claimed only if the user consented into their private information being shared with the browser vendor in the first place. With most browser telemetry that is not the case and browser is simply not respecting users' privacy. The right to privacy, as a human right, trumps the 'right' to have the product 'improved'.
Otherwise we can find "benefit" in everything. One of the benefits of hell, for example, is that it is never cold.
But since the product is digital they just have to give it away blind? Never knowing if people even use the features or not?
Not to mention that Firefox is open source, so you (and GDPR authorities) can check yourself what exactly is being sent...
Chromium developers hate him!
It got much worse with the switch to snaps.
This is one of those many instances where Windows is doing absolutely the right thing and it's Linux that's screwed up.
It takes A LOT of time to expand the swap file, so failing malloc immediately seems, to me, the right way to handle it.
Maybe adding an optional callback to malloc to be notified when further allocations are possible would be a better way to handle this.
Some references to work on Linux: https://lwn.net/Articles/317814/ and https://fedoraproject.org/wiki/Changes/EnableEarlyoom