One trope I frequently encounter as a FreeBSD professional is "Linux is used by so many people, it doesn't have vast and sweeping bugs anymore".. the many eyes fallacy. In reality, you get bystander paradox.. everyone wants RedHat or IBM to do all the hard work and they reap all the rewards.
I find my field of systems pretty interesting.. there's so much to do and not a lot of people rushing in to do it as the heavy hitters retire or move to entrepreneurship etc. It's probably much cheaper to harden a *BSD or Illumos system in these domains, and much easier to get the results integrated into mainline. The financial and fame are much greater in doing it on Linux though.
The many eyes idea postulates many eyes looking at the code, not many eyes finding bugs. (Proprietary software has as many or more users finding bugs as open source!) In reality, the number of developers actually reading code to find bugs may not be significantly larger for complex open source like Linux, particularly in esoteric areas like filesystems or device drivers.
The results of "many eyes" seems to be basically moot (i.e. certainly not a 10x magnitude and more likely well below 2x) in the face of the cultural mores of a project. For example, Linux kernel treats userland ABI compatibility quite seriously and there haven't been many faults there. OTOH security and KPI stability are not taken seriously, and are in constant flux. I can dig up papers on the former if needed, but it is the most glaring failure of "many eyes"
What's particularly noteworthy from my perspective is the bandwagon effect toward one kernel is causing the thing that is supposed to be the ultimate best that keeps getting better (Linux kernel) to lose focus, investment (new talent and money), maybe even quality due to decreased competition. Systems software is starting to look a lot more like Electrical Engineering as a field than the greater software ecosystem. Mostly done by large, central organizations. That makes me sad.
One big problem is how many users actually translate into extra eyes contributing to the codebase. Linux has a huge number of users but how many actually do more than find a workaround for a bug, much less contribute a patch back.
It definitely happens but in my experience a lot of people think that’s too much work and assume someone else will do it even if they don’t. It’s been far more common for someone to try hoping it doesn’t happen again, disabling features blindly, etc.
There's a psychological principle called Diffusion of Responsibility that I think applies here. But this is my rusty memory from undergrad around a decade ago.
> This could happen if there’s disk corruption, a transient read failure, or a transient write failure caused bad data to be written.
Wouldn't transient I/O error get corrected by block driver itself by repeating the request a few times? Does a filesystem driver assume any error reported by storage device is permanent?
> The 2017 tests above used an 8k file where the first block that contained file data either returned an error at the block device level or was corrupted, depending on the test.
Are there any tests on how filesystems deal with metadata corruption or I/O error? Apart from NTFS "self-healing" introduced in Windows 7/2008, do modern filesystem drivers attempt to correct broken metadata completely on the kernel side, or just give up to avoid making things worse?
> Wouldn't transient I/O error get corrected by block driver itself by repeating the request a few times?
I am surprised at the seeming assumption here that errors are always transient and you shouldn't worry because the retry will fix it. Sure that probably won't hurt, but it won't address the case where retries fail.
As usual, another example of the Linux echo chamber. It's just one constant inward gaze with that crowd. They learn nothing from anyone, only trial and error.
I don't know that I'd agree. As much as llvm+clang inspired gcc to improve, ZFS seems like a big part of the inspiration for btrfs. I'm not sure that it hits the whole ZFS feature set, but it's probably a step in that direction.
Something to keep in mind is that by default writes go through the page cache. By the time I/O is initiated for a write(2), the application process can have exited.
This is why calling fsync(fd) before closing the file and exiting is a good idea if you need that kind of error to be handled. You should get it as a return of fsync if it happens after the write.
> It’s a common conception that SSDs are less likely to return bad data than rotational disks, but when Google studied this across their drives, they found:
>> The annual replacement rates of hard disk drives have previously been reported to be 2-9% [19,20], which is high compared to the 4-10% of flash drives we see being replaced in a 4 year period. However, flash drives are less attractive when it comes to their error rates. More than 20% of flash drives develop uncorrectable errors in a four year period, 30-80% develop bad blocks and 2-7% of them develop bad chips. In comparison, previous work [1] on HDDs reports that only 3.5% of disks in a large population developed bad sectors in a 32 months period – a low number when taking into account that the number of sectors on a hard disk is orders of magnitudes larger than the number of either blocks or chips on a solid state drive, and that sectors are smaller than blocks, so a failure is less severe.
Unfortunately "return bad data" is slightly ambiguous. IIRC these failures that are reported are ones where the drive itself claims that it discovered invalid blocks which were detected by error detection algorithms. The rate at which drives "return bad data" is likely only to be discovered by a well-controlled test that writes specific data to specific locations, or data patterns, then looks to see whether that data was preserved. This could tell us about data that was corrupted and escaped detection. Google's aggregate disk usage stats are a dump of the devices and their service activity with application-specific load, not the well-controlled test required for detection of the "returned bad data" symptom.
It's possible that you could scour application logs and try to attribute application failures to corrupt data from a disk, but it would be difficult to isolate.
We see a lot of SSD errors that turn out to be problems with RAID controllers. The lost drive rate dropped quite a bit when we started using OS-level RAID functionality rather than whatever the heck is in the firmware of Promise / LSI / Adaptec / etc. cards, and we were able to return a bunch of drives to service after we started using those cards in a simple pass through mode.
Raise your hand if you've lost data that was RAIDed eight ways to Sunday because of a RAID controller failure that turned out to be unrecoverable because of crappy decisions in the firmware, or rebuild times equal to the number of days it took to order, rack and configure a competitor's product.
It could have been that the paper was about errors detected by checksums at the level of Google's distributed storage engine, but that's not the case.
This means that Apple could in fact be using entirely standard SSDs while being correct in their claim that they don't need checksums. Their SSDs may sometimes bubble "uncorrectable errors" up to them, but like the article says, redundancy doesn't help you avoid those on SSDs.
I find my field of systems pretty interesting.. there's so much to do and not a lot of people rushing in to do it as the heavy hitters retire or move to entrepreneurship etc. It's probably much cheaper to harden a *BSD or Illumos system in these domains, and much easier to get the results integrated into mainline. The financial and fame are much greater in doing it on Linux though.
What do you mean by this? Do more eyes not find and point out more problems? Is the effect offset by some counteracting mechanism?
The results of "many eyes" seems to be basically moot (i.e. certainly not a 10x magnitude and more likely well below 2x) in the face of the cultural mores of a project. For example, Linux kernel treats userland ABI compatibility quite seriously and there haven't been many faults there. OTOH security and KPI stability are not taken seriously, and are in constant flux. I can dig up papers on the former if needed, but it is the most glaring failure of "many eyes"
What's particularly noteworthy from my perspective is the bandwagon effect toward one kernel is causing the thing that is supposed to be the ultimate best that keeps getting better (Linux kernel) to lose focus, investment (new talent and money), maybe even quality due to decreased competition. Systems software is starting to look a lot more like Electrical Engineering as a field than the greater software ecosystem. Mostly done by large, central organizations. That makes me sad.
It definitely happens but in my experience a lot of people think that’s too much work and assume someone else will do it even if they don’t. It’s been far more common for someone to try hoping it doesn’t happen again, disabling features blindly, etc.
When it comes to security, there may be more eyes, in which case the fallacy is to assume that the preponderance of them are well-intentioned.
Here we have empirical evidence of problems being overlooked, yet you would prefer to put faith in a comforting aphorism?
Wouldn't transient I/O error get corrected by block driver itself by repeating the request a few times? Does a filesystem driver assume any error reported by storage device is permanent?
> The 2017 tests above used an 8k file where the first block that contained file data either returned an error at the block device level or was corrupted, depending on the test.
Are there any tests on how filesystems deal with metadata corruption or I/O error? Apart from NTFS "self-healing" introduced in Windows 7/2008, do modern filesystem drivers attempt to correct broken metadata completely on the kernel side, or just give up to avoid making things worse?
I am surprised at the seeming assumption here that errors are always transient and you shouldn't worry because the retry will fix it. Sure that probably won't hurt, but it won't address the case where retries fail.
[0] https://github.com/zfsonlinux/zfs/issues/1256#issuecomment-1...
Is this the state of affairs after the improvements described at https://lwn.net/Articles/724307/ have gone in?
> All tests were run on both ubuntu 17.04, 4.10.0-37, as well as on arch, 4.12.8-2. We got the same results on both machines.
That LWN article was from June 2017 and 4.10 was released in February, so I think that means these tests predate any of the changes it discusses.
>> The annual replacement rates of hard disk drives have previously been reported to be 2-9% [19,20], which is high compared to the 4-10% of flash drives we see being replaced in a 4 year period. However, flash drives are less attractive when it comes to their error rates. More than 20% of flash drives develop uncorrectable errors in a four year period, 30-80% develop bad blocks and 2-7% of them develop bad chips. In comparison, previous work [1] on HDDs reports that only 3.5% of disks in a large population developed bad sectors in a 32 months period – a low number when taking into account that the number of sectors on a hard disk is orders of magnitudes larger than the number of either blocks or chips on a solid state drive, and that sectors are smaller than blocks, so a failure is less severe.
Unfortunately "return bad data" is slightly ambiguous. IIRC these failures that are reported are ones where the drive itself claims that it discovered invalid blocks which were detected by error detection algorithms. The rate at which drives "return bad data" is likely only to be discovered by a well-controlled test that writes specific data to specific locations, or data patterns, then looks to see whether that data was preserved. This could tell us about data that was corrupted and escaped detection. Google's aggregate disk usage stats are a dump of the devices and their service activity with application-specific load, not the well-controlled test required for detection of the "returned bad data" symptom.
It's possible that you could scour application logs and try to attribute application failures to corrupt data from a disk, but it would be difficult to isolate.
Raise your hand if you've lost data that was RAIDed eight ways to Sunday because of a RAID controller failure that turned out to be unrecoverable because of crappy decisions in the firmware, or rebuild times equal to the number of days it took to order, rack and configure a competitor's product.
It could have been that the paper was about errors detected by checksums at the level of Google's distributed storage engine, but that's not the case.
This means that Apple could in fact be using entirely standard SSDs while being correct in their claim that they don't need checksums. Their SSDs may sometimes bubble "uncorrectable errors" up to them, but like the article says, redundancy doesn't help you avoid those on SSDs.
Deleted Comment