As a long-time sponsor of Kent's on Patreon ($10 a month since 2018, $790 total), I've found this bcachefs debacle really depressing.
I'm not even a bcachefs user, but I use ZFS extensively and I _really_ wanted Linux to get a native, modern COW filesystem that was unencumbered by the crappy corporate baggage that ZFS has.
In the comments on HN around any bcachefs news (including this one) there are always a couple of throwaway accounts bleating the same arguments - sounding like the victim - that Kent frequently uses.
To Kent, if you're reading this:
From a long-time (and now former) sponsor: if these posts are actually from you, please stop.
Also, it's time for introspection, and to think about how you could have handled this situation better, to avoid disappointing those who have sponsored you financially for years. Yes, there are some difficult and flawed people maintaining the kernel, not least Linus himself, but you knew that when you started.
I hope bcachefs will have a bright future, but the ball is very clearly in your court. This is your problem to fix.
(I'm Daniel Wilson, subscription started 9th August 2018, last payment 1st Feb 2025)
I am also a Patreon supporter, and I intentionally didn't switch to bcachefs until it was merged into the kernel. After all, Linus would never break userspace right?
I am also frustrated by this whole debacle, but I'm not going to stop funding him; Bcachefs is a solid alternative to btrfs. It's not at all clear to me what really happened to make all the drama. A PR was made that contained something that was more feature-like than bugfix-like, and that resulted in a whole module being ejected from the kernel?
I really wish, though, that DKMS were not such a terrible solution. It _will_ break my boot, because it always breaks my boot. The Linux kernel really needs a stable module API so that out-of-tree modules like bcachefs aren't impossible to boot with reliably.
I also waited until bcachefs was in the mainline. And I have been loving it. In fact, I even have multiple systems using it as root. Rock solid.
DKMS is not going to work for me, though. Some of the distros I use don't even support it; Chimera Linux uses ckms.
As for how we got here, it was not just one event. Kent repeatedly ignored the merge window, submitting changes too late. This irked Linus. When Linus complained, Kent attacked him. And Kent constantly ran down the LKML and kernel devs by name (especially the btrfs guys). This burned a lot of bridges. When Linus pushed Kent out, many people rushed to make it permanent instead of rushing to his defense. Kent lost my support in the final weeks by constantly shouting that he was a champion for his users while doing the exact opposite of what I wanted him to do. I want bcachefs in the kernel. Kent worked very hard to get it pushed out.
>It's not at all clear to me what really happened to make all the drama. A PR was made that contained something that was more feature-like than bugfix-like, and that resulted in a whole module being ejected from the kernel?
This isn't just a one-time thing. Speaking as someone who follows the kernel, this has apparently been going on pretty much since bcachefs first tried to get into Linus's tree. Kent even once told another kernel maintainer to "get your head examined" and was rewarded with a temporary ban.
Edit: To be fair, the kernel is infamous for being guarded by stubborn maintainers, but I guess the lesson to be learned here is that if you want your pet project to stick around in the kernel, you can't afford to be stubborn yourself.
> After all, Linus would never break userspace right?
It was marked experimental. No promises were made.
> It's not at all clear to me what really happened to make all the drama. A PR was made that contained something that was more feature-like than bugfix-like, and that resulted in a whole module being ejected from the kernel?
If you followed the whole saga, Kent had a history of being technically excellent but absolutely impossible to work with.
He did not follow the kernel development process, over and over again, instead choosing to post long rants and personal attacks to the mailing lists. Instead of accepting that he has to adapt, he always had (and still has) the attitude that he knows best, everybody else is an idiot, and they should just accept his better judgement. Of course the rules should apply… to everyone but him, because his work is special.
Moreover, this was not a one-PR thing. Linus actually had a long history of defending Kent: being patient, bending the rules, and generally putting up with him.
This is unambiguously 100% Kent’s fault, which is very sad, because — as mentioned before — Kent’s work is technically excellent. He had a lot of good will that he squandered in spectacular fashion.
> A PR was made that contained something that was more feature-like than bugfix-like, and that resulted in a whole module being ejected from the kernel?
From what I can tell, that PR was just the trigger for a clash of personalities between Linus and Kent, both of whom have a bit of a temper; neither would back down, and it escalated to this.
Seems to tick all of the boxes in regard to what you're looking for, and it's mature enough that major Linux distros are shipping it as the default filesystem.
Because every time btrfs is mentioned, five more people come out of the woodwork saying that it irreparably lost all their data. Sorry, but there are just too many stories for it to be mere coincidence.
Your statement is misleading. No one is using btrfs on servers. Debian and Ubuntu use ext4 by default. RHEL removed support for btrfs long ago, and it's not coming back:
> Red Hat will not be moving Btrfs to a fully supported feature. It was fully removed in Red Hat Enterprise Linux 8.
Btrfs has a "happy path": so long as you don't use any features outside of it, your data will generally be fine. Outside of that, your data is less reliably fine.
Btrfs also has issues with large numbers of snapshots: you have to cull them occasionally or things begin to slow down. bcachefs does not have this problem.
> In the comments on HN around any bcachefs news (including this one) there are always a couple throwaway accounts bleating the same arguments - sounding like the victim - that Kent frequently uses.
And every time something like this comes up, I end up with every sort of accusation pointed at me, and no one seems to be willing to look at the wider picture - why is the kernel community still unable to figure out a plan to get a trustworthy modern filesystem?
> This is your problem to fix.
No, I've said from the start that this needs to be a community effort. A filesystem is too big for one person.
Be realistic :) If the community wants this to happen, the community will have to step up.
Look, if you're here saying "the throwaway comments aren't me", then I believe you. It'd be nice if you said that clearly, though.
Please, please don't forget that I want you to succeed - that's why I bunged nearly $800 your way in this endeavour - but I'm not the only person who thinks you come across as completely immune to criticism, even when it's constructive and from your supporters.
>> This is your problem to fix.
> No, I've said from the start that this needs to be a community effort. A filesystem is too big for one person.
Right now that "community effort" is looking a bit unlikely, eh?
I would hate to have to deal with these people as my primary occupation and I totally get why you don't want to continue.
That said, nobody else has the power, skill or inclination to make bcachefs that wonderful filesystem of the future for Linux - only you. That's what I meant by "this is your problem to fix".
I wish you the best of luck with the new DKMS direction. And I'll get on board and actually try it out soon :D
I've been using btrfs for maybe 10 years now on a single Linux home NAS. I use it in a raid1c3 config (I used to do c2); raid1cN is mirroring with N copies. I have compression on. I use snapshots rarely.
I've had a few issues, but no data loss:
* Early versions of btrfs had an issue where you'd run out of metadata space (if I recall). You had to rebalance, and sometimes add some temporary space to do that.
* One of my filesystems wasn't optimally aligned because btrfs didn't do that automatically (or something like that -- this was a long time ago.) A very very minor issue.
* Corruption (but no data loss, so I'm not sure it's corruption per se...) during a device replacement.
This last one caused no data loss, but a lot of error messages. I started a logical device removal, removed the device physically, rebooted, and then accidentally re-added the physical device while it was still being removed logically. It was not happy. I physically removed the device again, finished the logical removal, and ran a scrub and the fsck equivalent. No errors.
I think that's a testament to its resiliency, but also a testament to how you can shoot yourself in the foot.
I've never used RAID5/6 on btrfs and don't plan to -- partly because of the scary words around it, but I also assume the rebuild time is longer.
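For anyone curious what that setup and the safe version of the device replacement look like, here's a hedged sketch (device names and the mountpoint are made up, not from the story above):

```sh
# Three-copy mirroring for data and metadata, with compression at mount time.
mkfs.btrfs -m raid1c3 -d raid1c3 /dev/sda /dev/sdb /dev/sdc /dev/sdd
mount -o compress=zstd /dev/sda /mnt/nas

# Safe replacement order: finish the *logical* removal before pulling the disk.
btrfs device remove /dev/sdd /mnt/nas   # blocks until data is migrated off
# ...only now unplug the drive physically...
btrfs scrub start -B /mnt/nas           # then verify checksums
```

The foot-gun in the story above was inverting the first two steps: the drive was pulled (and later re-added) while the logical removal was still in flight.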
btrfs is good, but it's far from perfect. RAID 5 and 6 don't exactly work, it can have problems at high snapshot counts, and there are plenty of recent reports of corruption and other kinds of filesystem damage.
It feels more user friendly than ZFS, but ZFS is much more feature complete. I used to use btrfs for all my personal stuff, but honestly ext4 is just easier.
The one line "article" on lwn.net has a link to this email:
From: Kent Overstreet @ 2025-09-11 23:19 UTC
As many of you are no doubt aware, bcachefs is switching to shipping as
a DKMS module. Once the DKMS packages are in place very little should
change for end users, but we've got some work to do on the distribution
side of things to make sure things go smoothly.
Good news: ...
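For those who haven't packaged a DKMS module before, the distribution-side work mostly amounts to shipping the sources plus a `dkms.conf` build recipe. A hypothetical sketch of what one could look like (the version string, make invocation, and install path are illustrative; the real bcachefs package will define its own):

```sh
# dkms.conf - illustrative layout, not the official bcachefs recipe
PACKAGE_NAME="bcachefs"
PACKAGE_VERSION="1.0"
BUILT_MODULE_NAME[0]="bcachefs"
DEST_MODULE_LOCATION[0]="/kernel/fs/bcachefs"
MAKE[0]="make KERNELRELEASE=${kernelver}"
CLEAN="make clean"
AUTOINSTALL="yes"   # rebuild automatically whenever a new kernel is installed
```

With `AUTOINSTALL="yes"`, dkms rebuilds the module on every kernel upgrade, which is what "very little should change for end users" depends on in practice.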
> Once the DKMS packages are in place very little should change for end users
Doesn't that mean I now have to enroll the MOK key on all my work workstations that use secure boot? If so, that's a huge PITA on over 200 machines. As with the NVIDIA driver, you can't automate that step.
Don't you only have to do that once per machine? After that the kernel should use the key you installed for every module that needs it. It is a pain in the ass for sure, but if you make it part of the deployment process it's manageable.
For sure it's a headache when you install some module on a whole bunch of headless boxes at once and then discover you need to roll a crash cart over to each and every one to get them booting again, but the secure boot guys would have it no other way.
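For what it's worth, most of the per-machine part can be scripted; only the confirmation prompt at the next boot is interactive. A hedged sketch of the flow (key subject, file names, and the dkms framework.conf hint are illustrative, not taken from any official bcachefs docs):

```shell
# Generate a signing key/certificate pair once, centrally.
openssl req -new -x509 -newkey rsa:2048 -nodes -days 36500 \
    -subj "/CN=Example DKMS signing key/" \
    -keyout MOK.priv -outform DER -out MOK.der

# Per machine (root required). This only *queues* the key; the one-time
# interactive confirmation at the next boot is still unavoidable:
#   mokutil --import MOK.der
#
# Recent dkms releases can then sign built modules automatically if the key
# is configured in /etc/dkms/framework.conf (mok_signing_key/mok_certificate).
```

So it's one reboot with a crash cart (or IPMI console) per machine, after which any module signed with that key loads without further ceremony.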
The end result is still positive. Before the mainline submission, Bcachefs could not be DKMSed, as it relied on changes in other subsystems, as opposed to just additions, so you had to compile your own kernel. Now, it is available as something that can be compiled as a module for any recent-enough third-party kernel.
But presumably if said changes preventing DKMS usage were reasonable they would have been merged anyway independent of bcachefs, and likely with less drama and disruption? I'm not suggesting that there aren't some silver linings to the cloud, but it doesn't seem like the result is anywhere near neutral (let alone positive) for anyone involved.
Yes and no - the kernel interfaces only reflect what the kernel itself needs. It doesn't to my knowledge maintain interfaces for the purpose of enabling out-of-tree modules.
Changes would therefore need to be an improvement for in-tree drivers, and not merely something for an out-of-tree driver.
...for now. The policy of Linux is that they don't care about external modules/drivers at all, so once they start removing whatever bcachefs needs because no in-tree filesystem uses it we'll be back to a world of pain. (Unless they make an exception; they sure don't make one for ZFS.)
At least some of the OSS drama still is just purely code-based these days...
The dev acted out of line for kernel development, even if _kind_ of understandable (like with the recovery tool), but still in a way that would set a bad precedent for the kernel, so this appears to be good judgement from Linus.
I was one week away from setting up a new cluster and was all in on bcachefs, drama be damned… that was until this[1]
Bcachefs is exciting on paper, but even just playing around there are some things that are just untenable, imho. Time has proven that the stability of a project stems from the stability of the team and culture behind it. The numbers don't lie, and unless it can reach parity with existing filesystems I can't be bothered to forgive the missteps.
I’m looking forward to the day when bcachefs matures… if ever, as it is exciting.
Also if something has changed in the last year I’d love to hear about it! I just haven’t found anything compelling enough yet to risk my time bsing around with it atm.
Yep. All people asked him to do was slow down a bit, because they felt it was too much change at once. He refused to slow down for any reason other than his own. He said he only saw three reasons to slow down and none of them applied, so Linus should just accept his patch now.
I never understand why some people are unwilling to make any attempt at getting along. Some people seem to feel any level of compromise is too much.
He justified breaking the guidelines by the need to address critical issues. One can hope these kinds of problems would not happen that frequently in a stable project; besides, it is still experimental.
> I hope it eventually comes back once it is more stable.
Yes, me too.
> Would be great to have an in kernel alternative to ZFS
Yes it would.
> for parity RAID.
No.
Think of the Pareto Principle here. 80% of the people only use 20% of the functionality. BUT they don't all use the same 20% so overall you need 80% of the functionality... or more.
ZFS is one of the rivals here.
But Btrfs is another. Stratis is another. HAMMER2 is another. MDRAID is another. LVM is another.
All provide some or all of that 20%, and all have pros and cons.
The point is that, yes, ZFS is good at RAID and it's much much easier than ext4 on MDRAID or something.
Btrfs can do that too.
But ZFS and Btrfs do COW snapshots. Those are important too. OpenSUSE, Garuda Linux, siduction and others depend on Btrfs COW.
OK, fine, no problem, your use case is RAID. I use that too. Good.
But COW is just as important.
Integrity is just as important and Btrfs fails at that. That is why the Bcachefs slogan is "the COW filesystem that won't eat your data."
Btrfs ate my data 2-3 times a year for 4 years.
Doesn't matter how many people praise it; what matters are the victims who have been burned when it fails. They prove that it does fail.
The point is not "I can do that with ext4 on mdraid" or "I can do that with LVM2" or "Btrfs is fine for me".
The point is something that can do _all of these_ and do it _better_ -- and here, "better" includes "in a simpler way".
Simpler here meaning "simpler to set up" and also "simpler in implementation" (compared to, say, Btrfs on LVM2, or Btrfs on mdraid, or LVM on mdraid, or ext4 on LVM on RAID).
Something that can remove entire layers of the stack and leave the same functionality is valuable.
Something that can remove 90% of the setup steps and leave identical functionality matters... because different people do those steps in different orders, or skip some, and you need to document that, and none of us document stuff enough.
The recovery steps for LVM on RAID are totally different from RAID on LVM. The recovery for Btrfs on mdraid is totally different from just Btrfs RAID.
This is why tools that eliminate this matter. Because when it matters whether you have
1 - 2 - 3 - 4 - 5
or
1 - 2 - 4 - 3 - 5
Then the sword that chops the Gordian knot here is one tool that does 1-5 in a single step.
This remains true even if you only use 1 and 5, or 2 and 3, and it still matters if you only do 4.
As far as I know, ZFS is either for smart people who want to do something sophisticated or trendy people who want to do something unwise.
> ext4 on MDRAID or something
These are trivially easy to set up, expand, or replace drives in; they require no upkeep, and no setup when moved into entirely different systems. Anybody using ZFS or something ZFS-like for a trivial standard RAID setup (unless they are used to and comfortable with ZFS, which is an entirely different story) is just begging to lose data. mdadm is fine.
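To make the "trivially easy" claim concrete, the whole mdadm lifecycle fits in a handful of commands (device names here are made up):

```sh
# Create a two-disk mirror and put ext4 on it.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext4 /dev/md0

# Replace a failed drive; resync starts automatically on --add.
mdadm --manage /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm --manage /dev/md0 --add /dev/sdc1

# Moved the disks to a different machine? Arrays reassemble from their
# superblocks, or explicitly with:
mdadm --assemble --scan
```

That last point is the "no setup when placed into entirely different systems" part: the array metadata lives on the member disks themselves.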
I have a multi-device filesystem, comprised of old HDDs and one sketchy PCI-SATA expansion card. This FS was assembled in 2019 and, though it went through periods of being non-writable, is still working, and I haven't lost any[1] data. That's more than 5 years, a multitude of FS version upgrades, and multiple device replacements with the corresponding data evacuation and re-replication.
[1] Technically, I did lose some, when a dying device started misbehaving and writing garbage, and I was impatient and ran a destructive fsck (with fix_errors) before waiting for a bug patch.
Don't want to compare it to other solutions but this is impressive even on its own merits.
It's sad that it came with this, but in the end Linus and Kent had different ideas on how distribution of fixes should work so it makes sense that we have reached a situation where Kent controls the distribution frequency of the file system.
I have high hopes for bcachefs, but so far the benchmarks[0] are quite disappointing. I understand it'll have overhead since it does many things, but I'd expect it to perform closer to btrfs or ZFS; instead it's consistently abysmal (a fate that affects ZFS at times, too).
It's hard to take those benchmarks too seriously. ZFS, btrfs and I guess bcachefs - which I've never used and don't have any opinion on - do things XFS and EXT4 don't and can't do.
I know more about ZFS than the others. It wasn't specified here whether ZFS had ashift=9 or 12; it tries to auto-detect, but that can go wrong. ashift=9 means ZFS is doing physical I/O in 512 bytes, which will be an emulation mode for the nvme. Maybe it was ashift=12. But you can't tell.
Secondly, ZFS defaults to a record size of 128k. Write a big file and it's written in "chunks" of 128k size. If you then run a random read/write I/O benchmark on it with a 4k block size, ZFS is going to be reading and writing 128k for every 4k of I/O. That's a huge amplification factor. If you're using ZFS for a load which resembles random block I/O, you'll want to tune the recordsize to the app I/O. And ZFS makes this easy, since child filesystem creation is trivially cheap and the recordsize can be tuned per filesystem.
And then there's the stuff that ZFS does and XFS/ext4 don't: for example, taking snapshots every 5 minutes (they're basically free), streaming incremental snapshot backups, snapshot cloning and so on - without even getting into RAID flexibility.
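The ashift and recordsize tuning described above is a couple of one-liners (pool and dataset names here are invented for illustration):

```sh
# ashift is fixed at vdev creation time and cannot be changed afterwards;
# ashift=12 means 4k physical I/O, the usual choice for NVMe.
zpool create -o ashift=12 tank mirror /dev/nvme0n1 /dev/nvme1n1

# Per-dataset recordsize for an app doing 8k random I/O (e.g. PostgreSQL).
# Applies to newly written blocks; existing data keeps its old size.
zfs create -o recordsize=8k tank/pgdata
zfs get recordsize tank/pgdata
```

Since datasets are cheap, the common pattern is one dataset per workload, each with its own recordsize, rather than one tuning for the whole pool.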
I don't think any of that means the benchmarks shouldn't be taken seriously. GP didn't say they expect Bcachefs to perform like EXT4/XFS, they said they expected more like Btrfs or ZFS, to which it has more similar features.
On the configuration stuff, these benchmarks intentionally only ever use the default configuration – they're not interested in the limits of what's possible with the filesystems, just what they do "out of the box", since that's what the overwhelming majority of users will experience.
> If you're using ZFS for a load which resembles random block I/O, you'll want to tune the recordsize to the app I/O.
You probably don't want to do that because that'll result in massive metadata overhead, and nothing tells you that the app's I/O operations will be nicely aligned, so this cannot be given as general advice.
> I very much doubt that's the main issue - the multithreaded sqlite performance makes me wonder if something's up with our fsync performance. I'll wait to see the results with the DKMS version, and if the numbers are still bad I'll have to see if I can replicate it and start digging.
It really is a great file system though.
It was explicitly marked experimental.
But bcachefs never lived in userspace even before it was merged
Doesn't btrfs fit that description? I know there are some problems with it, but it is definitely a native COW filesystem, abd AFAIK it is "modern".
Btrfs not good?
(Honest question.)
Is this filesystem stable enough for deploying on 200 production machines?
From a cursory look I get things like this:
https://hackaday.com/2025/06/10/the-ongoing-bcachefs-filesys...
Hope the best for Bcachefs's future
[1] https://youtube.com/watch?v=_RKSaY4glSc&pp=ygUZTGludXMgZmlsZ...
Would be great to have an in kernel alternative to ZFS for parity RAID.
This is not the first project for which this was an issue, and said maintainer has shown no will to alter their behaviour before or since.
IIRC the whole drama began because Kent was constantly pushing new features along with critical bug fixes after the proper merge window.
I meant stable in the sense where most changes are bug fixes, reducing the friction of working within the kernel schedules.
[0] https://www.phoronix.com/review/linux-617-filesystems has the benchmarks of the DKMS module
https://www.phoronix.com/forums/forum/software/general-linux...