Readit News
cesarb · 2 years ago
IMO, part of the issue is that something which used to be just a low-level optimization (don't store large sequences of zeros) became visible to userspace (SEEK_HOLE and friends). Quoting from this article:

"This is allowed; its always safe to say there’s data where there’s a hole, because reading a hole area will always find “zeroes”, which is valid data."

But I recall reading elsewhere a discussion about some userspace program which did depend on holes being present in the filesystem as actual holes (visible to SEEK_HOLE and so on) and not as runs of zeros.

Combined with the holes being restricted to specific alignments and sizes, this means that the underlying "sequence of fixed-size blocks" implementation is leaking too much over the abstract "stream of bytes" representation we're more used to. Perhaps it might be time to rethink our filesystem abstractions?
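
For readers who have not used the interface, here is a minimal sketch of how a userspace program might walk a file with SEEK_DATA and SEEK_HOLE. It assumes Linux and Python's `os.SEEK_DATA` / `os.SEEK_HOLE` constants; the helper name `data_segments` is made up for illustration. A filesystem that does not track holes is allowed to report the whole file as one data segment, so the result is a hint about layout, not a statement about contents.

```python
import errno
import os


def data_segments(path):
    """Return (start, end) byte ranges that the filesystem reports as data.

    Illustrative helper, not part of any library. Ranges not listed are
    holes and read back as zeros.
    """
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Next offset >= `offset` that the filesystem says holds data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # only a trailing hole remains
                raise
            # Next hole after that data; end of file counts as a hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            segments.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return segments


if __name__ == "__main__":
    import sys

    for start, end in data_segments(sys.argv[1]):
        print(f"data: {start}..{end}")
```

Note that the segment boundaries typically land on block or record boundaries rather than on the exact bytes a program wrote, which is the alignment leak mentioned above.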

codys · 2 years ago
> But I recall reading elsewhere a discussion about some userspace program which did depend on holes being present in the filesystem as actual holes (visible to SEEK_HOLE and so on) and not as runs of zeros.

"treatment of on-disk segments as "what was written by programs" can cause areas of 0 to not be written by bmaptool copy":

https://github.com/intel/bmap-tools/issues/75

IMO, the issue here isn't filesystem or zfs behavior, it's that bmap-tool wants an extra "don't care bit" per block, which filesystems (traditionally) don't track, and which programs interacting with filesystems don't expect to exist.

Some of the comments I've made in this issue describe options to make things better.

(FWIW: the original hn link discusses a different issue around seek hole/data, and the bmap-tool issue is backwards from the issue the parent posits: bmap-tool relies on explicit runs of zeros written not being holes, and particular behavior from programs writing data)

ajross · 2 years ago
Indeed, sparse files are simply a mistake to have included in Unix in the first place (I think we blame this on early SunOS? Not sure, though almost certain that 3BSD and v7 didn't have them). Yes, they have been used productively for various tricks, but they create a bunch of complexity that every filesystem needs to carry along with it. It's a bad trade.
retrac · 2 years ago
Sparse files make more sense if you see the file system and paging as unified. If you have allocated an array of 1 billion items, accessing the last item doesn't make the OS zero out everything from 0th to the billionth item, allocating millions of pages along the way. Virtual memory is sparse, so just one page of virtual memory is allocated. Mmap'd sparse files behave the same way.
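
A rough sketch of the mmap case, assuming a Linux filesystem that supports sparse files. The function name `touch_last_byte` and the 64 MiB size are arbitrary choices for illustration. It extends an empty file, maps it, writes only the final byte, and then compares the logical size with the space actually allocated.

```python
import mmap
import os

SIZE = 64 * 1024 * 1024  # 64 MiB logical size, chosen arbitrarily


def touch_last_byte(path, size=SIZE):
    """Create a file of `size` bytes and write only its final byte via mmap.

    Returns (logical_size, allocated_bytes). Illustrative helper only.
    """
    with open(path, "w+b") as f:
        f.truncate(size)  # extends the file without writing any data
        with mmap.mmap(f.fileno(), size) as m:
            m[size - 1] = 0x2A  # dirties a single page at the very end
            m.flush()
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        logical, allocated = touch_last_byte(os.path.join(d, "big.bin"))
        print(f"logical size: {logical} bytes, allocated: {allocated} bytes")
```

On a filesystem with sparse file support, the allocated figure should be a small fraction of the logical size; on one without it, the two will be roughly equal.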
cogman10 · 2 years ago
This is a feature I was completely unaware of. Why would you choose to use a sparse file instead of multiple files?
mgerdts · 2 years ago
When I think of a fs corruption bug, I think of something that causes fsck/scrub to have some work to do, sometimes resulting in a restore from backups. From the early reports of this, I was having a hard time understanding how it was a corruption bug. This excellent write-up clears that up:

> Incidentally, that’s why this isn’t “corruption” in the traditional sense (and why a scrub doesn’t find it): no data was lost. cp didn’t read data that was there, and it wrote some zeroes which OpenZFS safely stored.
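
To make the quoted point concrete, here is a simplified simulation of a hole-aware copy. This is not how cp or OpenZFS is implemented; the function `copy_reported_data` is hypothetical, and the list of data segments is passed in by hand so that a wrong "it's all hole" answer can be faked without needing the actual race.

```python
import os


def copy_reported_data(src, dst, segments):
    """Copy only the (start, end) ranges in `segments` from src to dst.

    Everything outside those ranges is left unwritten in dst, so it
    reads back as zeros. Illustrative helper only.
    """
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for start, end in segments:
            fin.seek(start)
            fout.seek(start)
            remaining = end - start
            while remaining > 0:
                chunk = fin.read(min(remaining, 1 << 20))
                if not chunk:
                    break
                fout.write(chunk)
                remaining -= len(chunk)
        fout.truncate(size)  # keep the logical size, even with no data copied


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        with open(src, "wb") as f:
            f.write(b"real data " * 100)
        size = os.path.getsize(src)

        # Correct report: one data segment covering the whole file.
        copy_reported_data(src, os.path.join(d, "good"), [(0, size)])
        # Simulated wrong report: the file is claimed to be entirely a hole.
        copy_reported_data(src, os.path.join(d, "bad"), [])

        with open(os.path.join(d, "bad"), "rb") as f:
            print("bad copy is all zeros:", f.read() == bytes(size))
```

In the simulated bad case the source file is untouched and the destination is a perfectly valid file of zeros, which is why a scrub has nothing to flag on either side.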

dannyw · 2 years ago
Fascinating write up. As someone with a ZFS system, how can I check if I’m affected?
moviuro · 2 years ago
It's a very rare race condition, odds are very low that you were impacted. If you were, you would have noticed (heavy builds with files being moved around where suddenly files are zero).

[0] https://bugs.gentoo.org/917224

[1] https://github.com/openzfs/zfs/issues/15526 (referenced in the article)

dist-epoch · 2 years ago
https://github.com/openzfs/zfs/issues/15526#issuecomment-181...

> zpool get all tank | grep bclone

> kc3000 bcloneused 442M

> kc3000 bclonesaved 1.42G

> kc3000 bcloneratio 4.30x

> My understanding is this: If the result is 0 for both bcloneused and bclonesaved then it's safe to say that you don't have silent corruption.

keep_reading · 2 years ago
bclones were only one way to trigger the corruption. This is not a good way to check.

It's also not worth checking for because this bug has existed for many years. Your data probably wasn't affected. None of the massive ZFS storage companies out there ran into it by now either.

Your data is fine. Sleep easy.

LanzVonL · 2 years ago
It's important to note that the recent showstopper bugs have all been in OpenZFS, with the Oracle nee Sun ZFS being unaffected by either.
nimbius · 2 years ago
Oracle laid off basically every Solaris developer in 2017. They are by all observation simply not interested in the product anymore. It's probably the most mournful thing I've seen in tech in a very long time.

OpenZFS is a mighty filesystem hobbled by an absolutely detestable license (the CDDL). Its greatest single contribution was in all likelihood to BSD, although it didn't seem to make the OS more popular as a whole.

The latest and greatest from the OpenZFS crowd seems to be bullying Torvalds semi-annually into considering OpenZFS in Linux...which will never happen thanks to CDDL and so the forums devolve into armchair legal discussions of the true implications of CDDL. You'll see a stable BTRFS and a continued effort to polish XFS/LVM/MDRAID before openZFS ever makes a dent.

One could argue OpenZFS is a radioactive byproduct of one of the most lethal forces in open source in the past 20-some years: Oracle. They gobbled up OpenOffice and MySQL, and went clawing after RedHat just shortly after mindlessly sending Sun to the gallows. They're an unmitigated carbuncle on some of the largest corporations in the entire world, surviving solely on perpetual licensing and real-world threat of litigation. That they have a physical product at all in 2023 is a pretty amazing testament to the shambling money-corpse empire of Ellison.

Ultimately the FOSS community under Torvalds is on the right track. Just because Shuttleworth thinks he can't be sued by Oracle for including ZFS in Ubuntu with some hastily reasoned shim doesn't mean Oracle won't nonchalantly send his entire company to the graveyard just for trying. Oracle is a balrog. Stay as far away as you can.

p_l · 2 years ago
Oracle isn't the copyright holder for OpenZFS. That's one thing the OpenSolaris and OpenZFS projects managed to ensure. What Oracle could do was close OpenSolaris again under a proprietary license, something that Bryan Cantrill IIRC blamed on the use of copyright assignment, citing it as a specific example of why open source projects should never use it.

OpenZFS devs have openly declared that no, they are not pushing to include OpenZFS in the Linux kernel, and that the separate arrangement is just fine, especially since it allows a different release cadence and keeps the code portable.

Mainly there's an issue with certain Linux Kernel big name(s) that like to use GPL-only exports (something that has uncertain legal status) in a rather blunt way, and sometimes the reasoning is iffy.

paldepind2 · 2 years ago
> Just because Shuttleworth thinks he can't be sued by Oracle for including ZFS in Ubuntu with some hastily reasoned shim doesn't mean Oracle won't nonchalantly send his entire company to the graveyard just for trying.

Canonical has been shipping the kernel with ZFS for more than 7 years and so far they have not been sued by Oracle.

mardifoufs · 2 years ago
How is the CDDL any more detestable than the GPL family of licenses? Not saying that they are detestable in any way, but the CDDL is also a free software license so I don't get how it's worse or bad
avianlyric · 2 years ago
> You'll see a stable BTRFS and a continued effort to polish XFS/LVM/MDRAID before openZFS ever makes a dent.

Right now I would put my money on bcachefs[1] rather than BTRFS. bcachefs is currently in the process of being merged into the kernel and will be in the next kernel release. Doesn't currently quite offer everything ZFS does, but it's very close and already appears more reliable than BTRFS, and once stuff like Erasure Coding is stable, it'll be more flexible than ZFS.

[1] https://bcachefs.org

MCUmaster · 2 years ago
Oracle can’t do a thing to an Isle of Man corporation.
rincebrain · 2 years ago
Who on earth is trying to bully Linus into anything? Where have you seen that?
rustcleaner · 2 years ago
This is bull**! It's time for a new license to throw off the old: The Uniform Pirating License! It's a license which you stick on anything which you need a license, and it conveys all rights and zero obligations to you. Possession of the code is sufficient to run, change, and propagate code. Legal system be damned; we have cryptography and Tor, The State's law here is irrelevant (also when did you give The State license to bully you around anyway?)!

My fix: spin up a .onion to host my distribution of the Linux kernel containing ZFS integrated and BtrFS excised, do not answer abuse/legal emails, don't even have email to receive aforementioned emails. What's the pencil-necked shrimpy IP lawyer at Kernel Foundation going to do? Shut down Tor?

frankjr · 2 years ago
I wonder if any large storage provider has been affected by this. I know Hetzner Storage Box and rsync.net both use ZFS under the hood.
mappu · 2 years ago
Wasabi Cloud Storage have a Sponsored-By tag on the git commit fixing the issue, so I assume they're highly involved somehow.
joshxyz · 2 years ago
Does anyone know what diagram tool he used? Thanks.
egberts1 · 2 years ago
Plantuml, doable in.
guiambros · 2 years ago
Any idea which diagram in PlantUML more specifically? I looked at a handful of the PlantUML categories (each one with dozens of examples) and haven't seen anything like the diagrams in OP's post.
commandersaki · 2 years ago
Excellent writeup robn!
lupusreal · 2 years ago
Is anybody using bcachefs yet?
frankjr · 2 years ago
I'm keeping an eye on it but it's not there yet e.g. https://github.com/koverstreet/bcachefs/issues/619#issuecomm...
ktm5j · 2 years ago
Well, to be fair they tried and failed to reproduce the corruption that was reported. While I agree that I'm not ready to dive into bcachefs, I'm not exactly swayed by this bug report.