The recent FreeBSD 14 release apparently failed to build for one of the platforms because a file "somehow ended up being full of NUL bytes" [0]. I wonder if that's due to this bug? (Could be just a coincidence, of course.)
OpenZFS 2.1.4 was included in FreeBSD 13.1, according to the release notes.
> find potentially corrupted files (with some risk of false positives) by searching for zero-byte blocks
I believe it's not the best method, as it requires carefully defining how many null-byte blocks (and of what size) count as a sign of corruption.
Another reason is that the frequency of corruption was shown to vary during tests and to depend on the I/O load: the observed zero-byte blocks may have been caused by the artificial scenario used to intentionally reproduce the bug. Natural occurrences of this bug may have a different pattern.
You should use ctime+checksums from old backups instead.
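For example, something along these lines (just a sketch; the dataset path, the backup checksum list, and the cutoff date are placeholders for whatever your setup actually uses):
# Files that have NOT been touched since the last known-good backup should
# still match that backup's checksums, so any FAILED line below is suspect.
# /tank/data and /backup/2023-10-01.sha256 are hypothetical; the checksum
# list was generated earlier by running `sha256sum` over the backup copy.
cd /tank/data || exit 1
sha256sum -c /backup/2023-10-01.sha256 2>/dev/null | grep ': FAILED'
# Then cross-check the ctime of each FAILED file (e.g. `stat -c %z FILE`)
# against the backup date: a file whose ctime predates the backup should
# never fail this check.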
It's a very bad bug, so it is important to note there's an apparently-effective mitigation[1], which doesn't make it impossible to hit, but reduces the risk of it in the real world by a lot.
# as root, or equivalent:
echo 0 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync
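You can read the parameter back to confirm the change took effect:
# should print 0 after the echo above:
cat /sys/module/zfs/parameters/zfs_dmu_offset_next_sync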
Before making this change, I was able to easily reproduce this bug on all of my ZFS filesystems (using the reproducer.sh[2] script from the bug thread). After making this change, I could no longer reproduce the bug at all.
It seems that the bug is relatively rare (although still very bad if it happens) in most real-world scenarios; one user doing a heuristic scan to look for corrupted files found 0.00027% of their files (7 out of ~2,500,000) were likely affected[3].
The mitigation above (disabling zfs_dmu_offset_next_sync) seems to reduce those odds of the bug happening significantly. Almost everybody reports they can no longer reproduce the bug after changing it.
Note that changing the setting doesn't persist across reboots, so you have to automate that somehow (e.g. editing /etc/modprobe.d/zfs.conf or whatever the normal way to control the setting is on your system; the GitHub thread has info for how to do it on Mac and Windows).
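On Linux the modprobe.d route looks something like this (a sketch; any *.conf file under /etc/modprobe.d/ will do):
# persist the mitigation across reboots (Linux):
echo "options zfs zfs_dmu_offset_next_sync=0" >> /etc/modprobe.d/zfs.conf
# On FreeBSD the knob is exposed as a sysctl instead; I believe it's
# vfs.zfs.dmu_offset_next_sync (check `sysctl -a | grep dmu_offset` first),
# set in /etc/sysctl.conf to make it persistent.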
What makes this a spectacularly bad bug is that it not only corrupts data, but it may be impossible to know whether your existing files were hit by it at some point during its long existence. There is active discussion in the bug thread about what kind of heuristics can be used to find "this file was probably corrupted by the bug", but there's no way (so far, and probably ever) to tell for sure (short of having some known-good copy of the file elsewhere to compare it to).
Which makes the above mitigation all the more important while we wait for the fix!
(>_<) Oh man, I knew about [0] when I posted (which is why I said it just reduces the chance of hitting the bug (by a lot)). But after spending all Saturday JST on it, I went to bed before [1] was posted.
Skimming through #6958, though, it seems like it's the lesser of two evils compared to #15526... I think? It's less obvious (to me) what the impact of #6958 is. Is it silent, undetectable corruption of your precious data, potentially over years, or is it more likely to cause a crash or runtime error?
Reports like https://github.com/intel/bmap-tools/issues/65 make it seem more like the latter.
I have to read more. But since the zfs_dmu_offset_next_sync setting was disabled by default until recently, I still suspect (but yeah, don't know for sure) that disabling it is the safest thing we can currently do on unmodified ZFS systems.
> It seems that the bug is relatively rare (although still very bad if it happens) in most real-world scenarios; one user doing a heuristic scan to look for corrupted files found 0.00027% of their files (7 out of ~2,500,000) were likely affected.
I'm running the following script to detect corruption.[0] The two files I've found so far seem like false positives.
Yeah, I am running a similar script I got from the GitHub bug thread; so far I have not found any suspected-corrupt files at all, except for in the files generated by the reproducer.sh script, e.g.:
Possible corruption in /mnt/slow/reproducer_146495_720
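For reference, the core of these heuristic scanners boils down to something like this (a sketch, not the actual script from the thread; the 4 KiB block size and /mnt/slow are placeholders, it needs bash and GNU cmp, and sparse or zero-padded files will show up as false positives):
# flag files whose first 4 KiB is entirely NUL bytes; it only looks at the
# start of each file, so it's a rough heuristic at best
find /mnt/slow -type f -size +4k -print0 |
while IFS= read -r -d '' f; do
    cmp -s -n 4096 "$f" /dev/zero && echo "Possible corruption in $f"
done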
UPDATE: There is now a very good, simple explanation of the bug, and of how it became much more likely to happen recently even though it has existed for a very long time:
https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
Thanks, I had seen the title about the cloning issue, and I even commented, but I thought it didn't apply to me, since I never deploy new features until at least a few months have passed and other people have confirmed they work well.
I was only worried about the zfs send | zfs receive bug corrupting both pools.
I had been too lazy to check the details, so I had missed that the bug can be triggered even when you DON'T use block cloning at all but just copy files, and even on old versions like 2.1.4: newer versions only increase the probability of hitting it.
I only caught this issue through reddit.
Now, thanks to your link and after going through all the GitHub comments, I'm rereading all the comments from the HN thread from 2 days ago to decide how to analyze several months' worth of backups, some of them not on ZFS, but all of them sourced from ZFS and therefore now suspect for silent corruption.
Hopefully this warning will stay in the toplist long enough for affected users to deploy the simple countermeasures, or at least preserve their backups until we know how to identify the affected files reliably enough (through the number of contiguous zeroes, repeating patterns inside the file, etc.).
Yeah, right now it's actually not known whether the roots of this bug date back even to the Sun days. And because Oracle ZFS is not open source, we can't know whether this or other bugs are lurking there too.
It seems to require extremely specific situations to trigger: for example, using copy_file_range (the new feature that exposed it), or writing a file and then punching a larger-than-recordsize hole in that file within the same transaction group.
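To make that second trigger concrete, the operations involved look roughly like this (a sketch only, not a reliable reproducer; /tank/test is a placeholder, and whether anything actually goes wrong depends on a reader racing the not-yet-synced transaction group):
# write a file, then punch a hole larger than the recordsize (128K by
# default) into it before the transaction group syncs (every ~5 seconds)
dd if=/dev/urandom of=/tank/test/holefile bs=1M count=4 status=none
fallocate --punch-hole --offset 0 --length 2M /tank/test/holefile
# the bug, as I understand it, is lseek(SEEK_DATA/SEEK_HOLE) on such a file
# reporting a stale hole/data map, which is exactly what tools like cp rely on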
[0] https://www.daemonology.net/blog/2023-11-21-late-breaking-Fr...
It's definitely possible. Once we get this bug patched I'll definitely be updating the builders.
looks at the FreeBSD 14.0 errata on freebsd.org
This is precisely why I wouldn't run FreeBSD in production. Look at this shit.
- You need to freebsd-update fetch install before you upgrade
- EC2 AMIs can't handle binary user-data
- FreeBSD Update reports 14.0-RELEASE approaching its EoL
Two ways to break booting and one really stupid mistake, and NONE of it is listed on freebsd.org. Sigh.
Not sure what #2 is about; I don't use the cloud.
And the EoL thing did not happen to me. I did hear of it happening, though; it's not universal.
The bug increased in probability a bit after 2.1.4 (when zfs_dmu_offset_next_sync=1 became the default), and even more after 2.2.0 (with block cloning).
Quick workaround: echo 0 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync
https://github.com/openzfs/zfs/pull/15571
Pretty crazy the bug might date back 10 years.
People are using `reproducer.sh` (https://gist.github.com/tonyhutter/d69f305508ae3b7ff6e9263b2...) to see if they can reproduce the bug intentionally: [1]
https://github.com/0x0177b11f/zfs-issue-15526-check-file tries to find potentially corrupted files (with some risk of false positives) by searching for zero-byte blocks: [2]
[0]: https://github.com/openzfs/zfs/issues/15526#issuecomment-181...
[1]: https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
[2]: https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
Would a zfs snapshot be sufficient?
[1]: the Reddit thread advocating for this is here: https://www.reddit.com/r/zfs/comments/1826lgs/psa_its_not_bl...
[2]: https://gist.github.com/masonmark/03c7cb08e22968b1f2feb1a6ac...
[3]: https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
[0] https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
[1] https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
[2] https://github.com/openzfs/zfs/issues/6958
[0]: https://gist.github.com/kimono-koans/2696a8c8eac0a6babf7b2d9...