The recent FreeBSD 14 release apparently failed to build for one of the platforms because a file "somehow ended up being full of NUL bytes" [0]. I wonder if that's due to this bug? (Could be just a coincidence, of course.)
OpenZFS 2.1.4 was included in FreeBSD 13.1, according to the release notes.
> find potentially corrupted files (with some risk of false positives) by searching for zero-byte blocks
I believe it's not the best method, as it requires carefully defining how many null-byte blocks (and of what size) count as a sign of corruption.
Another reason is that the frequency of corruption was shown to vary during tests and to depend on the I/O load: the observed zero-byte blocks may have been caused by the artificial scenario used to intentionally reproduce the bug. Natural occurrences of this bug may have a different pattern.
You should use ctime+checksums from old backups instead.
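For example, something along these lines (just a sketch; the dataset path, the backup checksum list, and the cutoff date are placeholders for whatever your setup actually uses):
# Files that have NOT been touched since the last known-good backup should
# still match that backup's checksums, so any FAILED line below is suspect.
# /tank/data and /backup/2023-10-01.sha256 are hypothetical; the checksum
# list was generated earlier by running `sha256sum` over the backup copy.
cd /tank/data || exit 1
sha256sum -c /backup/2023-10-01.sha256 2>/dev/null | grep ': FAILED'
# Then cross-check the ctime of each FAILED file (e.g. `stat -c %z FILE`)
# against the backup date: a file whose ctime predates the backup should
# never fail this check.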
It's a very bad bug, so it is important to note there's an apparently-effective mitigation[1], which doesn't make it impossible to hit, but reduces the risk of it in the real world by a lot.
# as root, or equivalent:
echo 0 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync
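You can read the parameter back to confirm the change took effect:
# should print 0 after the echo above:
cat /sys/module/zfs/parameters/zfs_dmu_offset_next_sync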
Before making this change, I was able to easily reproduce this bug on all of my ZFS filesystems (using the reproducer.sh[2] script from the bug thread). After making this change, I could no longer reproduce the bug at all.
It seems that the bug is relatively rare (although still very bad if it happens) in most real-world scenarios; one user doing a heuristic scan to look for corrupted files found 0.00027% of their files (7 out of ~2,500,000) were likely affected[3].
The mitigation above (disabling zfs_dmu_offset_next_sync) seems to reduce those odds of the bug happening significantly. Almost everybody reports they can no longer reproduce the bug after changing it.
Note that changing the setting doesn't persist across reboots, so you have to automate that somehow (e.g. editing /etc/modprobe.d/zfs.conf or whatever the normal way to control the setting is on your system; the GitHub thread has info for how to do it on Mac and Windows).
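On Linux the modprobe.d route looks something like this (a sketch; any *.conf file under /etc/modprobe.d/ will do):
# persist the mitigation across reboots (Linux):
echo "options zfs zfs_dmu_offset_next_sync=0" >> /etc/modprobe.d/zfs.conf
# On FreeBSD the knob is exposed as a sysctl instead; I believe it's
# vfs.zfs.dmu_offset_next_sync (check `sysctl -a | grep dmu_offset` first),
# set in /etc/sysctl.conf to make it persistent.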
What makes this a spectacularly bad bug is that it not only corrupts data, but it may be impossible to know whether your existing files were hit by it at some point during its long existence. There is active discussion in the bug thread about what kind of heuristics can be used to find "this file was probably corrupted by the bug", but there's no way (so far, and probably ever) to tell for sure (short of having some known-good copy of the file elsewhere to compare it to).
Which makes the above mitigation all the more important while we wait for the fix!
(>_<) Oh man, I knew about [0] when I posted (which is why I said it just reduces the chance of hitting the bug (by a lot)). But after spending all Saturday JST on it, I went to bed before [1] was posted.
Skimming through #6958, though, it seems like it's the lesser of two evils compared to #15526... I think? It's less obvious (to me) what the impact of #6958 is. Is it silent, undetectable corruption of your precious data, potentially over years, or is it more likely to cause a crash or runtime error?
Reports like https://github.com/intel/bmap-tools/issues/65 make it seem more like the latter.
I have to read more. But since the zfs_dmu_offset_next_sync setting was disabled by default until recently, I still suspect (but yeah, don't know for sure) that disabling it is the safest thing we can currently do on unmodified ZFS systems.
> It seems that the bug is relatively rare (although still very bad if it happens) in most real-world scenarios; one user doing a heuristic scan to look for corrupted files found 0.00027% of their files (7 out of ~2,500,000) were likely affected.
I'm running the following script to detect corruption.[0] The two files I've found so far seem like false positives.
Yeah, I am running a similar script I got from the GitHub bug thread; so far I have not found any suspected-corrupt files at all, except for in the files generated by the reproducer.sh script, e.g.:
Possible corruption in /mnt/slow/reproducer_146495_720
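For reference, the core of these heuristic scanners boils down to something like this (a sketch, not the actual script from the thread; the 4 KiB block size and /mnt/slow are placeholders, it needs bash and GNU cmp, and sparse or zero-padded files will show up as false positives):
# flag files whose first 4 KiB is entirely NUL bytes; it only looks at the
# start of each file, so it's a rough heuristic at best
find /mnt/slow -type f -size +4k -print0 |
while IFS= read -r -d '' f; do
    cmp -s -n 4096 "$f" /dev/zero && echo "Possible corruption in $f"
done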
UPDATE: There is now a very good, simple explanation of the bug, and of how it became much more likely to happen recently even though it has existed for a very long time:
https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
Thanks, I had seen the title about the cloning issue, and I even commented, but I thought it didn't apply to me, since I never deploy new features until at least a few months have passed and other people have confirmed they work well.
I was only worried about the zfs send | zfs receive bug corrupting both pools.
I had been too lazy to check the details, so I had missed that the bug can be triggered even when you DON'T use block cloning at all but just copy files, and even on old versions like 2.1.4: newer versions only increase the probability of hitting it.
I only caught this issue through reddit.
Now, thanks to your link and after going through all the GitHub comments, I'm rereading all the comments from the HN thread from 2 days ago to decide how to analyze several months' worth of backups, some of them not on ZFS, but all of them sourced from ZFS and therefore now suspect for silent corruption.
Hopefully this warning will stay in the toplist long enough for affected users to deploy the simple countermeasures, or at least preserve their backups until we know how to identify the affected files reliably enough (through the number of contiguous zeroes, repeating patterns inside the file, etc.).
Yeah, right now it's actually not known whether the roots of this bug date back even to the Sun days. And because Oracle ZFS is not open source, we can't know whether this or other bugs are lurking there too.
It seems to require extremely specific situations to trigger: for example, using copy_file_range (the new feature that exposed it), or writing a file and then punching a larger-than-recordsize hole in that file within the same transaction group.
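To make that second trigger concrete, the operations involved look roughly like this (a sketch only, not a reliable reproducer; /tank/test is a placeholder, and whether anything actually goes wrong depends on a reader racing the not-yet-synced transaction group):
# write a file, then punch a hole larger than the recordsize (128K by
# default) into it before the transaction group syncs (every ~5 seconds)
dd if=/dev/urandom of=/tank/test/holefile bs=1M count=4 status=none
fallocate --punch-hole --offset 0 --length 2M /tank/test/holefile
# the bug, as I understand it, is lseek(SEEK_DATA/SEEK_HOLE) on such a file
# reporting a stale hole/data map, which is exactly what tools like cp rely on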
[0] https://www.daemonology.net/blog/2023-11-21-late-breaking-Fr...
It's definitely possible. Once we get this bug patched I'll definitely be updating the builders.
looks at the FreeBSD 14.0 errata on freebsd.org
This is precisely why I wouldn't run FreeBSD in production. Look at this shit.
- You need to freebsd-update fetch install before you upgrade
- EC2 AMIs can't handle binary user-data
- FreeBSD Update reports 14.0-RELEASE approaching its EoL
Two ways to break booting and one really stupid mistake, and NONE of it is listed on freebsd.org. Sigh.
Not sure what #2 is about; I don't use the cloud.
And the EoL thing did not happen to me. I did hear of it happening, though; it's not universal.
The bug increased in probability a bit after 2.1.4 (when zfs_dmu_offset_next_sync=1 became the default), and even more after 2.2.0 (with block cloning).
Quick workaround: echo 0 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync
https://github.com/openzfs/zfs/pull/15571
Pretty crazy the bug might date back 10 years.
People are using `reproducer.sh` (https://gist.github.com/tonyhutter/d69f305508ae3b7ff6e9263b2...) to see if they can reproduce the bug intentionally: [1]
https://github.com/0x0177b11f/zfs-issue-15526-check-file tries to find potentially corrupted files (with some risk of false positives) by searching for zero-byte blocks: [2]
[0]: https://github.com/openzfs/zfs/issues/15526#issuecomment-181...
[1]: https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
[2]: https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
Would a zfs snapshot be sufficient?
[1]: the Reddit thread advocating for this is here: https://www.reddit.com/r/zfs/comments/1826lgs/psa_its_not_bl...
[2]: https://gist.github.com/masonmark/03c7cb08e22968b1f2feb1a6ac...
[3]: https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
[0] https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
[1] https://github.com/openzfs/zfs/issues/15526#issuecomment-182...
[2] https://github.com/openzfs/zfs/issues/6958
[0]: https://gist.github.com/kimono-koans/2696a8c8eac0a6babf7b2d9...