Wouldn't this have been caught by an exhaustive unit test or even a fuzz test? I don't know what testing practices are applied to projects like ZFS, or what kinds and amounts of tests functions like this are subject to, but I would imagine that, especially for low-level functions like this, unit tests covering all of the known edge cases would be written.
(yes this is very much an armchair opinion, I mostly do front-end JS for a living)
Yeah that surprised me too - I would have assumed that ZFS had a bunch of "store and retrieve" tests in many different configurations that would have caught this.
I hate to be that OpenBSD guy, but "the people who do the work are the ones to decide how it's done". Yes, people are paid to maintain OpenZFS, but so far nobody is ready to pay for (or volunteer) the work needed to meet your bar.
Side note: OpenZFS already has an extensive test suite. Merely hitting a code branch wouldn't have caught this one.
I found it. All that tells you is that it's a simple problem. Had I not been told it was broken, I likely would not have.
It's the kind of bug that makes you stop breathing for a brief moment. So I ran this function through Gemini 2.5 Pro, ChatGPT o3 and Grok 3. No context other than the source code fragment was provided. All three also clearly identified the problem.
There is no longer any reason for trivial flaws such as this to survive into release code. We're past that now, and that's an astonishing thing.
These are the questions I ponder: Should we consider the possibility that the ongoing, incomplete, decades-long pursuit of "better" languages is now misguided? Or that perhaps "better" might now mean whatever properties make code easier for LLMs to analyze and validate against specifications, as opposed to merely preventing the flaws humans are prone to make?
No way is it practical to run complex code like this through a sycophantic parrot. Try it again with some old, well known code and see how many “Certainly! Your error is …” you get.
There was an article here about a month or so ago doing exactly that impractical thing against (I think) NFS, and finding a small handful of confirmed vulnerabilities that way. Wish granted, I suppose?
> whatever properties make code easier for LLMs to analyze
So double down on languages that are more represented in training data? I think it's still worthwhile to actually improve programming languages. Relying only on LLMs is fragile. So ideally do both: improve languages and AI.
> So double down on languages that are more represented in training data?
The pragmatic answer to that is: this appears to be highly effective.
What I have in mind is something else, however. Consider the safety nets we try to build with, for example, elaborate type systems. These new analysis tools can operate on enormous contexts and infer and enforce "type" with a capacity far greater than what a human mind can hope to approach. So perhaps there is little value in complex types. Instead, a simple type system supported by specification will be the superior approach. seL4 is an example of the concept: C, but exhaustively specified and verified.
Like gibberlink. Programming languages are insane to me. In a way we created them to communicate with machines, but they evolved to complement a very human workflow. Outside of assisting humans, it makes little sense to train LLMs on human computer languages; why not let the AIs generate their own optimized languages?
> Outside of assisting humans, it makes little sense to train LLMs on human computer languages; why not let the AIs generate their own optimized languages?
They produce so much BS, but at least now we can catch most of it. If the languages were suddenly optimised for the AIs, we couldn't do that anymore.
The pursuit of "better" languages might be incomplete, but it has long been known that this specific problem can be solved by the newtype pattern. It might have originated in Haskell, but it's equally applicable in more traditional languages like C++. All you need is to define one named struct wrapping an integer, called PhysicalSize, and another one called AllocatedSize. In C++ add operator overloading to it so you can do arithmetic. In Haskell add instances such as Integral and Num.
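In plain C, where there's no operator overloading, the same idea is just struct wrapping plus small helper functions for the arithmetic. A rough sketch, with the names used above (purely illustrative):

    #include <stdint.h>

    /* Two distinct wrapper types: the compiler refuses to mix them up. */
    typedef struct { uint64_t bytes; } PhysicalSize;
    typedef struct { uint64_t bytes; } AllocatedSize;

    /* No operator overloading in C, so arithmetic goes through helpers. */
    static PhysicalSize psize_add(PhysicalSize a, PhysicalSize b) {
        PhysicalSize r = { a.bytes + b.bytes };
        return r;
    }

A function declared to take a PhysicalSize simply cannot be handed an AllocatedSize by accident; the call does not compile.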
> I found it. All that tells you is that it's a simple problem
Not totally clear what you mean here - are you saying you’re the author of the article or PR, or that you independently discovered the same issue (after the fact)?
Ok, so somehow that is causing confusion. I will clarify.
The author asked that the reader attempt to find the flaw by inspecting the provided code. The flaw was obvious enough that I was able to find it. The implication is that if it were less obvious, I might not have. I was not attempting to take any credit at all: exactly the opposite.
I mean, once you pinpoint 5 lines of code and say that there is an error, it is not particularly hard to find it.
Now give the whole repo before the fix to an LLM and ask it to find this bug with only the context the author started with. I'm sure you can fix every software issue that way as well!
I missed it, but I was distracted by the cols variable being initialised with the original width and then being immediately overwritten with the logical width.
The blog post suggests that, probably because it has never been enabled before, it's now just too noisy to turn on.
Arguably, though, every unused variable in the code base is a potential bug like this waiting to chomp on some poor user's data, so they should probably be turning it on and dealing with the consequences?
I'm not sure why C is blamed in this case, when you can do essentially the same strong-typing fix in C, and with C99 struct literals it's also not much worse to work with.
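A rough sketch of what I mean (type and function names made up for illustration):

    #include <stdint.h>

    typedef struct { uint64_t v; } alloc_size;   /* allocated size */
    typedef struct { uint64_t v; } phys_size;    /* physical size  */

    static uint64_t sectors_for(phys_size p, uint64_t sector) {
        return (p.v + sector - 1) / sector;
    }

    int main(void) {
        alloc_size a = (alloc_size){ .v = 3 * 4096 };   /* C99 compound literal */
        phys_size  p = (phys_size){ .v = 2 * 4096 };
        uint64_t n = sectors_for(p, 512);               /* fine                 */
        /* sectors_for(a, 512); */                      /* would not compile    */
        (void)a; (void)n;
        return 0;
    }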
...and in reverse, Rust wouldn't protect you from that exact same bug if the programmer decided to use usize like in the original C code.
IME overly strong typing is a double-edged sword though: on one hand it makes the code more robust, but on the other hand it also makes it more 'change resistant'.
I still would like to see a strong typedef variant in C so that the struct wrapping wouldn't be needed, e.g.:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3320.htm
Most projects I've worked on treat warnings as bugs. It's annoying, sometimes, when you have to fiddle with the code and add lines to prevent bogus warnings from breaking the build, particularly when you're making a temporary change for debugging purposes, and I remember a couple of occasions when we had to insert a comment something like "This was needed to prevent a warning caused by a bug in GCC X.Y.Z (link to compiler bug on issue tracker)". But, on balance, it's worth it.
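For what it's worth, the usual shape of those suppressions in GCC looks something like this (the warning and the code here are made up; only the pragma mechanism is real):

    #include <stdio.h>

    int main(void) {
        /* This was needed to prevent a warning caused by a bug in GCC X.Y.Z
           (link to compiler bug on issue tracker). Remove once fixed. */
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
        int value = 0;                 /* the line the bogus warning fired on */
        printf("%d\n", value);
    #pragma GCC diagnostic pop
        return 0;
    }

The push/ignored/pop trio scopes the exception to a few lines, so -Werror stays on for everything else.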
> Wouldn't this have been caught by an exhaustive unit test or even a fuzz test?
A single call where the psize is different than the asize would have caught it.
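Something as small as this would have done it; the function name and signature here are hypothetical, just to show the shape of the test:

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the conversion helper under test. */
    uint64_t asize_to_psize(uint64_t asize, uint64_t ashift);

    static void test_psize_differs_from_asize(void) {
        /* Pick an input where the two sizes cannot be equal, so returning
           one in place of the other is immediately caught. */
        const uint64_t asize = 3 * 4096;
        const uint64_t psize = asize_to_psize(asize, 12);
        assert(psize != asize);
    }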
At the same time, the tooling has gotten much better in recent years.
Clang-analyzer is fast enough to run as part of CI. Newer GCC versions also give quite a few more warnings for unused results.
My recommendations to the project:
- Remove all existing compiler warnings and enable warnings-as-errors
- Increase unit test coverage to >80%
That is a lot of work. But that's what is required for high-criticality systems engineering.
If you want free software, I recommend IKOS, a sound static analyzer for C/C++ developed at NASA. Checks: https://github.com/NASA-SW-VnV/ikos/blob/master/analyzer/REA... Numerical abstract domains: https://github.com/NASA-SW-VnV/ikos/blob/master/analyzer/REA...
A commercial tool like Astree (https://www.absint.com/astree/index.htm) is an option if you have money.
> why not let the AIs generate their own optimized languages?
Excellent question. What would it end up looking like? Would it be full of monads and HKTs? Lambda calculus? WebAssembly?
I hope to learn the answer one day.
This is unrelated to it being "ChatGPT times" now. We've been able to detect this issue for decades already; see -Wunused-variable.
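For example, a made-up fragment along the lines of what's being described here gets flagged immediately:

    #include <stdint.h>

    static uint64_t get_psize(uint64_t asize) { return asize / 2; }  /* stand-in */

    uint64_t convert(uint64_t asize) {
        uint64_t psize = get_psize(asize);   /* computed...       */
        return asize;                        /* ...but never used */
    }

    /* With gcc -Wall (which includes -Wunused-variable) this produces roughly:
       warning: unused variable 'psize' [-Wunused-variable] */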
I would have expected the compiler to complain that psize is computed but unused.
Why isn't -Wunused-variable enabled in OpenZFS?
> so they should probably be turning it on and dealing with the consequences?
Most definitely, yes.
It seems you are dooming your project the minute you start ignoring your first warning.