I wonder why fuzzing is not more popular, even in software.
Almost everyone who talks about software design puts a lot of emphasis on testing. A bit too much sometimes, I think, but that's another subject. Where I work, we have budgets dedicated to testing and specialized teams, and most customers want some kind of test-related document, but I don't remember hearing about fuzzing even once.
For some reason, fuzzing is tied to security, and I don't work on security-critical projects, so I guess that's why no one talks about it. But it doesn't have to be. All projects need some kind of robustness. Even when it is largely inconsequential, like in video games, players get annoyed when their game crashes.
And that is not even the best part. The best part is that when you are fuzzing, you don't even have to write the tests! The fuzzer uses its engine to reach the paths you didn't expect. The problem with writing tests, besides me not enjoying it, is that you only test what you think of testing, and what you think of, you probably also thought about while writing the code, so it is most likely the part you got right (flip that around for TDD, but the problem is the same). That's why, ideally, you shouldn't write your own tests; that is not always an option, but fuzzers do it for you.
A short fuzzing session could be a standard part of CI/CD pipelines, like all the other tools that live there: linters, tests, coverage, etc.
The fuzzer I use (AFL++) does a good job but is, I think, a little cumbersome for anything but parsing files. That could be greatly improved. Fuzzers also tend to use rather primitive genetic algorithms; recent advances in machine learning could certainly help here.
It is popular in domains where it's effective, but it's not as useful when it's hard to know whether an output is correct. I know several tools that fuzz mobile app UIs to see if random inputs can cause crashes or irrecoverable states, because those are easy to detect; beyond that, you start needing more traditional QA approaches to say whether the resulting state is correct.
One area which is very interesting is the use of OpenAPI schemas to help with APIs, since you can use the schemas to guide generation and validation. It's non-trivial to do with authentication, but I found this project of interest:
https://github.com/microsoft/restler-fuzzer
Testing checks that the code responds with the correct output for a known input. Fuzzing can't replace testing because (a) you still want to test that it works correctly given known inputs, and (b) not all functions need to accept badly formatted input.
Paradigm shift: fuzzing is testing, if you write fuzzers that test properties. For instance, if you have a class that can be parsed and serialized, you might write something like this in a fuzzer:
    // Round-trip property: serializing the parsed input must reproduce it exactly.
    std::string original(reinterpret_cast<const char *>(data), size);
    if (serialize(parse(data, size)) != original) {
        abort();  // the fuzzer reports any input reaching this line as a crash
    }
More directly addressing your point about not all functions accepting malformed input: you can also use a fuzzer to exercise sequences of method calls on an object. If your class has any invariants, you can check them along the way, as in the sketch below.
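Here is a minimal harness in that style. The Stack class is a made-up stand-in for your own type; LLVMFuzzerTestOneInput and FuzzedDataProvider are the real libFuzzer entry point and the helper header that ships with LLVM.

    #include <cassert>
    #include <cstdint>
    #include <vector>
    #include <fuzzer/FuzzedDataProvider.h>  // ships with LLVM

    // Hypothetical class under test, with one obvious invariant on its size.
    class Stack {
        std::vector<int> v_;
    public:
        void push(int x) { v_.push_back(x); }
        void pop() { if (!v_.empty()) v_.pop_back(); }
        size_t size() const { return v_.size(); }
    };

    extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        FuzzedDataProvider fdp(data, size);
        Stack s;
        size_t expected = 0;
        while (fdp.remaining_bytes() > 0) {
            // Let the fuzzer's bytes choose the next method call.
            if (fdp.ConsumeBool()) { s.push(fdp.ConsumeIntegral<int>()); ++expected; }
            else                   { s.pop(); if (expected > 0) --expected; }
            assert(s.size() == expected);  // invariant checked after every call
        }
        return 0;
    }

Build with something like clang++ -g -fsanitize=fuzzer,address harness.cc; the fuzzer mutates the byte stream, which effectively mutates the call sequence.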
If it is a common task, like decoding base64, then you can compare the results of your function with the results of a well-established existing implementation and abort when they are not equal.
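That is usually called differential fuzzing. A sketch of the harness, where both decoder names are placeholders for your function and whichever reference implementation you trust:

    #include <cstdint>
    #include <cstdlib>
    #include <string>

    // Placeholders: your implementation and a well-established reference.
    std::string my_base64_decode(const std::string &input);
    std::string reference_base64_decode(const std::string &input);

    extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        std::string input(reinterpret_cast<const char *>(data), size);
        // Differential check: any disagreement is a bug in one of the two.
        if (my_base64_decode(input) != reference_base64_decode(input))
            abort();
        return 0;
    }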
I don’t know anything about fuzzing (my code promises not to crash for valid inputs… and defines valid inputs as those which don’t cause it to crash, hah!)
I wonder, though: if you are producing a library, you probably expect the user to only provide inputs within some ranges. Is there a nice way for these fuzzing environments to talk back and forth? As in: here's my calling code; now, fuzzer, figure out what ranges it can produce, and only fuzz the library for those ranges.
There's also something called structure-aware fuzzing. Say you're fuzzing a function that takes a JSON string as input. When fuzzers do traditional mutations like bit flips, or re-inserting parts of the input at random points, you'll get coverage, but it might not be deep, because a lot of the time the function will fail at the JSON parsing stage instead of going into the actual logic. And if it uses a well-tested JSON parsing library, you probably don't want to spend time fuzzing that, because oss-fuzz is already doing it with supercomputers. The solution is that fuzzers such as libFuzzer can generate random data that conforms to a protobuf type you give it, and from there you can properly serialize a JSON message and pass that to the function.
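With libFuzzer this is typically done through libprotobuf-mutator. A rough sketch, where the Person message and handle_json target are made up, while DEFINE_PROTO_FUZZER and protobuf's JSON serializer are real:

    #include <string>
    #include <google/protobuf/util/json_util.h>   // protobuf's JSON serializer
    #include "src/libfuzzer/libfuzzer_macro.h"    // from libprotobuf-mutator
    #include "person.pb.h"                        // hypothetical generated message

    void handle_json(const std::string &json);    // hypothetical function under test

    // The mutator only ever produces structurally valid Person messages, so the
    // serialized JSON always parses and the fuzzer spends its time in the deeper logic.
    DEFINE_PROTO_FUZZER(const Person &person) {
        std::string json;
        (void)google::protobuf::util::MessageToJsonString(person, &json);
        handle_json(json);
    }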
I don't know of any fuzzers written to handle the case you are describing. You would have to put the ranges in the fuzzing handler yourself. If you had a program that, say, took integers, and you wanted to find out within which bounds it does not crash, you could probably write a program using binary search pretty easily.
Note: I learned most of this about a week and a half ago, may be subtly wrong.
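For what it's worth, the binary-search idea only works cleanly if crashing is monotone in the input (everything beyond some threshold crashes). A toy sketch under that assumption, with crashes() standing in for actually running the target:

    #include <cstdint>

    // Stand-in: run the target with input x and report whether it crashed.
    bool crashes(int64_t x);

    // Assuming !crashes(lo) and that crashes(x) is monotone above some threshold,
    // this finds the largest safe input in O(log(hi - lo)) runs of the target.
    int64_t largest_safe_input(int64_t lo, int64_t hi) {
        while (lo < hi) {
            int64_t mid = lo + (hi - lo + 1) / 2;  // bias up so the loop always progresses
            if (crashes(mid)) hi = mid - 1;
            else              lo = mid;
        }
        return lo;
    }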
It depends on the objective. You can test things like: is there any sequence of navigations that results in an exception in the backend?
This is what Sapienz from Facebook did: it fuzzed their mobile applications to find paths that caused errors, and it would try to find the shortest sequence of actions that reproduces the issue.
https://engineering.fb.com/2018/05/02/developer-tools/sapien...
http://www0.cs.ucl.ac.uk/staff/k.mao/archive/p_issta16_sapie...
I don't understand why fuzzing hardware is being presented as a new thing here...
In silicon validation, constrained random testing has been the standard methodology for at least 10 years. With the complexity of modern CPUs, it's effectively impossible to validate the hardware _without_ using some kind of randomized testing, which looks a whole lot like fuzzing to me.
What is new here? Or is this a case of someone outside the industry rediscovering known techniques?
The claim here is that "existing hardware fuzzers are inefficient in verifying processors because they make many static decisions disregarding the design complexity and the design space explored".
"To address this limitation of static strategies in
fuzzers, we develop an approach to equip any hardware fuzzer
with a dynamic decision-making technique, multi-armed bandit
(MAB) algorithms." (From their paper: https://arxiv.org/pdf/2311.14594.pdf)
They're saying their fuzzer is faster and better at finding bugs than other fuzzing approaches.
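For intuition: a multi-armed bandit here just means the fuzzer adaptively reallocates effort among its strategies (mutations, seed choices, etc.) based on observed reward such as new coverage, instead of fixing those choices up front. A toy epsilon-greedy bandit, not their implementation:

    #include <cstdlib>
    #include <vector>

    // Each "arm" is a fuzzing strategy; reward is whatever the fuzzer optimizes
    // for, e.g. 1.0 when an input hits new coverage and 0.0 otherwise.
    struct Bandit {
        std::vector<double> value;  // running mean reward per arm
        std::vector<int> pulls;
        explicit Bandit(int arms) : value(arms, 0.0), pulls(arms, 0) {}

        int pick(double eps = 0.1) {
            if ((double)std::rand() / RAND_MAX < eps)   // explore a random arm
                return std::rand() % (int)value.size();
            int best = 0;                               // otherwise exploit the best arm
            for (int i = 1; i < (int)value.size(); ++i)
                if (value[i] > value[best]) best = i;
            return best;
        }
        void update(int arm, double reward) {           // incremental running mean
            ++pulls[arm];
            value[arm] += (reward - value[arm]) / pulls[arm];
        }
    };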
Fuzzing and constrained random, while both based on randomisation, are not the same thing.
A big problem with fuzzers, from the point of view of hardware validation, is that it's unclear what coverage guarantees they give. Would you tape out your processor design once your fuzzer no longer finds bugs, if you had no idea about test coverage? OTOH, fuzzing has been very effective in software, so it is natural to ask whether those gains can also be had for hardware.
The grandparent post is incorrect. H/w silicon validation with constrained random testing hasn't been the norm for just 10 years; it's at least 20, which is when I first got into that industry.
And yes, we had coverage-driven verification back in 2005 as well. No, we didn't "tape out" our CPUs until we'd hit our testing plan, which was defined by coverage metrics.
P.S. Pre-silicon verification had testing pipelines way back then, before they became the norm in s/w.
> When people design hardware, they do not think about security up front
This is a pretty big claim that is easily disproven. How new features will be secured is part of the design. The overall security model is figured out ahead of time. There is plenty of existing work to reference.