I wonder why fuzzing is not more popular, even in software.
Almost everyone who talks about software design puts a lot of emphasis on testing. A bit too much sometimes, I think, but that's another subject. Where I work, we have budgets dedicated to testing and specialized teams, and most customers want some kind of test-related document, but I don't remember hearing about fuzzing even once.
For some reason, fuzzing is tied to security, and I don't work on security-critical projects, so I guess that's why no one talks about it. But it doesn't have to be. All projects need some kind of robustness. Even when it is largely inconsequential, like in video games, players get annoyed when their game crashes.
And that is not even the best part. The best part is that when you are fuzzing, you don't even have to write the tests! The fuzzer uses its engine to reach the paths you didn't expect. The problem with writing tests, besides me not enjoying it, is that you only test what you think of testing, and what you think of, you probably also thought about while writing the code, so it is most likely the part you got right (flip that around for TDD, but the problem is the same). That's why, ideally, you shouldn't write your own tests; that is not always an option, but fuzzers do it for you.
A short fuzzing session could be a standard part of CI/CD pipelines, like all the other tools that live there: linters, tests, coverage, etc.
The fuzzer I use (AFL++) does a good job but is, I think, a little cumbersome for anything but parsing files. That could be greatly improved. Fuzzers also tend to use rather primitive genetic algorithms; recent advances in machine learning could certainly help here.
It is popular in domains where it's effective, but it's not as useful when it's hard to know whether an output is correct. I know several tools that fuzz mobile app UIs to see if random inputs can cause crashes or irrecoverable states, because those are easy to detect; beyond that, you start needing more traditional QA approaches to say whether the resulting state is correct.
One area which is very interesting is the use of OpenAPI schemas to help with APIs, since you can use the schemas to guide generation and validation. It's non-trivial to do with authentication, but I found this project of interest:
https://github.com/microsoft/restler-fuzzer
Testing checks that the code responds with the correct output for a known input. Fuzzing can't replace testing because (a) you still want to test that it works correctly given known inputs, and (b) not all functions need to accept badly formatted input.
Paradigm shift: fuzzing is testing, if you write fuzzers that test properties. For instance, if you have a class that can be parsed and serialized, you might write something like this in a fuzzer:
    // Round-trip property: serializing the parsed input must reproduce it exactly.
    std::string original(reinterpret_cast<const char *>(data), size);
    if (serialize(parse(data, size)) != original) {
        abort();  // the fuzzer reports any input reaching this line as a crash
    }
More directly addressing your point about not all functions accepting malformed input: you can also use a fuzzer to exercise sequences of method calls on an object. If your class has any invariants, you can check them along the way, as in the sketch below.
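Here is a minimal harness in that style. The Stack class is a made-up stand-in for your own type; LLVMFuzzerTestOneInput and FuzzedDataProvider are the real libFuzzer entry point and the helper header that ships with LLVM.

    #include <cassert>
    #include <cstdint>
    #include <vector>
    #include <fuzzer/FuzzedDataProvider.h>  // ships with LLVM

    // Hypothetical class under test, with one obvious invariant on its size.
    class Stack {
        std::vector<int> v_;
    public:
        void push(int x) { v_.push_back(x); }
        void pop() { if (!v_.empty()) v_.pop_back(); }
        size_t size() const { return v_.size(); }
    };

    extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        FuzzedDataProvider fdp(data, size);
        Stack s;
        size_t expected = 0;
        while (fdp.remaining_bytes() > 0) {
            // Let the fuzzer's bytes choose the next method call.
            if (fdp.ConsumeBool()) { s.push(fdp.ConsumeIntegral<int>()); ++expected; }
            else                   { s.pop(); if (expected > 0) --expected; }
            assert(s.size() == expected);  // invariant checked after every call
        }
        return 0;
    }

Build with something like clang++ -g -fsanitize=fuzzer,address harness.cc; the fuzzer mutates the byte stream, which effectively mutates the call sequence.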
If it is a common task, like decoding base64, then you can compare the results of your function with the results of a well-established existing implementation and abort when they are not equal.
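That is usually called differential fuzzing. A sketch of the harness, where both decoder names are placeholders for your function and whichever reference implementation you trust:

    #include <cstdint>
    #include <cstdlib>
    #include <string>

    // Placeholders: your implementation and a well-established reference.
    std::string my_base64_decode(const std::string &input);
    std::string reference_base64_decode(const std::string &input);

    extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        std::string input(reinterpret_cast<const char *>(data), size);
        // Differential check: any disagreement is a bug in one of the two.
        if (my_base64_decode(input) != reference_base64_decode(input))
            abort();
        return 0;
    }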
I don’t know anything about fuzzing (my code promises not to crash for valid inputs… and defines valid inputs as those which don’t cause it to crash, hah!)
I wonder, though: if you are producing a library, you probably expect the user to only provide inputs within some ranges. Is there a nice way for these fuzzing environments to talk back and forth? As in: here's my calling code; now, fuzzer, figure out what ranges it can produce, and only fuzz the library for those ranges.
There's also something called structure-aware fuzzing. Say you're fuzzing a function that takes a JSON string as input. When fuzzers do traditional mutations like bit flips, or re-inserting parts of the input at random points, you'll get coverage, but it might not be deep, because a lot of the time the function will fail at the JSON parsing stage instead of going into the actual logic. And if it uses a well-tested JSON parsing library, you probably don't want to spend time fuzzing that, because oss-fuzz is already doing it with supercomputers. The solution is that fuzzers such as libFuzzer can generate random data that conforms to a protobuf type you give it, and from there you can properly serialize a JSON message and pass that to the function.
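With libFuzzer this is typically done through libprotobuf-mutator. A rough sketch, where the Person message and handle_json target are made up, while DEFINE_PROTO_FUZZER and protobuf's JSON serializer are real:

    #include <string>
    #include <google/protobuf/util/json_util.h>   // protobuf's JSON serializer
    #include "src/libfuzzer/libfuzzer_macro.h"    // from libprotobuf-mutator
    #include "person.pb.h"                        // hypothetical generated message

    void handle_json(const std::string &json);    // hypothetical function under test

    // The mutator only ever produces structurally valid Person messages, so the
    // serialized JSON always parses and the fuzzer spends its time in the deeper logic.
    DEFINE_PROTO_FUZZER(const Person &person) {
        std::string json;
        (void)google::protobuf::util::MessageToJsonString(person, &json);
        handle_json(json);
    }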
I don't know of any fuzzers written to handle the case you are describing. You would have to put the ranges in the fuzzing handler yourself. If you had a program that, say, took integers, and you wanted to find out within which bounds it does not crash, you could probably write a program using binary search pretty easily.
Note: I learned most of this about a week and a half ago, may be subtly wrong.
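For what it's worth, the binary-search idea only works cleanly if crashing is monotone in the input (everything beyond some threshold crashes). A toy sketch under that assumption, with crashes() standing in for actually running the target:

    #include <cstdint>

    // Stand-in: run the target with input x and report whether it crashed.
    bool crashes(int64_t x);

    // Assuming !crashes(lo) and that crashes(x) is monotone above some threshold,
    // this finds the largest safe input in O(log(hi - lo)) runs of the target.
    int64_t largest_safe_input(int64_t lo, int64_t hi) {
        while (lo < hi) {
            int64_t mid = lo + (hi - lo + 1) / 2;  // bias up so the loop always progresses
            if (crashes(mid)) hi = mid - 1;
            else              lo = mid;
        }
        return lo;
    }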
It depends on the objective. You can test things like: is there any sequence of navigations that results in an exception in the backend?
This is what Sapienz from Facebook did: it fuzzed their mobile applications to find paths that caused errors, and it would try to find the shortest sequence of actions that reproduces the issue.
https://engineering.fb.com/2018/05/02/developer-tools/sapien...
http://www0.cs.ucl.ac.uk/staff/k.mao/archive/p_issta16_sapie...
I don't understand why fuzzing hardware is being presented as a new thing here...
In silicon validation, constrained random testing has been the standard methodology for at least 10 years. With the complexity of modern CPUs, it's effectively impossible to validate the hardware _without_ using some kind of randomized testing, which looks a whole lot like fuzzing to me.
What is new here? Or is this a case of someone outside the industry rediscovering known techniques?
The claim here is that "existing hardware fuzzers are inefficient in verifying processors because they make many static decisions disregarding the design complexity and the design space explored".
"To address this limitation of static strategies in
fuzzers, we develop an approach to equip any hardware fuzzer
with a dynamic decision-making technique, multi-armed bandit
(MAB) algorithms." (From their paper: https://arxiv.org/pdf/2311.14594.pdf)
They're saying their fuzzer is faster and better at finding bugs than other fuzzing approaches.
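For intuition: a multi-armed bandit here just means the fuzzer adaptively reallocates effort among its strategies (mutations, seed choices, etc.) based on observed reward such as new coverage, instead of fixing those choices up front. A toy epsilon-greedy bandit, not their implementation:

    #include <cstdlib>
    #include <vector>

    // Each "arm" is a fuzzing strategy; reward is whatever the fuzzer optimizes
    // for, e.g. 1.0 when an input hits new coverage and 0.0 otherwise.
    struct Bandit {
        std::vector<double> value;  // running mean reward per arm
        std::vector<int> pulls;
        explicit Bandit(int arms) : value(arms, 0.0), pulls(arms, 0) {}

        int pick(double eps = 0.1) {
            if ((double)std::rand() / RAND_MAX < eps)   // explore a random arm
                return std::rand() % (int)value.size();
            int best = 0;                               // otherwise exploit the best arm
            for (int i = 1; i < (int)value.size(); ++i)
                if (value[i] > value[best]) best = i;
            return best;
        }
        void update(int arm, double reward) {           // incremental running mean
            ++pulls[arm];
            value[arm] += (reward - value[arm]) / pulls[arm];
        }
    };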
Fuzzing and constrained random, while both based on randomisation, are not the same thing.
A big problem with fuzzers, from the point of view of hardware validation, is that it's unclear what coverage guarantees they give. Would you tape out your processor design once your fuzzer no longer finds bugs, if you had no idea about test coverage? OTOH, fuzzing has been very effective in software, so it is natural to ask whether those gains can also be had for hardware.
The grandparent post is incorrect. H/w silicon validation with constrained random testing hasn't been the norm for just 10 years; it's at least 20, which is when I first got into that industry.
And yes, we had coverage-driven verification back in 2005 as well. No, we didn't "tape out" our CPUs until we'd hit our testing plan, which was defined by coverage metrics.
P.S. Pre-silicon verification had testing pipelines way back then, before they became the norm in s/w.
> When people design hardware, they do not think about security up front
This is a pretty big claim that is easily disproven. How new features will be secured is part of the design. The overall security model is figured out ahead of time. There is plenty of existing work to reference.