The unreasonable success of Fuzzing

Fuzzing is a technique that many of us know and love. But why is it so effective? This talk goes through the origins of fuzzing and why it works as well as it does. Those origins trace back to how bad software was in the 1990s and early 2000s. For a while, the prevailing attitude was that "you fuzz if you're too stupid to audit code", but that perception changed over time. At the time, you could send random data to most programs and get a crash out of them. Examples included a remote OpenSSH bug, an RCE in RealServer (a music streaming server), a Cisco IKE bug and an Acrobat font bug.
After covering the origins and early successes of fuzzing, the author gives reasons why it works so well. First, it is incredibly efficient: it parallelizes well, it is limited only by computing power, and it produces very few false positives. They do mention that it is worth "being clever" to make it faster, which can make a big difference in some situations. Second, it scales with the complexity of the project, finding weird states that a human doesn't have time to think about. Third, and in keeping with that theme, fuzzers are generally simple designs compared to what it takes to fully understand a project. Throwing random data at a target and setting that up is much simpler than building static analyzers or solvers, or doing pure code review.
The final section discusses the similarities between AI and fuzzing. They frame this around "The Bitter Lesson", Rich Sutton's essay arguing that general methods that leverage computation, such as search, end up beating approaches built on human intuition. The essay walks through the history of chess and Go AI and ends with computer vision. I personally fall into the trap of assuming my own knowledge will beat a computer doing the work, but that is almost always wrong. What we should focus on is using human ingenuity to optimize the process and make the computers go faster. They see fuzzing the same way: it needs lots of computing power, and the cleverness of whoever sets it up goes into making that power more efficient. The success of fuzzing depends on large tree searches.
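To make the "send random data and watch for crashes" idea concrete, here is a minimal sketch of a purely random (not coverage-guided) fuzzer in Python. The target `parse_record` and the helpers `random_input` and `fuzz` are hypothetical, written only for illustration, and a Python exception other than the expected `ValueError` stands in for a real crash.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical toy target: a length-prefixed record parser with a bug.

    It rejects lengths above 64 but never checks the claimed length against
    the bytes actually present, so short inputs read past the end.
    """
    if not data:
        return 0
    length = data[0]
    if length > 64:
        raise ValueError("record too long")  # clean, expected rejection
    total = 0
    for i in range(length):
        total += data[1 + i]  # IndexError if length > len(data) - 1
    return total


def random_input(rng: random.Random, max_len: int = 32) -> bytes:
    """Generate a random byte string of random length (0..max_len)."""
    n = rng.randint(0, max_len)
    return bytes(rng.randrange(256) for _ in range(n))


def fuzz(target, iterations: int, seed: int = 0) -> list:
    """Feed random inputs to `target` and collect the ones that crash it."""
    rng = random.Random(seed)  # fixed seed makes the whole run reproducible
    crashes = []
    for _ in range(iterations):
        data = random_input(rng)
        try:
            target(data)
        except ValueError:
            pass  # the target rejected the input as designed: not a bug
        except Exception:
            crashes.append(data)  # any other exception stands in for a crash
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record, iterations=1000, seed=1)
    print(f"{len(found)} crashing inputs out of 1000")
```

Every iteration is independent, so running several copies with different `seed` values is one simple way to spread the work across cores. Because only unexpected exceptions are counted and each one is stored together with the input that caused it, every reported crash comes with its own reproducer, which helps keep false positives low.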
They also discuss the problems with code coverage as the main metric for fuzzers: it is limited by the program's implicit state machine. How can this be improved? Should that state machine be modeled explicitly? The talk ends by asking whether the future lies in more clever fuzzing or in more systems engineering to run the fuzzer more times. I think it will be a combination of both, but the parallels to an industry I had not considered much in the context of security are interesting.
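The role of code coverage, and its blind spot, is easier to see with a toy coverage-guided loop. This sketch assumes hand-written coverage probes (the `cov` set) as a stand-in for the compiler instrumentation real fuzzers rely on; `magic_target`, `mutate` and `coverage_fuzz` are hypothetical names for illustration. The loop keeps any mutated input that reaches a branch it has not seen before and mutates it further, which is one simple way to picture fuzzing as a large search over inputs.

```python
import random
from typing import Optional


def magic_target(data: bytes, cov: set) -> None:
    """Hypothetical target with hand-written coverage probes.

    Each nested check records a branch label in `cov`; real coverage-guided
    fuzzers collect this signal through compiler instrumentation instead.
    """
    cov.add("start")
    if len(data) >= 4 and data[0] == ord("F"):
        cov.add("b0")
        if data[1] == ord("U"):
            cov.add("b1")
            if data[2] == ord("Z"):
                cov.add("b2")
                if data[3] == ord("Z"):
                    raise RuntimeError("reached the buggy branch")


def mutate(rng: random.Random, data: bytes) -> bytes:
    """Overwrite one random byte with a random value (assumes non-empty input)."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def coverage_fuzz(target, seed_input: bytes, iterations: int,
                  seed: int = 0) -> Optional[bytes]:
    """Mutate corpus entries, keep inputs that reach new branches, stop on a crash."""
    rng = random.Random(seed)
    corpus = [seed_input]
    seen = set()
    target(seed_input, seen)  # assumes the seed input itself does not crash
    for _ in range(iterations):
        candidate = mutate(rng, rng.choice(corpus))
        cov = set()
        try:
            target(candidate, cov)
        except Exception:
            return candidate  # first crashing input found
        if not cov <= seen:  # new branch reached: keep it for further mutation
            seen |= cov
            corpus.append(candidate)
    return None


if __name__ == "__main__":
    print(coverage_fuzz(magic_target, b"AAAA", iterations=200_000, seed=0))
```

Purely random 4-byte inputs would hit the buggy branch with probability 1 in 2^32 per try; here each matched byte shows up as a new branch, so the loop can solve the check one byte at a time. The same loop would get no such help if the bug depended on, say, a counter reaching a specific value inside a single branch: only the final comparison would add new coverage, and every intermediate step would look identical. That kind of hidden progress is what an implicit state machine represents, and it is the gap the question about modeling the state machine points at.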

Analysis Summary