Of Bugs and Baselines
Summary : recently published results on the LAVA-M synthetic bug dataset are exciting. However, I show that much simpler techniques can also do startlingly well on this dataset; we need to be cautious in our evaluations and not rely too much on getting a high score on a single benchmark. A New Record The LAVA synthetic bug corpora have been available now for about a year and a half. I've been really excited to see new bug-finding approaches (particularly fuzzers) use the LAVA-M dataset as a benchmark, and to watch as performance on that dataset steadily improved. Here's how things have progressed over time. Performance on the LAVA-M dataset over time. Note that because the different utilities have differing numbers of bugs, this picture presents a slightly skewed view of how successful each approach was by normalizing the performance on each utility. Also, SBF was only evaluated on base64 (where it did very well), and Vuzzer's performance on md5sum is due to lar