
Showing posts with the label corpus

Of Bugs and Baselines

Summary: recently published results on the LAVA-M synthetic bug dataset are exciting. However, I show that much simpler techniques can also do startlingly well on this dataset; we need to be cautious in our evaluations and not rely too much on getting a high score on a single benchmark.

A New Record

The LAVA synthetic bug corpora have been available now for about a year and a half. I've been really excited to see new bug-finding approaches (particularly fuzzers) use the LAVA-M dataset as a benchmark, and to watch as performance on that dataset steadily improved. Here's how things have progressed over time.

[Figure: Performance on the LAVA-M dataset over time.]

Note that because the different utilities have differing numbers of bugs, this picture presents a slightly skewed view of how successful each approach was by normalizing the performance on each utility. Also, SBF was only evaluated on base64 (where it did very well), and Vuzzer's performance on md5sum is due to lar...
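As a quick illustration of the normalization mentioned in the excerpt above, the comparison is done on the fraction of injected bugs found per utility rather than raw counts, since the utilities contain very different numbers of bugs. The sketch below uses the per-utility totals reported for LAVA-M (base64: 44, md5sum: 57, uniq: 28, who: 2136); the "found" counts are placeholders, not real results from any tool.

```python
# Illustrative only: normalize per-utility results so each utility
# contributes equally, regardless of how many bugs were injected into it.

# Total injected bugs per LAVA-M utility (as reported in the LAVA paper).
TOTAL_BUGS = {"base64": 44, "md5sum": 57, "uniq": 28, "who": 2136}

def normalized_scores(bugs_found):
    """Return the fraction of injected bugs found for each utility."""
    return {util: bugs_found.get(util, 0) / total
            for util, total in TOTAL_BUGS.items()}

# Placeholder numbers for a hypothetical fuzzer -- not actual results.
example = {"base64": 40, "md5sum": 10, "uniq": 20, "who": 100}
print(normalized_scores(example))
```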

The LAVA Synthetic Bug Corpora

I'm planning a longer post discussing how we evaluated the LAVA bug injection system, but since we've gotten approval to release the test corpora I wanted to make them available right away. The corpora described in the paper, LAVA-1 and LAVA-M, can be downloaded here: http://panda.moyix.net/~moyix/lava_corpus.tar.xz (101M)

Quoting from the included README:

This distribution contains the automatically generated bug corpora used in the paper, "LAVA: Large-scale Automated Vulnerability Addition".

LAVA-1 is a corpus consisting of 69 versions of the "file" utility, each of which has had a single bug injected into it. Each bug is a named branch in a git repository. The triggering input can be found in the file named CRASH_INPUT. To run the validation, you can use validate.sh, which builds each buggy version of file and evaluates it on the corresponding triggering input.

LAVA-M is a corpus consisting of four GNU coreutils programs (base64, md5sum,...
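For readers who want a feel for the LAVA-1 layout, here is a minimal sketch of the validation workflow the README describes, assuming one git branch per injected bug, a CRASH_INPUT trigger file in the checkout, and a standard build. This is not the distributed validate.sh script; the binary path and branch name below are assumptions made for illustration.

```python
# Minimal sketch of the LAVA-1 validation idea: check out a buggy branch,
# rebuild the "file" utility, and run it on its triggering input.
import subprocess

def validate_bug(repo_dir, branch):
    """Check out one buggy branch, rebuild 'file', and run its trigger input."""
    subprocess.run(["git", "-C", repo_dir, "checkout", branch], check=True)
    subprocess.run(["make", "-C", repo_dir], check=True)  # rebuild the buggy version
    result = subprocess.run(
        [f"{repo_dir}/src/file", f"{repo_dir}/CRASH_INPUT"],  # binary path is an assumption
        capture_output=True,
    )
    # A negative return code means the process died on a signal (e.g. SIGSEGV),
    # which is what a successfully triggered injected bug should produce.
    return result.returncode < 0

# Example usage (branch name is hypothetical):
# validate_bug("lava-1/file", "bug_12345")
```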