The LAVA Synthetic Bug Corpora

I'm planning a longer post discussing how we evaluated the LAVA bug injection system, but since we've gotten approval to release the test corpora I wanted to make them available right away.

The corpora described in the paper, LAVA-1 and LAVA-M, can be downloaded here: (101M)

Quoting from the included README:

This distribution contains the automatically generated bug corpora used in the paper, "LAVA: Large-scale Automated Vulnerability Addition".

LAVA-1 is a corpus consisting of 69 versions of the "file" utility, each of which has had a single bug injected into it. Each bug is a named branch in a git repository. The triggering input can be found in the file named CRASH_INPUT. To run the validation, you can use, which builds each buggy version of file and evaluates it on the corresponding triggering input.

LAVA-M is a corpus consisting of four GNU coreutils programs (base64, md5sum, uniq, and who), each of which has had a large number of bugs added. Each injected, validated bug is listed in the validated_bugs file, and the corresponding triggering inputs can be found in the inputs subdirectory. To run the validation, you can use the script, which builds the buggy utility and evaluates it on triggering and non-triggering inputs.

For both corpora, the "backtraces" subdirectory contains the output of gdb's backtrace command for each bug.



Unknown said…
Hey, I downloaded the LAVA corpora and ran the script, and got the following output in an Ubuntu terminal:
Building buggy base64...
Checking if buggy base64 succeeds on non-trigger input...
Success: base64 -d inputs/utmp.b64 returned 127
Validating bugs...
Validated 0 / 44 bugs
You can see validated.txt for the exit code of each buggy version.
which suggests the bugs were not injected successfully. One of the commands in the script is "./configure --prefix=`pwd`/lava-install LIBS="-lacl" &> /dev/null", but I cannot find the directory "lava-install".
How can I solve this problem? Thanks very much.

127 is the error code bash returns when the program can't be found. So it sounds like some part of the compilation process is failing and none of the coreutils programs have actually been built. I'd recommend running the compile step by hand to see what's going wrong, and then fixing that.
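To make the exit-code reasoning concrete, here is a minimal sketch (not part of the LAVA distribution) of how the two cases differ in bash. The function name `classify_status` and the path `./no-such-dir/...` are hypothetical and only serve as illustration; the sketch assumes a Linux system where SIGSEGV is signal 11.

```shell
# Hypothetical helper: interpret an exit status as reported by bash.
classify_status() {
    local rv=$1
    if [ "$rv" -eq 127 ]; then
        echo "not-found"               # bash could not find the program
    elif [ "$rv" -gt 128 ]; then
        echo "signal $((rv - 128))"    # killed by a signal (128 + signal number)
    else
        echo "exited"                  # ordinary exit
    fi
}

# A binary that was never built: bash reports 127.
missing_rv=0
./no-such-dir/lava-install/bin/base64 2>/dev/null || missing_rv=$?

# A child process that kills itself with SIGSEGV: bash reports 128 + 11.
crash_rv=0
bash -c 'kill -SEGV $$' 2>/dev/null || crash_rv=$?

echo "missing binary:  $missing_rv ($(classify_status "$missing_rv"))"
echo "crashing binary: $crash_rv ($(classify_status "$crash_rv"))"
```

A status of 127 on the non-trigger input therefore points to a build problem rather than to anything about the injected bugs.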
Unknown said…
Mr. Dolan-Gavitt, thank you very much. I changed the script to remove the "&>/dev/null" redirections, and the program now builds successfully.
The changed script is as follows:
# PROG, PROGOPT, INPUT_PATTERN and INPUT_CLEAN are assumed to be set earlier in the script.
echo "Building buggy ${PROG}..."
cd coreutils-8.24-lava-safe
make clean
./configure --prefix=/home/wendy/lava_corpus/LAVA-M/base64/coreutils-8.24-lava-safe/lava-install LIBS="-lacl"
make install
cd ..
# Run on the non-triggering input and save its exit status.
./coreutils-8.24-lava-safe/lava-install/bin/${PROG} ${PROGOPT} ${INPUT_CLEAN}
rv=$?
# Statuses above 128 indicate death by signal, so anything below 128 counts as a clean run.
if [ $rv -lt 128 ]; then
    echo "Success: ${PROG} ${PROGOPT} ${INPUT_CLEAN} returned $rv"
else
    echo "ERROR: ${PROG} ${PROGOPT} ${INPUT_CLEAN} returned $rv"
fi
echo "Validating bugs..."
# Run each listed bug's triggering input and record "<bug id> <exit status>".
cat validated_bugs | while read line ; do
    INPUT_FUZZ=$(printf "$INPUT_PATTERN" $line)
    { ./coreutils-8.24-lava-safe/lava-install/bin/${PROG} ${PROGOPT} ${INPUT_FUZZ} ; }
    echo $line $?
done > validated2.txt
# Count the runs whose exit status is above 128.
awk 'BEGIN {valid = 0} $2 > 128 { valid += 1 } END { print "Validated valid=", valid, "/", NR, "bugs" }' validated2.txt
echo "You can see validated2.txt for the exit code of each buggy version."
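As a rough illustration of the tallying step at the end of the validation script, the sketch below feeds a few made-up "bug id, exit status" lines through the same kind of awk filter. The sample values are hypothetical and not taken from the corpus; only statuses above 128 are counted as validated.

```shell
# Hypothetical sample in the "<bug id> <exit status>" format written by the loop.
sample=$(printf '%s\n' '1 139' '2 1' '3 134' '4 127')

# Count lines whose second field (the exit status) is above 128.
summary=$(printf '%s\n' "$sample" | awk 'BEGIN {valid = 0} $2 > 128 { valid += 1 } END { print "Validated", valid, "/", NR, "bugs" }')

echo "$summary"
```

Note that a status of 127 (program not found) is not counted, which is why a failed build shows up as zero validated bugs.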

There's a CRASH_INPUT in the command in

{ ${d}/lava-install/bin/file ${d}/CRASH_INPUT ; } &> /dev/null

But I cannot find the CRASH_INPUT dir.


Unknown said…

I got exactly the same terminal output, 0/44 bugs validated, for base64 in the LAVA-M corpus.

I tried the newly posted script without the "&>/dev/null" (I copied the posted script into a new file). However, it gave an error that says:

configure: error: in `/home/mark_arsanious/lava_corpus/LAVA-M/base64/coreutils-8.24-lava-safe':
configure: error: C compiler cannot create executables
See `config.log' for more details

and a fatal error:

./lib/acl-internal.h:27:11: fatal error: 'sys/acl.h' file not found
# include <sys/acl.h>

Any clue?

pepper said…
I have trouble confirming all the vulnerable input vectors. What is the toolchain and target architecture that this is supposed to work with?

A recent Ubuntu will be unable to compile due to gnulib changes. Porting the source is easy, but the validate scripts will only confirm a small fraction of the bugs. It's better with Ubuntu 16.04, but still, several hundred bugs could not be confirmed by the validation script. Cross-compiling for i386 also didn't seem to help.
Kristian said…
I get the following error when trying to compile base64:

lib/freadseek.c: In function 'freadptrinc':
lib/freadseek.c:68:3: error: #error "Please port gnulib freadseek.c to your platform! Look at the definition of getc, getc_unlocked on your system, then report this to bug-gnulib."
68 | #error "Please port gnulib freadseek.c to your platform! Look at the definition of getc, getc_unlocked on your system, then report this to bug-gnulib."
