The LAVA Synthetic Bug Corpora

October 08, 2016

I'm planning a longer post discussing how we evaluated the LAVA bug injection system, but since we've gotten approval to release the test corpora I wanted to make them available right away.

The corpora described in the paper, LAVA-1 and LAVA-M, can be downloaded here:

http://panda.moyix.net/~moyix/lava_corpus.tar.xz (101M)

Quoting from the included README:

This distribution contains the automatically generated bug corpora used in the paper, "LAVA: Large-scale Automated Vulnerability Addition".

LAVA-1 is a corpus consisting of 69 versions of the "file" utility, each of which has had a single bug injected into it. Each bug is a named branch in a git repository. The triggering input can be found in the file named CRASH_INPUT. To run the validation, you can use validate.sh, which builds each buggy version of file and evaluates it on the corresponding triggering input.

LAVA-M is a corpus consisting of four GNU coreutils programs (base64, md5sum, uniq, and who), each of which has had a large number of bugs added. Each injected, validated bug is listed in the validated_bugs file, and the corresponding triggering inputs can be found in the inputs subdirectory. To run the validation, you can use the validate.sh script, which builds the buggy utility and evaluates it on triggering and non-triggering inputs.

For both corpora, the "backtraces" subdirectory contains the output of gdb's backtrace command for each bug.
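
To make the pass/fail criterion concrete, here is a minimal sketch of how the exit status of a single run could be interpreted. It assumes the usual shell convention that a process killed by a signal reports 128 plus the signal number, and that a triggered bug shows up as such a crash. classify_exit is a hypothetical helper for illustration, not part of the distribution.

```shell
#!/bin/bash
# classify_exit: hypothetical helper (not shipped with the corpora).
# Interprets a shell exit status under the assumption that an injected
# bug, when triggered, kills the program with a signal.
classify_exit() {
    local rv=$1
    if [ "$rv" -gt 128 ]; then
        # Shell convention: 128 + signal number for a signal death.
        echo "crash (signal $((rv - 128)))"
    elif [ "$rv" -eq 126 ] || [ "$rv" -eq 127 ]; then
        # 126: found but not executable; 127: command not found.
        echo "not run"
    else
        echo "no crash"
    fi
}

classify_exit 0     # ordinary successful exit
classify_exit 127   # the program was never built or could not be found
classify_exit 139   # 128 + 11, i.e. SIGSEGV on Linux
```

Treating 126 and 127 separately matters because both are below 128, so a check that only asks "is the status less than 128" would report a missing binary as a clean run.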

Enjoy!

Comments

Unknown said…

Hey, I downloaded the LAVA corpora and ran the validate.sh script, and got the following output in the Ubuntu terminal:
----------------------
Building buggy base64...
Checking if buggy base64 succeeds on non-trigger input...
Success: base64 -d inputs/utmp.b64 returned 127
Validating bugs...
Validated 0 / 44 bugs
You can see validated.txt for the exit code of each buggy version.
--------------------------
which means that none of the bugs were validated. One of the lines in validate.sh is "./configure --prefix=`pwd`/lava-install LIBS="-lacl" &> /dev/null", but I cannot find the directory "lava-install".
So how can I solve the problem? Thanks very much.

October 5, 2017 at 11:04 PM

Brendan Dolan-Gavitt said…

Hi,

127 is the error code bash returns when the program can't be found. So it sounds like some part of the compilation process is failing and none of the coreutils programs have actually been built. I'd recommend running the compile step by hand to see what's going wrong, and then fixing that.
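
A minimal sketch of that check, assuming the install layout that validate.sh uses (a lava-install/bin directory under the coreutils source tree). check_built is a hypothetical helper, not part of the corpus, and the paths should be adjusted to your checkout.

```shell
#!/bin/bash
# check_built: hypothetical helper (not part of the LAVA distribution).
# Reports whether the binary the validation step would run actually exists.
check_built() {
    local bin=$1
    if [ -x "$bin" ]; then
        echo "built: $bin"
        return 0
    fi
    echo "not built: $bin"
    return 1
}

# Running a path that does not exist yields exit status 127.
rv=0
./no-such-dir/lava-install/bin/base64 -d /dev/null 2>/dev/null || rv=$?
echo "exit status for a missing binary: $rv"

# Path layout taken from the validate.sh invocation; adjust as needed.
check_built ./coreutils-8.24-lava-safe/lava-install/bin/base64 || true
```

If the binary is reported as not built, rerunning configure, make, and make install without the output redirected to /dev/null should show which step fails.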

October 5, 2017 at 11:55 PM

Unknown said…

Mr Dolan-Gavitt, thank you very much. I changed the validate.sh script to remove the "&>/dev/null", and the program now builds successfully.
The changed script is as follows:
--------------------------------------------------------
#!/bin/bash
PROG="base64"
PROGOPT="-d"
INPUT_PATTERN="inputs/utmp-fuzzed-%s.b64"
INPUT_CLEAN="inputs/utmp.b64"
echo "Building buggy ${PROG}..."
cd coreutils-8.24-lava-safe
make clean
./configure --prefix=/home/wendy/lava_corpus/LAVA-M/base64/coreutils-8.24-lava-safe/lava-install LIBS="-lacl"
make
make install
cd ..
./coreutils-8.24-lava-safe/lava-install/bin/${PROG} ${PROGOPT} ${INPUT_CLEAN}
rv=$?
if [ $rv -lt 128 ]; then
echo "Success: ${PROG} ${PROGOPT} ${INPUT_CLEAN} returned $rv"
else
echo "ERROR: ${PROG} ${PROGOPT} ${INPUT_CLEAN} returned $rv"
fi
echo "Validating bugs..."
cat validated_bugs | while read line ; do
INPUT_FUZZ=$(printf "$INPUT_PATTERN" $line)
{ ./coreutils-8.24-lava-safe/lava-install/bin/${PROG} ${PROGOPT} ${INPUT_FUZZ} ; }
echo $line $?
done > validated2.txt
awk 'BEGIN {valid = 0} $2 > 128 { valid += 1 } END { print "Validated valid=",valid, "/", NR, "bugs" }' validated2.txt
echo "You can see validated2.txt for the exit code of each buggy version."
--------------------------------------------------------------------------------

October 6, 2017 at 3:18 AM

Lingyun Situ's Blog said…

Hi,

There's a CRASH_INPUT in this command in validate.sh:

{ ${d}/lava-install/bin/file ${d}/CRASH_INPUT ; } &> /dev/null

But I cannot find the CRASH_INPUT dir.

Thanks

March 13, 2018 at 6:02 PM

Unknown said…

Hi,

I got exactly the same terminal output, 0/44 bugs validated, for base64 in the LAVA-M corpus.

I tried the newly posted script without the "&>/dev/null" (I copied the posted script into a new file). However, it gave an error that says:

configure: error: in `/home/mark_arsanious/lava_corpus/LAVA-M/base64/coreutils-8.24-lava-safe':
configure: error: C compiler cannot create executables
See `config.log' for more details

and a fatal error:

./lib/acl-internal.h:27:11: fatal error: 'sys/acl.h' file not found
# include <sys/acl.h>
          ^

Any clue?

May 25, 2018 at 11:14 AM

pepper said…

I have trouble confirming all the vulnerable input vectors. What is the toolchain and target architecture that this is supposed to work with?

A recent Ubuntu will be unable to compile due to gnulib changes. Porting the source is easy, but the validate scripts will only confirm a small fraction of the bugs. It's better with Ubuntu 16.04, but still, several hundred bugs could not be confirmed by the validation script. Cross-compiling for i386 also didn't seem to help.

September 15, 2019 at 1:38 PM

Kristian said…

I get the following error when trying to compile base64:

lib/freadseek.c: In function 'freadptrinc':
lib/freadseek.c:68:3: error: #error "Please port gnulib freadseek.c to your platform! Look at the definition of getc, getc_unlocked on your system, then report this to bug-gnulib."
68 | #error "Please port gnulib freadseek.c to your platform! Look at the definition of getc, getc_unlocked on your system, then report this to bug-gnulib."

October 31, 2019 at 9:21 AM
