

The Mechanics of Bug Injection with LAVA

This is the second in a series of posts about evaluating and improving bug detection software by automatically injecting bugs into programs. Part one, which discussed the setting and motivation, is available here. Now that we understand why we might want to automatically add bugs to programs, let's look at how we can actually do it. We'll first investigate an existing approach (mutation testing), show why it doesn't work very well in our scenario, and then develop a more sophisticated injection technique that tells us exactly how to modify the program to insert bugs that meet the goals we laid out in the introductory post.

A Mutant Strawman that Doesn't Work

One way of approaching the problem of bug injection is to just pick parts of the program that we think are currently correct and then mutate them somehow. This, essentially, is the idea behind mutation testing: you use some predefined mutation operators that mangle the program somehow and then declare tha...
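To make the idea of a mutation operator concrete, here is a minimal sketch in Python (an illustrative choice, not the operators used by any particular mutation testing tool). It implements one classic operator, relational operator replacement, by walking a function's syntax tree with the standard `ast` module and turning each `<` into `<=`, the kind of off-by-one change that can turn a correct bounds check into a buggy one. The names `RelationalMutator`, `mutate_relational`, and the example `in_bounds` check are hypothetical.

```python
import ast


class RelationalMutator(ast.NodeTransformer):
    """Hypothetical mutation operator: replace every `<` with `<=`."""

    def __init__(self) -> None:
        self.mutations = 0

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)  # mutate any nested comparisons first
        new_ops = []
        for op in node.ops:
            if isinstance(op, ast.Lt):
                new_ops.append(ast.LtE())
                self.mutations += 1
            else:
                new_ops.append(op)
        node.ops = new_ops
        return node


def mutate_relational(source: str) -> tuple[str, int]:
    """Return the mutated source and the number of operators changed."""
    tree = ast.parse(source)
    mutator = RelationalMutator()
    mutated = ast.fix_missing_locations(mutator.visit(tree))
    return ast.unparse(mutated), mutator.mutations  # ast.unparse needs Python 3.9+


# A (hypothetical) correct bounds check for a 4-element buffer.
ORIGINAL = """
def in_bounds(i, n=4):
    return 0 <= i < n
"""

if __name__ == "__main__":
    mutant_src, count = mutate_relational(ORIGINAL)
    print(mutant_src)
    namespace = {}
    exec(mutant_src, namespace)
    print(count, namespace["in_bounds"](4))
```

Running it rewrites `0 <= i < n` as `0 <= i <= n`, so the mutant accepts the out-of-range index 4. Note that the operator is applied blindly: it knows nothing about whether the mutated line is reachable or what input would trigger the resulting bug.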

How to add a million bugs to a program (and why you might want to)

This is the first in a series of posts about evaluating and improving bug detection software by automatically injecting bugs into programs. You can find part two, with technical details of our bug injection technique, here. In this series of posts, I'm going to describe how to automatically put bugs in programs, a topic on which we just published a paper at Oakland, one of the top academic security conferences. The system we developed, LAVA, can put millions of bugs into real-world programs. Why would anyone want to do this? Are my coauthors and I sociopaths who just want to watch the world burn? No, but seeing why we need such a system requires a little bit of background, which is what I hope to provide in this first post. I am sure this will come as a shock to most, but programs written by humans have bugs. Finding and fixing them is immensely time consuming; just how much of a developer's time is spent debugging is hard to pin down, but estimates range between 40% ...

100 Days of Malware

It's now been a little over 100 days since I started running malware samples in PANDA and making the executions publicly available. In that time, we've analyzed 10,794 pieces of malware, which generated:

- 10,794 record/replay logs, representing 226,163,195,948,195 instructions executed
- 10,794 packet captures, totaling 26GB of data and 33,968,944 packets
- 10,794 movies, which are interesting enough that I'll give them their own section
- 10,794 VirusTotal reports, indicating what level of detection they had when they were run by malrec
- 107 torrents, containing downloads of the above

I've been pleased by the interest malrec has generated. We've had visitors from over 6000 unique IPs, in 89 different countries.

The Movies

There's a lot of great stuff in these ~10K movies. An easy way to get an idea of what's in there is to sort by filesize; because of the way MP4 encoding works, larger files in general mean that there's more going on o...
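The sort-by-filesize trick is easy to reproduce locally. The sketch below assumes you have already downloaded some of the movies (for example, from one of the torrents) into a local directory; the directory name `malrec_movies` and the helper `largest_movies` are hypothetical, not part of malrec. It lists `.mp4` files from largest to smallest.

```python
from pathlib import Path


def largest_movies(directory: str, top: int = 10) -> list[tuple[str, int]]:
    """Return (filename, size in bytes) for the `top` largest .mp4 files."""
    movies = [p for p in Path(directory).glob("*.mp4") if p.is_file()]
    # Per the heuristic above, larger MP4s generally mean more is happening on screen.
    movies.sort(key=lambda p: p.stat().st_size, reverse=True)
    return [(p.name, p.stat().st_size) for p in movies[:top]]


if __name__ == "__main__":
    import sys

    # Hypothetical default directory; pass your own path as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "malrec_movies"
    for name, size in largest_movies(target):
        print(f"{size / 1e6:8.1f} MB  {name}")
```

Non-`.mp4` files are ignored, so the same directory can hold the packet captures or VirusTotal reports without affecting the ranking.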

Reproducible Malware Analyses for All

Summary: With help from GTISC, I have begun running 100 malware samples per day and posting the PANDA record & replay logs online at http://panda.gtisc.gatech.edu/malrec/. The goal is to lower the barriers to entry for doing dynamic malware research, and to make such research reproducible.

Today, I spoke at the ACSAC Malware Memory Forensics workshop in New Orleans about a problem that I think has been largely ignored in existing dynamic malware analysis research: reproducibility. To make results reproducible, a computer science researcher typically needs to do three things:

1. Carefully and precisely describe their methods.
2. Release the code they wrote for their system or analysis.
3. Release the data the analysis was performed on.

Of course, even research published at top conferences may fail to meet some of these criteria; in a recent study, Collberg et al. attempted to obtain the code associated with 613 recent papers from ACM conferences, and were able to obtain, build and...

Replaying Regin in PANDA

Regin, a piece of state-sponsored malware that may have been used to attack telecoms and cryptographers, has recently come to light. There are several good writeups out there, and I encourage you to check them out. Getting access to samples in cases like this is often a challenge. Luckily, both The Intercept and VXShare (warning: both links contain live malware) have released samples thought to be associated with Regin, so that others can perform independent analysis. So far, it appears that the samples are all of the "stage1" component of the malware, rather than the initial "stage0" infector or the later stages. In order to allow others to do dynamic analysis of this malware, I built a very small malware sandbox setup using PANDA. The sandbox essentially just executes a sample for five minutes, recording it using PANDA's record and replay facility. The process is slightly complicated by the fact that most of the stage1 samples are kernel-mode compo...
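The recording step of such a sandbox can be scripted through the QEMU monitor. PANDA's record and replay documentation describes `begin_record <name>` and `end_record` monitor commands, with the resulting recording replayed later by starting PANDA with `-replay <name>`. The minimal Python sketch below assumes the monitor is exposed on a local TCP port (for example via QEMU's `-monitor tcp:127.0.0.1:4444,server,nowait`) and that something else launches the sample inside the guest. The port, the recording name, and the helpers `record_commands` and `record_sample` are hypothetical; this is not the actual sandbox used for the Regin samples.

```python
import socket
import time

# The sandbox described above records each sample for five minutes.
RECORD_SECONDS = 5 * 60


def record_commands(name: str) -> tuple[str, str]:
    """Monitor commands that start and stop a PANDA recording called `name`."""
    return f"begin_record {name}\n", "end_record\n"


def record_sample(name: str, host: str = "127.0.0.1", port: int = 4444,
                  seconds: int = RECORD_SECONDS) -> None:
    """Record `seconds` of whole-system execution via a TCP QEMU monitor.

    Assumes (hypothetically) that QEMU was started with
    `-monitor tcp:127.0.0.1:4444,server,nowait` and that the sample is
    launched in the guest once recording has begun.
    """
    start, stop = record_commands(name)
    with socket.create_connection((host, port)) as monitor:
        monitor.sendall(start.encode())
        time.sleep(seconds)  # let the sample run while PANDA records
        monitor.sendall(stop.encode())
        time.sleep(1)  # give the monitor a moment to process end_record
```

Calling `record_sample("regin_stage1")` against a running PANDA instance would, under these assumptions, leave a recording that anyone can replay and analyze offline without ever running the malware live.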

Breaking Spotify DRM with PANDA

Disclaimer: Although I think DRM is both stupid and evil, I don't advocate pirating music. Therefore, this post will stop short of providing a turnkey solution for ripping Spotify music, but it will fully describe the theory behind the technique and its implementation in PANDA. Don't be evil.

Update 6/6/2014: The following post assumes you know what PANDA is (a platform for dynamic analysis based on QEMU). If you want to know more, check out my introductory post on PANDA.

This past weekend I spoke at REcon, a conference on reverse engineering held every year in Montreal. I had a fantastic time there getting to meet other people interested in problems of memory analysis, reverse engineering, and dynamic analysis. One of the topics of my REcon talk was how to use PANDA to break Spotify DRM, and since the video from the talk won't be posted for a while, I thought I'd write up a post showing how we can use PANDA and statistics to pull out unencrypted OGGs from Spotif...
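The excerpt doesn't say which statistics the post relies on, so the following is only one simple statistic one could use, not necessarily the post's method. Every Ogg page begins with the four-byte capture pattern `OggS`, so buffers holding decrypted Ogg data should contain many occurrences of it, while ciphertext should contain essentially none. The Python sketch below assumes you have already collected byte buffers from various (hypothetically labeled) points in a replay; the names `ogg_page_count` and `rank_candidates` are hypothetical.

```python
OGG_CAPTURE_PATTERN = b"OggS"  # every Ogg page begins with these four bytes


def ogg_page_count(buf: bytes) -> int:
    """Count occurrences of the Ogg page capture pattern in a buffer."""
    return buf.count(OGG_CAPTURE_PATTERN)


def rank_candidates(buffers: dict[str, bytes]) -> list[tuple[str, int]]:
    """Rank capture points by how many Ogg page headers their data contains.

    Decrypted Ogg audio should contain many page headers. In uniformly random
    data (a reasonable model for ciphertext), a specific 4-byte sequence shows
    up only about once per 2**32 positions.
    """
    scores = [(name, ogg_page_count(data)) for name, data in buffers.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

The highest-ranked capture point is the most likely place where plaintext audio appears; a real analysis would still need to confirm that the data parses as a valid Ogg stream.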

Announcing PANDA: A Platform for Architecture-Neutral Dynamic Analysis

I'm pleased to announce the initial release of a new open source dynamic analysis platform built on QEMU, named PANDA (Platform for Architecture-Neutral Dynamic Analysis). It has a number of features that combine to make it a uniquely powerful platform for analyzing software as it executes:

- Record and Replay: PANDA is capable of recording the non-deterministic inputs during a whole-system execution and later deterministically replaying them. This means that heavyweight analyses that would be too slow to run on a live execution can be decoupled to run on the replayed execution instead. We recently used this in our 2013 ACM CCS paper to monitor every memory access made by an OS and applications, which would not have been feasible without record and replay. Record and replay is currently supported for i386, x86_64, and ARM, with more architectures planned. For more details see the record and replay documentation.
- Android Support: Thanks to excellent work by Josh Hodosh, PAND...
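The core idea of record and replay can be shown at toy scale: log every non-deterministic input the first time a computation runs, then feed the logged values back so a second run follows exactly the same path, with a slow analysis attached only to that second run. The Python sketch below is just an analogy with hypothetical names (`NondetSource`, `program`); PANDA does this for an entire emulated system, capturing inputs such as hardware interrupts and device input, not for a single Python function.

```python
import random
import time
from typing import Callable, Optional


class NondetSource:
    """Wraps non-deterministic inputs so they can be recorded or replayed."""

    def __init__(self, log: Optional[list] = None) -> None:
        self.replaying = log is not None
        self.log = list(log) if log is not None else []
        self.pos = 0

    def read(self, produce: Callable[[], object]) -> object:
        if self.replaying:
            value = self.log[self.pos]  # replay: return the recorded value
            self.pos += 1
        else:
            value = produce()  # record: get a fresh value and log it
            self.log.append(value)
        return value


def program(src: NondetSource, trace: Optional[list] = None) -> int:
    """A toy 'execution' whose path depends on non-deterministic inputs."""
    total = 0
    for _ in range(5):
        x = src.read(lambda: random.randint(0, 100))
        t = src.read(time.monotonic_ns)
        total += x if t % 2 == 0 else -x
        if trace is not None:  # stand-in for a heavyweight analysis
            trace.append((x, t, total))
    return total


if __name__ == "__main__":
    recorder = NondetSource()
    live_result = program(recorder)  # recorded run: no analysis, stays fast
    replayer = NondetSource(recorder.log)
    trace = []
    replay_result = program(replayer, trace)  # replayed run: analysis attached
    assert replay_result == live_result
    print(live_result, len(trace))
```

Because the replayed run consumes the logged values instead of fresh ones, it reaches the same result no matter how much slower the attached analysis makes it, which is the property that makes decoupling expensive analyses from live execution possible.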