Tuesday, January 28, 2014

PANDA, Reproducibility, and Open Science

tl;dr: PANDA now supports detached replays (you don't need the underlying VM image to run a replay), and they can be shared at a new site called PANDA Share. Hooray for reproducibility!

One of the most inspiring developments of the past few years has been the push for open science, the movement to ensure that scientific publications, data, and software are freely available to all. In computer science, a big part of this has been a trend towards making software and experimental data available once a paper has been published, so that others can verify experiments and "stand on the shoulders of giants" by extending the software. There have also been initiatives aimed at making sure that the results of experiments in computer science can be replicated.

In the latest release of PANDA, our Platform for Architecture-Neutral Dynamic Analysis, we've taken an important step in ensuring that experiments in dynamic analysis can be freely shared and replicated: as of commit 9139261d70, PANDA creates and loads standalone record/replay logs. This means that you can create a recording of an execution and then share it with others, and they will be able to reproduce that execution exactly on their own machine, down to the last instruction. Any of PANDA's plugins can be applied to such executions, allowing new analyses to be run on existing, shared executions.

What does this enable? To start with, this makes it possible to share experimental data from research in dynamic analysis. In our paper Tappan Zee (North) Bridge, we performed many experiments that showed how to find useful points to hook in an OS; however, because these were based on executions that were tied to virtual machine disk images, we weren't able to share the data necessary to exactly reproduce our experiments (since that would require sharing a Windows VM with proprietary software). Now, however, we can simply share the detached recordings for the TZB experiments, allowing anyone to verify, for example, that our plugins can find SSL master secrets in IE8 on Windows. We also hope that collections of interesting recordings can form the basis of new benchmarks for dynamic analysis, allowing different implementations and algorithms to be directly compared by running them against a standard set of executions.

Aside from the benefits to reproducibility of dynamic analyses, we hope that this will also permit the creation and sharing of interesting executions that can then be studied by the whole community. For example, today we are releasing a recording of the FBI-authored shellcode that was recently used to identify Tor users connecting to sites hosted by Freedom Hosting. This means that anyone can re-run the recording and analyze every instruction executed by the shellcode to confirm for themselves the information that has appeared in public writeups.

To provide a central location for sharing interesting executions, we have created a site called PANDA Share where PANDA recordings can be uploaded. Each recording comes with a short description and the command line for PANDA needed to reproduce the execution. Right now, the repository contains the recordings of our Tappan Zee Bridge experiments and the FBI shellcode recording. We are planning to add many more soon, and hope that others will share their own!
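
To give a feel for what such a command line looks like, here is a minimal sketch in Python that assembles one. It rests on several assumptions: the flag names `-replay` and `-panda-plugin` are assumed from PANDA's QEMU-based command line and may differ between versions, and the binary path, recording name, memory size, and plugin path are all hypothetical placeholders. The command listed alongside each recording on PANDA Share is the authoritative one.

```python
import shlex


def build_replay_command(qemu_binary, recording_name, memory_mb, plugin_paths=()):
    """Assemble an argument list for replaying a detached recording.

    Assumptions (not taken from the post): the replay flag is `-replay`
    and each plugin is loaded with `-panda-plugin`. Check the command
    published with the recording before relying on either.
    """
    argv = [
        qemu_binary,
        "-m", str(memory_mb),       # guest RAM; must match the recording
        "-replay", recording_name,  # assumed flag: recording base name
        "-display", "none",         # standard QEMU flag: no graphical window
    ]
    for path in plugin_paths:
        argv += ["-panda-plugin", path]  # assumed flag: one per plugin
    return argv


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    argv = build_replay_command(
        qemu_binary="i386-softmmu/qemu-system-i386",
        recording_name="example_recording",
        memory_mb=256,
        plugin_paths=["i386-softmmu/panda_plugins/panda_example.so"],
    )
    # Print a shell-safe version that could be pasted into a terminal.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the command as a list rather than a single string keeps each argument intact if a path contains spaces, and makes it easy to run the same recording under several different plugins.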