Tuesday, January 28, 2014

PANDA, Reproducibility, and Open Science

tl;dr: PANDA now supports detached replays (you don't need the underlying VM image to run a replay), and they can be shared at a new site called PANDA Share. Hooray for reproducibility!

One of the most inspiring developments of the past few years has been the push for open science, the movement to ensure that scientific publications, data, and software are freely available to all. In computer science, a big part of this has been a trend towards making software and experimental data available once a paper has been published, so that others can verify experiments and "stand on the shoulders of giants" by extending the software. There have also been initiatives aimed at making sure that the results of experiments in computer science can be replicated.

In the latest release of PANDA, our Platform for Architecture-Neutral Dynamic Analysis, we've taken an important step in ensuring that experiments in dynamic analysis can be freely shared and replicated: as of commit 9139261d70, PANDA creates and loads standalone record/replay logs. This means that you can create a recording of an execution and then share it with others, and they will be able to precisely duplicate the same execution on their own machine, down to the last instruction. Any of PANDA's plugins can be applied to such executions, allowing new analyses to be run on existing, shared executions.

What does this enable? To start with, this makes it possible to share experimental data from research in dynamic analysis. In our paper Tappan Zee (North) Bridge, we performed many experiments that showed how to find useful points to hook in an OS; however, because these were based on executions that were tied to virtual machine disk images, we weren't able to share the data necessary to exactly reproduce our experiments (since that would require sharing a Windows VM with proprietary software). Now, however, we can simply share the detached recordings for the TZB experiments, allowing anyone to verify, for example, that our plugins can find SSL master secrets in IE8 on Windows. We also hope that collections of interesting recordings can form the basis of new benchmarks for dynamic analysis, allowing different implementations and algorithms to be directly compared by running them against a standard set of executions.

Aside from the benefits to reproducibility of dynamic analyses, we hope that this will also permit the creation and sharing of interesting executions that can then be studied by the whole community. For example, we are releasing today a recording of the FBI-authored shellcode that was recently used to identify Tor users connecting to sites hosted by Freedom Hosting. This means that anyone can re-run the recording and analyze every instruction executed by the shellcode to confirm for themselves the information that has appeared in public writeups.

To provide a central location for sharing interesting executions, we have created a site called PANDA Share where PANDA recordings can be uploaded. Each recording comes with a short description and the PANDA command line needed to reproduce the execution. Right now, the repository contains the recordings from our Tappan Zee Bridge experiments and the FBI shellcode recording. We are planning to add many more soon, and hope that others will share their own!

Friday, October 4, 2013

Prebuilt VM for PANDA Now Available

I have just created a prebuilt VirtualBox VM for testing PANDA. It's a current Debian 7.1 install with the latest (as of 10/4/2013) version of PANDA and its prerequisites installed. The username and password for the VM are "panda:panda", with root password "panda".

Also included is a Debian i386 QCOW2 image (created by Aurelien Jarno) that can be used to test PANDA. Once you have the VM booted and you're logged in, you can cd into the panda/qemu directory and do:

panda@pandavm:~/panda/qemu$ x86_64-softmmu/qemu-system-x86_64 \
-m 256 -hda ~/qcow/debian_squeeze_i386_standard.qcow2 -monitor stdio

This will start up an instance of PANDA and boot the Debian image. From there you can create recordings and replay them with PANDA's various plugins; see the documentation for more details.

Hopefully this will make it easier for people to get started with PANDA!

Monday, September 30, 2013

Announcing PANDA: A Platform for Architecture-Neutral Dynamic Analysis

I'm pleased to announce the initial release of a new open source dynamic analysis platform built on QEMU, named PANDA (Platform for Architecture-Neutral Dynamic Analysis). It has a number of features that combine to make it a uniquely powerful platform for analyzing software as it executes:

  • Record and Replay: PANDA is capable of recording the non-deterministic inputs during a whole-system execution and later deterministically replaying them. This means that heavyweight analyses that would be too slow to run on a live execution can be decoupled to run on the replayed execution instead. We recently used this in our 2013 ACM CCS paper to monitor every memory access made by an OS and applications, which would not have been feasible without record and replay. Record and replay is currently supported for i386, x86_64, and ARM, with more architectures planned. For more details see the record and replay documentation.
  • Android Support: Thanks to excellent work by Josh Hodosh, PANDA can act as an Android emulator, running modern versions of Android. See the Android documentation for more details.
  • Plugin Architecture: Plugins can be written in C and C++. PANDA supports callbacks for many types of event within QEMU, making it easy to write an analysis plugin; for example, a simple system call tracer is ~60 lines of code. Check out the plugin documentation for more information.
  • LLVM Execution: Borrowed from S2E, this execution mode translates guest code to LLVM and then JIT compiles it to native code; this means that plugins can analyze and transform the LLVM IR rather than working directly on native code. Unique to PANDA is the ability to also translate QEMU's helper functions (which are implemented in C and cover operations too complex to be handled in QEMU's native IR) to LLVM, meaning analyses in PANDA can be complete. This was recently used to implement architecture-neutral dynamic taint analysis.
  • Modern QEMU: PANDA is based on QEMU 1.0.1, with some additional fixes and enhancements backported. Unlike platforms such as BitBlaze/TEMU, which use QEMU 0.9.1, this allows PANDA to support modern OSes such as Windows 8.
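
To make the record and replay idea and the plugin callbacks a little more concrete, here is a minimal toy sketch in Python. It is not PANDA code and does not use PANDA's API: the names `ToyMachine`, `record`, `replay`, and `register_callback` are hypothetical, and the "guest" is just an integer updated from a stream of inputs. The point it illustrates is that if every non-deterministic input is logged, a later run fed from the log reaches exactly the same states, so an expensive analysis can be attached to the replay instead of the live run.

```python
import random
from typing import Callable, Iterable, List, Tuple

# Hypothetical callback type: (step index, input value, state after the step).
Callback = Callable[[int, int, int], None]


class ToyMachine:
    """A stand-in for a guest: its state depends only on the inputs it sees."""

    def __init__(self) -> None:
        self.state = 0
        self.callbacks: List[Callback] = []

    def register_callback(self, cb: Callback) -> None:
        # Illustrative only; this is not PANDA's real registration function.
        self.callbacks.append(cb)

    def step(self, index: int, value: int) -> None:
        # Deterministic update: same inputs in the same order give the same state.
        self.state = (self.state * 31 + value) % 1_000_003
        for cb in self.callbacks:
            cb(index, value, self.state)


def record(n_steps: int, source: Callable[[], int]) -> Tuple[List[int], int]:
    """Run 'live', logging every non-deterministic input as it arrives."""
    machine = ToyMachine()
    log: List[int] = []
    for i in range(n_steps):
        value = source()  # the only source of non-determinism
        log.append(value)
        machine.step(i, value)
    return log, machine.state


def replay(log: List[int], plugins: Iterable[Callback] = ()) -> int:
    """Re-run from the log alone, with analysis callbacks attached."""
    machine = ToyMachine()
    for plugin in plugins:
        machine.register_callback(plugin)
    for i, value in enumerate(log):
        machine.step(i, value)
    return machine.state


if __name__ == "__main__":
    rng = random.Random()
    log, live_state = record(1000, lambda: rng.randrange(256))

    trace: List[Tuple[int, int, int]] = []  # a "heavyweight" analysis, run offline
    replayed_state = replay(log, [lambda i, v, s: trace.append((i, v, s))])

    print("replay matches live run:", replayed_state == live_state)
    print("steps observed by the plugin:", len(trace))
```

In this sketch the log is the only thing that needs to be kept or shared; the live run carries no analysis overhead, and any number of callbacks can be applied to the same log afterwards.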

If you want to get started, check out the project on GitHub and read the documentation.

Thanks to all the people who have contributed to making PANDA a reality over the past year, including:
  • Josh Hodosh
  • Ryan Whelan
  • Tim Leek
  • Michael Zhivich
  • Patrick Hulin
  • Anthony Eden
  • Sam Coe
  • Nathan VanBenschoten

Tuesday, February 7, 2012

Virtuoso – Initial Code Release

I've just gotten word that the Virtuoso source code has been approved by the sponsor for public release, so I've uploaded version 1.0 to the Virtuoso Google Code site! Thanks to Tim Leek at MIT Lincoln Laboratory for seeing this project through the lengthy release review process!

Also on Google Code, you can find an installation guide and a walkthrough to get you started.

Check it out, and generate some memory analysis tools! If you run into trouble, you can shoot me an email and I'll do my best to help out, but keep in mind that this is a research project, and so there are still lots of rough edges. Enjoy!

Tuesday, September 6, 2011

What I Did on My Summer Vacation

Over the summer I worked at Microsoft Research, which has a fantastically smart bunch of people working on really cool and interesting problems. I just noticed that they've posted the video of my end-of-internship talk, Monitoring Untrusted Modern Applications with Collective Record and Replay. Please take a look if you're curious about what it might look like to try and monitor mobile apps in the wild with low overhead!

Saturday, May 28, 2011

Paper and Slides Available for "Virtuoso: Narrowing the Semantic Gap in Virtual Machine Introspection"

I've recently returned from Oakland, CA, where the 2011 IEEE Symposium on Security and Privacy was held. There were a lot of excellent talks, and it was great to catch up with others in the security community. Now that the conference is over, I'm happy to release the paper and slides of our work, "Virtuoso: Narrowing the Semantic Gap in Virtual Machine Introspection", which I have described in an earlier post.

The slides contain some animations, so I've made them available in three formats.

You can also get a copy of the full paper here. I'm also hoping to have the source ready for release soon; when it is available, you'll be able to find it on Google Code under the name Virtuoso.

Once again, thanks to my most excellent co-authors at MIT Lincoln Labs and Georgia Tech for helping me see this project through!

Wednesday, April 6, 2011

Applying Forensic Tools to Virtual Machine Introspection

I've just released a technical report summarizing some work I did a couple of years ago that explores how forensic memory analysis and virtual machine introspection are closely linked.

Abstract: Virtual machine introspection (VMI) has formed the basis of a number of novel approaches to security in recent years. Although the isolation provided by a virtualized environment provides improved security, software that makes use of VMI must overcome the semantic gap, reconstructing high-level state information from low-level data sources such as physical memory. The digital forensics community has likewise grappled with semantic gap problems in the field of forensic memory analysis (FMA), which seeks to extract forensically relevant information from dumps of physical memory. In this paper, we will show that work done by the forensic community is directly applicable to the VMI problem, and that by providing an interface between the two worlds, the difficulty of developing new virtualization security solutions can be significantly reduced.
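
As a rough illustration of what "reconstructing high-level state information from low-level data sources" means, the toy Python sketch below recovers a process list from a flat byte buffer standing in for physical memory. Everything about the layout is an assumption made up for this example (a 24-byte record holding a PID, a 16-byte name, and the offset of the next record, with offset 0 ending the list); it is not the layout of any real OS, and `build_toy_memory` and `list_processes` are hypothetical names, not functions from Volatility or any VMI library.

```python
import struct
from typing import List, Tuple

# Assumed toy record layout (not a real OS structure):
#   uint32 pid | char name[16] | uint32 offset of next record (0 = end of list)
RECORD = struct.Struct("<I16sI")


def build_toy_memory() -> Tuple[bytes, int]:
    """Create a fake 'physical memory' image holding three linked records."""
    mem = bytearray(256)
    records = [
        # (offset, pid, name, offset of next record)
        (0x40, 4, b"System", 0xA0),
        (0xA0, 312, b"smss.exe", 0x60),
        (0x60, 1480, b"iexplore.exe", 0),
    ]
    for offset, pid, name, next_offset in records:
        RECORD.pack_into(mem, offset, pid, name, next_offset)
    return bytes(mem), 0x40  # the image and the offset of the list head


def list_processes(mem: bytes, head: int) -> List[Tuple[int, str]]:
    """Walk the linked list in raw memory and return (pid, name) pairs."""
    processes: List[Tuple[int, str]] = []
    visited = set()
    offset = head
    while offset != 0 and offset not in visited:  # guard against cycles
        visited.add(offset)
        pid, raw_name, next_offset = RECORD.unpack_from(mem, offset)
        name = raw_name.rstrip(b"\x00").decode("ascii")
        processes.append((pid, name))
        offset = next_offset
    return processes


if __name__ == "__main__":
    memory, head = build_toy_memory()
    for pid, name in list_processes(memory, head):
        print(pid, name)
```

The same walking logic applies whether the bytes come from a memory dump on disk or are read live from a running virtual machine, which is the connection between forensic memory analysis and VMI that the abstract describes.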

You can read the full paper on SMARTech. Hopefully this will encourage others to start using great memory analysis tools like Volatility for live analysis of virtual machines!