Posts

Showing posts with the label reproducibility

Reproducible Malware Analyses for All

Summary : With help from GTISC , I have begun running 100 malware samples per day and posting the PANDA record & replay logs online at http://panda.gtisc.gatech.edu/malrec/ . The goal is to lower the barriers to entry for doing dynamic malware research, and to make such research reproducible . Today, I spoke at the ACSAC Malware Memory Forensics workshop in New Orleans about a problem that I think has been largely ignored in existing dynamic malware analysis research: reproducibility . To make results reproducible, a computer science researcher typically needs to do three things: Carefully and precisely describe their methods. Release the code they wrote for their system or analysis. Release the data the analysis was performed on. Of course, even research published at top conferences may fail at some of these criteria; a recent study by Collberg et al. attempted to obtain the code associated with 613 recent papers from ACM conferences, and were able to obtain, build and...

Replaying Regin in PANDA

Regin, a piece of state-sponsored malware that may have been used to attack telecoms and cryptographers, has recently come to light. There are several good writeups out there, and I encourage you to check them out. Getting access to samples in cases like this is often a challenge. Luckily, both The Intercept and VXShare  ( warning : both links contain live malware) have released samples thought to be associated with Regin, so that others can perform independent analysis. So far, it appears that the samples are all of the "stage1" component of the malware, rather than the initial "stage0" infector or the later stages. In order to allow others to do dynamic analysis of this malware, I built a very small malware sandbox setup using PANDA. The sandbox essentially just executes a sample for five minutes, recording it using PANDA's record and replay facility. The process is slightly complicated by the fact that most of the stage1 samples are kernel-mode compo...

PANDA, Reproducibility, and Open Science

tl;dr : PANDA now supports detached replays (you don't need the underlying VM image to run a replay), and they can be shared at a new site called PANDA Share . Hooray for reproducibility! One of the most inspiring developments of the past few years has been the push for  open science , the movement to ensure that scientific publications, data, and software are freely available to all. In computer science, a big part of this has been a trend towards making software and experimental data available once a paper has been published, so that others can verify experiments and "stand on the shoulders of giants" by extending the software. There have also been  initiatives  aimed at making sure that the results of experiments in computer science can be replicated. In the latest release of PANDA, our Platform for Architecture-Neutral Dynamic Analysis, we've taken an important step in ensuring that experiments in dynamic analysis can be freely shared and replicated: as of  ...