Wednesday, May 7, 2008

Parsing Windows Minidumps

When a user-mode application crashes in Windows, a built-in debugger known as "Dr. Watson" steps in and captures some basic information that can be sent back to developers to help debug the crash. As part of this process, it creates what's called a minidump that contains portions of the process's memory and a great deal of extra information about the state and attributes of the process. Among the information available is:

  • CPU state for each thread.

  • A list of loaded modules, including their timestamps.

  • The Process Environment Block (PEB) for the process.

  • Basic system information, such as the build number and service pack level of the operating system.

  • The process creation time, and how long it has spent executing in kernel and user space.

  • Detailed information on the exception that was raised.


Using the userdump.exe utility provided by Microsoft, it is also possible to take a complete snapshot of the memory of any running process. As it turns out, this tool also stores its output in the minidump format. In addition to all the information available in a standard minidump, minidumps made with this tool include the full process memory and, with the -w option, a list of window handles as well.

Unlike many Microsoft formats, the minidump container format is actually fully documented by Microsoft. The relevant data structures and constants can all be found in dbghelp.h, and explanations of each field can be found on MSDN. The basic structure of the file is simple: it starts with the MINIDUMP_HEADER, which gives the offset of the stream directory (a list of MINIDUMP_DIRECTORY structures). Each directory entry has a type code (indicating what the stream is for), the size of the stream, and the offset in the file where the stream begins. Don't be scared by the use of the term "relative virtual address" (RVA) in the Microsoft documentation; in this context, it just means "offset from the beginning of the file".
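As a rough illustration of the layout just described, here is a minimal sketch that reads the header and stream directory using only Python's standard struct module (not the Construct-based library mentioned later in this post). The function name read_stream_directory is made up for this example; the field order follows MINIDUMP_HEADER and MINIDUMP_DIRECTORY as declared in dbghelp.h, and it assumes the whole file has been read into memory.

```python
import struct

# MINIDUMP_HEADER: Signature, Version, NumberOfStreams, StreamDirectoryRva,
# CheckSum, TimeDateStamp, Flags (32 bytes, little-endian)
HEADER = struct.Struct("<4sIIIIIQ")
# MINIDUMP_DIRECTORY: StreamType, then Location (DataSize, Rva)
DIRECTORY = struct.Struct("<III")


def read_stream_directory(data):
    """Return a list of (stream_type, size, file_offset) tuples.

    `data` is the full contents of the minidump as a bytes object.
    """
    signature, _version, num_streams, dir_rva, _checksum, _timestamp, _flags = (
        HEADER.unpack_from(data, 0)
    )
    if signature != b"MDMP":
        raise ValueError("not a minidump: bad signature")

    entries = []
    for i in range(num_streams):
        # Each RVA is simply an offset from the beginning of the file.
        entry_offset = dir_rva + i * DIRECTORY.size
        stream_type, size, rva = DIRECTORY.unpack_from(data, entry_offset)
        entries.append((stream_type, size, rva))
    return entries
```

Calling read_stream_directory(open(path, "rb").read()) on a dump should give one tuple per stream, which is enough to locate any stream by its type code.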

The format is not only openly documented, it is also extensible: any application can add a new stream type (using the type codes above the reserved range 0x0000-0xFFFF) and thereby include any sort of extra data in the minidump. The open-source, cross-platform crash reporter, Google Breakpad, actually uses the minidump container as its native crash dump format on all platforms. The project's source includes a set of C++ classes that can parse and work with minidump files, which can be instructive in clearing up any ambiguities in the MS-provided documentation. One final (and somewhat unexpected) source of information is the United States patent on generating minidumps. Putting aside the fact that patenting the process of saving some context to a container format after a crash seems pretty silly, the patent description is full of interesting technical details.
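To make the extensibility point concrete, the reserved range mentioned above can be checked with a one-line test. This is only a sketch: is_user_stream is a name invented for this example, and the constant mirrors the LastReservedStream value (0xFFFF) from dbghelp.h.

```python
LAST_RESERVED_STREAM = 0xFFFF  # LastReservedStream in dbghelp.h


def is_user_stream(stream_type):
    """True if the type code lies above the range reserved by Microsoft."""
    return stream_type > LAST_RESERVED_STREAM
```

A parser can use a check like this to decide whether an unrecognized stream is application-defined data that should be skipped or passed to a custom handler.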

For memory analysis purposes, it is useful to understand the minidump format, as it is the format used by the userdump utility to save the full address space of a process. For minidumps written by userdump.exe, the actual memory ranges are described in the Memory64ListStream stream (type code 9). The stream gives the base offset in the file where the process's memory can be found, and then has a list of structures that give the size and virtual address of each memory region. (It is not necessary to give the file offset for each memory range, since they are all contiguous; the second memory range described appears in the file directly after the end of the first.) Additional information on each memory range is found in the MemoryInfoListStream, which lists the protection attributes (read-only, writable, executable), state (free, reserved, or committed) and type (image, mapped file, or private allocation) for each range addressable by the process.
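The contiguous layout described above means the file offset of each range can be computed with a running total. The following is a minimal sketch under that assumption, again using the standard struct module; parse_memory64_list and read_virtual are names invented for this example, and the field order follows MINIDUMP_MEMORY64_LIST and MINIDUMP_MEMORY_DESCRIPTOR64 in dbghelp.h.

```python
import struct

# MINIDUMP_MEMORY64_LIST starts with NumberOfMemoryRanges and BaseRva
MEM64_LIST_HEADER = struct.Struct("<QQ")
# MINIDUMP_MEMORY_DESCRIPTOR64: StartOfMemoryRange, DataSize
MEM64_DESCRIPTOR = struct.Struct("<QQ")


def parse_memory64_list(data, stream_rva):
    """Return (virtual_address, size, file_offset) for each memory range."""
    count, base_rva = MEM64_LIST_HEADER.unpack_from(data, stream_rva)
    ranges = []
    file_offset = base_rva  # the first range begins at BaseRva
    pos = stream_rva + MEM64_LIST_HEADER.size
    for _ in range(count):
        start, size = MEM64_DESCRIPTOR.unpack_from(data, pos)
        ranges.append((start, size, file_offset))
        file_offset += size  # the next range follows directly in the file
        pos += MEM64_DESCRIPTOR.size
    return ranges


def read_virtual(data, ranges, vaddr, length):
    """Read `length` bytes at virtual address `vaddr`, or None if unmapped.

    For simplicity, reads that span two ranges are not handled.
    """
    for start, size, file_offset in ranges:
        if start <= vaddr and vaddr + length <= start + size:
            offset = file_offset + (vaddr - start)
            return data[offset:offset + length]
    return None
```

Here stream_rva is the file offset of the Memory64ListStream, taken from its entry in the stream directory. A lookup like read_virtual is the core of what an address space class would need to provide.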

From this information we can reconstruct the entire memory space for a given process, and then examine its virtual address space to find interesting artifacts, such as its list of loaded modules (accessible through the Process Environment Block, or PEB) or any application-specific data it was working with (a notable example would be passwords or encryption key data, as demonstrated at CanSecWest this year). It should be fairly easy to create an address space class within Volatility that can read minidumps, at which point any of the Volatility modules that work with user-mode data (currently just dlllist, but more are expected in the future) will be usable on minidumps generated by userdump.exe.

Rather than go into the gory details of the data structures involved in parsing each stream, I have decided to simply release a library written using Python and Construct. The library can be downloaded here; currently every stream type listed in Microsoft's documentation is fully parsed. The library also supports the "Window Handle" stream created by userdump.exe (stream type 0x10000), although some fields are still unknown as they are undocumented (specifically, there are four unknown DWORDS that I have been unable to decipher -- if anyone has any suggestions as to the structure, I would love to hear them!).

You can also run minidump.py as a command line program, and it will print out the entire parsed structure of the minidump, including thread context, open handles, system information, and loaded modules. Enjoy!

Monday, May 5, 2008

DFRWS 2008 - Registry Forensics in Memory

I'm ecstatic to announce that my paper, Forensic Analysis of the Windows Registry in Memory, has been accepted into the 8th annual Digital Forensics Research Workshop! The full program is available, and it looks like there are a lot of really cool presentations scheduled. Memory analysis is heavily featured this year, and has been given a whole session.


As usual, all the papers will be posted on the DFRWS website once the conference begins. Until then, here's the abstract of my paper:


This paper describes the structure of the Windows registry as it is stored in physical memory. We present tools and techniques that can be used to extract this data directly from memory dumps. We also provide guidelines to aid investigators and experimentally demonstrate the value of our techniques. Finally, we describe a compelling attack that modifies the cached version of the registry without altering the on-disk version. While this attack would be undetectable with conventional on-disk registry analysis techniques, we demonstrate that such malicious modifications are easily detectable by examining memory.


I also want to give my ongoing thanks to AAron Walters, who helped me out a ton by providing comments and suggestions on drafts of the paper. He continues to do great work that enriches the entire memory analysis community.


If you're going to be at DFRWS '08 and want to meet up, drop a note in the comments or send me an e-mail! See you in Baltimore!