Tuesday, September 4, 2007

Challenges in Carving Registry Hives from Memory

As I mentioned last week, I moved to a new apartment this week, and as a result I didn't have a lot of time to do any serious work. Still, I didn't want to let the entire week go to waste, so I decided to try and tackle a problem that I thought would be relatively simple: extracting a copy of the binary registry hives out of memory. As it turned out, this was actually a bit more difficult than I expected, and I'll have to get back to the problem at a later date, but I thought in the meantime I'd write about what steps I took, where I ran into trouble, and describe the approach I hope to take when I revisit the issue.

There were several reasons to suspect that one might find at least partial copies of the registry in memory: the registry stores all of the configuration data for the Windows operating system, and its contents are referred to and updated quite often during normal operation (let Sysinternals' Regmon run for a few minutes and look at the output if you have doubts about this). In addition, it seemed to me that the binary structure of the registry hives was in many ways well-suited to an in-memory representation: the size of a block in a registry file, 0x1000 bytes (4096 bytes, or 4 KB), is the same as the size of a page in memory, and the various data structures present in registry hives have a clear interpretation as C structures in memory.

Indeed, it appears that the entire registry is stored in memory, at least in Windows 2000. Quoting Russinovich and Solomon's excellent Windows Internals:

Windows 2000 keeps a version of every hive in the kernel's address space. When a hive initializes, the configuration manager determines the size of the hive file, allocates enough memory from the kernel's paged pool to store it, and reads the hive file into memory. [...]

In Windows XP and Windows Server 2003, the configuration manager maps portions of a hive into memory as it needs to access them. It uses the cache manager's file mapping functions to map in 16-KB views into the hive files.

Windows Internals, p. 203

For simplicity's sake, I'm not going to deal with XP and Server 2003 in this entry; it seems unlikely that very much of the registry will be recoverable under those OSes, given the mapping mechanism described above. Instead, we'll look at recovering registry hives from Windows 2000, specifically the two memory images released for the DFRWS 2005 Memory Analysis Challenge.

As mentioned earlier, registry hives are divided into fixed blocks of size 0x1000 bytes; the first of these blocks is called the base block, and it begins with the signature "regf". The base block also contains four dwords giving the version of the hive structure in use (at offsets 0x14, 0x18, 0x1C, and 0x20), and in Windows 2000 these values are always 1, 3, 0, and 1. These two facts together make for an excellent signature to search for the start of a hive in a Windows 2000 memory dump. Using XMagic:

# XMagic

# Windows Registry files.
# updated by Joerg Jenderek
0 string = regf Windows 2000 registry file
# Reg version
<0x14 lelong = 1
<<0x18 lelong = 3
<<<0x1c lelong = 0
<<<<0x20 lelong = 1
Using this signature in conjunction with FTimes finds four hive headers in dfrws2005-physical-memory1.dmp, and ten in dfrws2005-physical-memory2.dmp. Examining the memory dump at each of the reported offsets, there do not appear to be any false positives. The second memory dump probably yields more results because the hives reside in the kernel's paged pool, which means that they can be swapped out to disk. Since the second image was taken right after the system was booted (according to the challenge details), it seems likely that some of the hive headers in the first image were paged out.
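For readers without XMagic and FTimes at hand, the same check can be sketched in a few lines of Python. This is only an illustration of the signature described above, not the tooling used for the results reported here; the function name find_hive_headers is hypothetical, and the sketch assumes the whole dump fits in memory.

```python
import struct

REGF_MAGIC = b"regf"
# Version dwords at offsets 0x14, 0x18, 0x1C, 0x20 for Windows 2000 hives.
WIN2K_VERSION = (1, 3, 0, 1)

def find_hive_headers(data):
    """Return offsets in `data` where a Windows 2000 base block appears to start."""
    hits = []
    pos = data.find(REGF_MAGIC)
    while pos != -1:
        # At least 0x24 bytes are needed to read the four version dwords.
        if pos + 0x24 <= len(data):
            version = struct.unpack_from("<4I", data, pos + 0x14)
            if version == WIN2K_VERSION:
                hits.append(pos)
        pos = data.find(REGF_MAGIC, pos + 1)
    return hits
```

Reading the dump with open(path, "rb").read() and passing the result to find_hive_headers would give a list of candidate physical offsets to inspect by hand.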

My first, naive attempt to carve out the files was to simply calculate the size of each hive and then use dd to carve out that much data from the offset given by FTimes. The size of the hive can be easily calculated by looking at offset 0x28 in the hive header; the dword at this position gives the offset of the last block in the registry hive. Since each block in the hive is 0x1000 bytes, the size of the hive as a whole should be the offset of the last block plus 0x1000. This method of extracting the registry fails, however – although it is reasonable to expect that the registry is contiguous in memory, this assumption would only hold for virtual memory, and our carving method is trying to extract a contiguous range from physical memory.
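The size calculation is simple enough to write down directly. The following is a minimal sketch of the arithmetic just described; hive_size is a hypothetical helper name, and it expects the first 0x1000 bytes of a hive (the base block) as input.

```python
import struct

BLOCK_SIZE = 0x1000

def hive_size(base_block):
    """Compute the hive size from the dword at offset 0x28 of the base block."""
    # The dword at 0x28 is taken to be the offset of the last block in the hive.
    (last_block_offset,) = struct.unpack_from("<I", base_block, 0x28)
    return last_block_offset + BLOCK_SIZE
```

For instance, a header whose dword at 0x28 reads 0x289000 would describe a hive of 0x28A000 bytes.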

Clearly, then, what's needed is the virtual address of the start of the hive in memory. To obtain this, I decided to generate a map of each process's address space, showing each virtual address and its corresponding physical address (if any). A good tool to do this is Andreas Schuster's memdump.pl; I wrote my own version of this in Python, drawing on the address translation routines from x86.py in Volatility (you'll need to copy in the "forensics" directory from Volatility into wherever you save memdump.py in order for it to work). Both memdump.pl and memdump.py take a memory dump and the address of a page directory and generate a list of all offsets by trying to translate each possible virtual address from 0x00000000 up to 0xFFFFFFFF using the given page directory. (Note: PTFinder or my own XMagic signatures can be used to locate processes and their page directories in a memory dump.)
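To make the translation step concrete, here is a rough sketch of a two-level x86 page table walk over a raw memory image. It is not the code from memdump.py or Volatility. It assumes a non-PAE system, a dd-style dump in which file offset equals physical address, and it treats any entry without the present bit as unavailable (so it ignores transition and prototype PTEs). The names read_dword and virt_to_phys are hypothetical.

```python
import struct

PAGE_PRESENT = 0x001   # bit 0: entry is valid
PAGE_LARGE = 0x080     # bit 7 in a page directory entry: 4 MB page

def read_dword(image, offset):
    """Read a little-endian dword from the image, or None if out of range."""
    if offset < 0 or offset + 4 > len(image):
        return None
    return struct.unpack_from("<I", image, offset)[0]

def virt_to_phys(image, page_dir, vaddr):
    """Translate vaddr with non-PAE x86 paging; return None if not resident."""
    # The top 10 bits of the address select the page directory entry.
    pde = read_dword(image, page_dir + ((vaddr >> 22) << 2))
    if pde is None or not (pde & PAGE_PRESENT):
        return None
    if pde & PAGE_LARGE:
        # 4 MB page: the low 22 bits are the offset within the page.
        return (pde & 0xFFC00000) | (vaddr & 0x003FFFFF)
    # The middle 10 bits select the page table entry.
    pte_addr = (pde & 0xFFFFF000) + (((vaddr >> 12) & 0x3FF) << 2)
    pte = read_dword(image, pte_addr)
    if pte is None or not (pte & PAGE_PRESENT):
        return None
    return (pte & 0xFFFFF000) | (vaddr & 0xFFF)
```

Calling virt_to_phys for every page-aligned address from 0x00000000 to 0xFFFFF000 with a given page directory yields the kind of virtual-to-physical map described above.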

Once such a map has been generated, we find that physical offset 0x01614000 (which appeared to contain the SYSTEM hive) is mapped in at 0xE1012000 in the virtual address space of every process (addresses above 0x80000000 are generally reserved for the kernel and are the same for every process). I wrote a small program to extract a range of virtual memory from an image, given a page directory, a start address, and a length. (Note: this time I've used my own memutil.py to do the address translations; it was easier to modify to continue despite invalid pages and to give verbose debug output.) After invoking it like so:

./regdump.py dfrws2005-physical-memory1.dmp \
0x00030000 0xE1012000 0x28A000 > test.sys
we get something that we might hope is the complete SYSTEM hive for the DFRWS 2005 test system (modulo any memory pages that were swapped out).
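The extraction step can be sketched as follows. This is not regdump.py itself; dump_virtual_range is a hypothetical name, the translate argument stands in for whatever address translation routine is available, and zero-filling pages that cannot be translated is an assumption made here so that later data keeps its expected offset.

```python
PAGE_SIZE = 0x1000

def dump_virtual_range(image, translate, start, length):
    """Copy `length` bytes of virtual memory beginning at `start`.

    `translate` maps a virtual address to a physical offset, or None.
    Pages that cannot be translated are zero-filled (an assumption of
    this sketch) so that later data stays at the right position.
    """
    out = bytearray()
    vaddr = start
    end = start + length
    while vaddr < end:
        # Never read across a page boundary: adjacent virtual pages
        # need not be adjacent in physical memory.
        chunk = min(PAGE_SIZE - (vaddr % PAGE_SIZE), end - vaddr)
        paddr = translate(vaddr)
        if paddr is None or paddr + chunk > len(image):
            out += bytes(chunk)
        else:
            out += image[paddr:paddr + chunk]
        vaddr += chunk
    return bytes(out)
```

Translating one page at a time is the important design point: it is exactly what a plain dd over physical memory cannot do.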

Alas, we are not so lucky. Trying to validate this structure by using a small C program I had around that attempts to walk the tree of registry keys and print their names caused it to die after reading only a small portion of the file. A bit of manual examination of the file we have shows why: it appears that for some reason parts of the file are no longer in the correct position. We can tell this because registry hives store their data in bins, and each bin has a header with a signature ("hbin"), its offset from the first hbin block (i.e., its position in the file), and its size. We immediately notice that at offset 0x4000, there is a block that claims its offset is 0, meaning it should be found at file position 0x1000.

Putting aside for now the question of why the hbin blocks are not in the order we expected, could we perhaps get the file to validate by taking each block and placing it in its correct position? To do this, we scan the file on block boundaries for the "hbin" header, read in its offset, and then write it out at the correct position. The program reorder_hbins.py does just this: it takes a list of hbin offsets on standard input (such a list can be trivially generated with FTimes in dig mode) and a file containing the registry data, and writes out a new version with the hbins in their "correct" places.
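The reordering idea can be sketched like this. It differs from reorder_hbins.py in that it scans for the headers itself rather than reading offsets from standard input, and the function name is reused here only for illustration. It assumes the hbin header stores its offset as a dword at +4 and its size as a dword at +8, and that a bin spanning several blocks is still contiguous in the carved data.

```python
import struct

BLOCK_SIZE = 0x1000

def reorder_hbins(carved):
    """Return a copy of `carved` with each hbin at 0x1000 + its claimed offset."""
    out = bytearray(len(carved))
    out[:BLOCK_SIZE] = carved[:BLOCK_SIZE]   # the base block stays in place
    for pos in range(BLOCK_SIZE, len(carved) - 11, BLOCK_SIZE):
        if carved[pos:pos + 4] != b"hbin":
            continue
        # Assumed header layout: offset from first hbin at +4, bin size at +8.
        offset, size = struct.unpack_from("<II", carved, pos + 4)
        dest = BLOCK_SIZE + offset
        # Skip implausible headers rather than corrupt the output.
        if size == 0 or size % BLOCK_SIZE:
            continue
        if pos + size > len(carved) or dest + size > len(out):
            continue
        out[dest:dest + size] = carved[pos:pos + size]
    return bytes(out)
```

Positions that no bin claims are left as zeroes in the output, which makes any gaps easy to spot afterwards.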

With this modification, the registry hive comes closer to being parsable – regview.c gives the following output:

Fatal: encountered unknown subkey type at 00277dc0
However, at this point, if we look at offset 0x00277dc0, we find that it is in the middle of a page of zeroes. It seems that there was no hbin that claimed to be at that position in the original section of memory we carved out.

There could be several reasons for this. First, it is possible that the hbin we're looking for is simply paged out; if this is the case, no amount of searching through the memory image will give us the block we need. To fully recover the registry we would need the pagefile from the system, and since the challenge is now two years old, we are unlikely to get a copy of it. Still, we could take the following approach to try and at least partially reconstruct the tree: now that the hbin blocks appear to be in the correct positions, we can scan through to find individual nodes (so-called nk cells), and then reconstruct subtrees from each of those until we run into trouble again. This might at least allow us to reconstruct interesting information such as the BootKey used in the SYSKEY mechanism, which we could then use to decrypt the local password hashes for the system from the SAM hive (for more details on how to extract the necessary information from SYSKEY, take a look at the bkhive and samdump2 programs by Nicola Cuomo).
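A first pass at locating nk cells might look like the sketch below. It has not been run against the challenge images. It assumes that each cell begins with a signed 32-bit size that is negative while the cell is allocated, that the two-byte "nk" signature follows immediately, and that cells start on 8-byte boundaries; find_nk_cells is a hypothetical name.

```python
import struct

def find_nk_cells(hive):
    """Yield file offsets of cells that look like allocated nk (key node) cells."""
    # Cells are assumed to start on 8-byte boundaries after the base block.
    for pos in range(0x1000, len(hive) - 5, 8):
        if hive[pos + 4:pos + 6] != b"nk":
            continue
        (size,) = struct.unpack_from("<i", hive, pos)
        # Allocated cells are assumed to store their size as a negative number.
        if size < 0 and (-size) % 8 == 0 and pos + (-size) <= len(hive):
            yield pos
```

Each offset returned would then be a starting point from which to walk a subtree until a missing page is hit.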

The other possible explanation for the missing block can be found by again consulting Windows Internals – which also may help explain why some of the blocks were in the wrong position.

If hives never grew, the configuration manager could perform all its registry management on the in-memory version of a hive as if the hive were a file. Given a cell index, the configuration manager could calculate the location in memory of a cell by adding the cell index, which is a hive file offset, to the base of the in-memory hive image. [...] Unfortunately, hives grow as they take on new keys and values, which means the system must allocate paged pool memory to store the new bins that contain the added keys and values. Thus, the paged pool that keeps the registry data in memory isn't necessarily contiguous. [emphasis mine]

Windows Internals, p. 204

So our earlier assumption may not be valid, especially if there have been keys or values added (which is almost certainly the case on a system that has been running for a while).

To deal with this, the configuration manager uses a scheme reminiscent of x86 virtual address translation, and has a translation table that maps each cell index (i.e. offset within the registry hive) to the appropriate location in memory.
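The analogy with page tables can be illustrated by splitting a cell index into fields. The field widths used below (1 bit for stable versus volatile storage, 10 bits of directory index, 9 bits of table index, and a 12-bit byte offset) are my reading of the layout given in Windows Internals and should be treated as an assumption until checked against the configuration manager's structures in memory; split_cell_index and CellIndex are hypothetical names.

```python
from collections import namedtuple

CellIndex = namedtuple("CellIndex", "volatile directory table offset")

def split_cell_index(cell_index):
    """Split a 32-bit cell index into its assumed translation fields."""
    volatile = (cell_index >> 31) & 0x1      # assumed: 0 = stable, 1 = volatile
    directory = (cell_index >> 21) & 0x3FF   # assumed: 10-bit directory index
    table = (cell_index >> 12) & 0x1FF       # assumed: 9-bit table index
    offset = cell_index & 0xFFF              # byte offset within a 4 KB block
    return CellIndex(volatile, directory, table, offset)
```

As with x86 paging, the directory and table fields would select an entry that points at a block in memory, and the low 12 bits would locate the cell inside that block.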

Dealing with these new complexities, however, is a bit too much for me, at least this week. The next steps down this path would involve locating the data structures used by the configuration manager in memory, and then parsing them to get an accurate picture of the registry in memory. If anyone wants to do some exploring on their own, a good place to start would be to look at debug symbols starting with _CM, as well as the !reg extension in WinDbg. Hopefully I'll be able to return to this topic in the near future.

For next week, however, I'll be back to parsing PDB files; with some luck, I should have code to extract type information ready by then, and the beginnings of a full-fledged Python module to handle PDB files.
