Cell Index Translation
Throughout our previous discussions of registry keys, hives, and so on, we have run into the concept of a cell index several times. Cell indexes appear in kernel memory in places such as the KeyCell member of the _CM_KEY_CONTROL_BLOCK structure; likewise, all of the structures representing portions of the hive itself (_CM_KEY_NODE, _CM_KEY_VALUE, and so on) have members that point to other structures by cell index rather than by virtual address.
The reason for this is that registry hives live two lives: one as on-disk files stored in %WINDIR%\system32\config\, and another as an in-memory structure used by the Configuration Manager. In the former, cell indices can be thought of essentially as simple offsets from the start of the file; in the latter, however, the situation is more complicated. When registry hives are represented in memory, they must account for the fact that new keys and values may be added at any time. Since space for the hives is allocated out of paged pool, there is no way of guaranteeing that the memory allocated for the hive will continue to be contiguous.
To solve this problem, the Configuration Manager makes use of what Solomon and Russinovich (Windows Internals pp. 203-207) call cell maps. These work in much the same way as virtual to physical address translation on x86. Just as different portions of a virtual address on x86 are actually indices into different tables starting with the page directory, each portion of a cell index is an index into one of several tables, starting at the Storage.Map member of the appropriate _HHIVE.
Cell Index Translation in Memory
As mentioned before, there are several parts to the cell index: the type (1 bit, stable or volatile), the table index (10 bits), the entry index (9 bits), and finally the offset into the block (12 bits). The following diagram illustrates this layout:
  1 bit   10 bits     9 bits     12 bits
[  V  ][  Table  ][  Entry  ][  Offset  ]
Or, in terms of masks and shifts:
CI_TYPE_MASK = 0x80000000
CI_TYPE_SHIFT = 0x1F
CI_TABLE_MASK = 0x7FE00000
CI_TABLE_SHIFT = 0x15
CI_ENTRY_MASK = 0x1FF000
CI_ENTRY_SHIFT = 0x0C
CI_OFF_MASK = 0x0FFF
CI_OFF_SHIFT = 0x0
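As a quick illustration, the masks and shifts above can be applied like this. This is only a minimal sketch; the function name split_cell_index is my own and is not part of Volatility or Windows.

```python
# Masks and shifts as listed above.
CI_TYPE_MASK, CI_TYPE_SHIFT = 0x80000000, 0x1F
CI_TABLE_MASK, CI_TABLE_SHIFT = 0x7FE00000, 0x15
CI_ENTRY_MASK, CI_ENTRY_SHIFT = 0x001FF000, 0x0C
CI_OFF_MASK, CI_OFF_SHIFT = 0x00000FFF, 0x00


def split_cell_index(cell_index):
    """Split a 32-bit cell index into (type, table, entry, offset).

    Hypothetical helper for illustration only.
    """
    return (
        (cell_index & CI_TYPE_MASK) >> CI_TYPE_SHIFT,    # 0 = stable, 1 = volatile
        (cell_index & CI_TABLE_MASK) >> CI_TABLE_SHIFT,  # index into the directory
        (cell_index & CI_ENTRY_MASK) >> CI_ENTRY_SHIFT,  # index into the table
        (cell_index & CI_OFF_MASK) >> CI_OFF_SHIFT,      # offset into the block
    )
```

For example, the cell index 0x80000020 splits into type 1 (volatile), table 0, entry 0, and offset 0x20.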
To translate a cell index to a virtual address, first start by selecting the appropriate storage map for the hive using the first bit of the cell index. Each cell index can refer to either the stable or volatile map for the hive, stored in the first and second elements of the Storage array of the _HHIVE, respectively. These elements are of type _DUAL and contain several fields:
lkd> dt nt!_DUAL
+0x000 Length : Uint4B
+0x004 Map : Ptr32 _HMAP_DIRECTORY
+0x008 SmallDir : Ptr32 _HMAP_TABLE
+0x00c Guard : Uint4B
+0x010 FreeDisplay : [24] _RTL_BITMAP
+0x0d0 FreeSummary : Uint4B
+0x0d4 FreeBins : _LIST_ENTRY
The one we care about here is the Map member. This plays a role analogous to the page directory in virtual to physical address mapping; it is the first table we will consult when translating an address, and we will always need to be able to locate it if we are to translate any cell index.
The next 10 bits of the cell index encode the table index; this refers to an entry in the _HMAP_DIRECTORY. The _HMAP_DIRECTORY is an array of 1024 (2^10) pointers to _HMAP_TABLEs; the table index lets us know which of these we want.
Next we have 9 bits that give the entry index, a value that selects a particular entry from the _HMAP_TABLE, which is an array of 512 (2^9) _HMAP_ENTRY structures. Finally, each _HMAP_ENTRY looks like:
lkd> dt nt!_HMAP_ENTRY
+0x000 BlockAddress : Uint4B
+0x004 BinAddress : Uint4B
+0x008 CmView : Ptr32 _CM_VIEW_OF_FILE
+0x00c MemAlloc : Uint4B
The BlockAddress member will give us the virtual address in memory of the hbin (remember that hives are structured into a series of bins, which in turn contain cells). At this point, we need only add the offset (the final 12 bits) to it to get the address of the cell corresponding to the data we're looking for.
The only thing left to do before we can make use of the address we've computed is add 4. Why? As it turns out, the cell index actually points to a DWORD at the beginning of the cell that gives the length of the data. So, for example, the cell index of a key will point to the size of the key, rather than the start of the key itself. While this may be quite helpful in C (since you can just read that many bytes directly into the appropriate C struct), in higher level languages it's not necessary. So as the final step of the translation, we add 4 to the virtual address to get the actual start of the data.
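Putting the steps together, the whole walk might be sketched as follows. This is a rough illustration rather than the actual Volatility code: read_u32 is a hypothetical callable that reads a little-endian DWORD from kernel virtual memory, the sizes come from the 32-bit dt output above, and treating a zero pointer as "not mapped" is my own assumption.

```python
HMAP_ENTRY_SIZE = 0x10  # size of _HMAP_ENTRY in the 32-bit dt output above
POINTER_SIZE = 4        # Ptr32


def cell_to_vaddr(read_u32, map_addr, cell_index):
    """Translate a cell index to the virtual address of the cell's data.

    read_u32 -- hypothetical callable(vaddr) -> int (little-endian DWORD)
    map_addr -- the Map pointer from the _DUAL selected by the type bit
    Returns None if a table or block pointer is zero (an assumption).
    """
    table = (cell_index & 0x7FE00000) >> 21
    entry = (cell_index & 0x001FF000) >> 12
    offset = cell_index & 0x00000FFF

    # _HMAP_DIRECTORY: array of 1024 pointers to _HMAP_TABLE.
    table_addr = read_u32(map_addr + table * POINTER_SIZE)
    if table_addr == 0:
        return None

    # _HMAP_TABLE: array of 512 _HMAP_ENTRY; BlockAddress sits at +0x000.
    block_addr = read_u32(table_addr + entry * HMAP_ENTRY_SIZE)
    if block_addr == 0:
        return None

    # Skip the DWORD length that precedes the cell's data.
    return block_addr + offset + 4
```

The caller is expected to have already picked the stable or volatile map using the top bit of the cell index. The sketch deliberately ignores everything else in the _DUAL and _HMAP_ENTRY structures.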
On-Disk Cell Index Translation
When dealing with hive files on disk, things are much simpler. To translate a cell index into a file offset, we just use the following formula:
FileAddress = CellIndex + 0x1000 + 4
Adding 0x1000 is necessary because cell indexes are relative to the first hbin in the file; before the hbins there is an additional 0x1000 byte block known as the base block that contains metadata about the hive as a whole (the layout of this block can be seen by issuing dt nt!_HBASE_BLOCK from within WinDbg). In addition, we add 4 to the address for precisely the same reasons we did when translating cell indexes in memory.
This simpler translation scheme is also used in memory if the Flat flag is set in the _HHIVE structure.
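The on-disk formula is simple enough to show in a few lines. The sketch below assumes the hive file has been read into a bytes object; the helper names are made up for illustration, and reading the length DWORD as a signed little-endian value is my own choice, not something covered above.

```python
import struct

BASE_BLOCK_SIZE = 0x1000  # the base block that precedes the first hbin


def cell_to_file_offset(cell_index):
    """File offset of the cell's data: skip the base block and the length DWORD."""
    return cell_index + BASE_BLOCK_SIZE + 4


def read_cell_size(hive_bytes, cell_index):
    """Return the raw length DWORD stored just before the cell's data.

    Hypothetical helper; the value is unpacked as a signed 32-bit integer.
    """
    (size,) = struct.unpack_from("<i", hive_bytes, cell_index + BASE_BLOCK_SIZE)
    return size
```

So a cell index of 0x20 corresponds to file offset 0x1024, with its length DWORD at 0x1020.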
Notes on Implementation
In Volatility, the super-cool memory analysis framework from Volatile Systems, we have the concept of stackable address spaces. The "address space" part means that we can have objects that represent various types of memory spaces (such as standard x86 memory space, PAE memory space, and so on); these objects have, at a minimum, read(), vtop(), and is_valid_address() methods that handle reading from the memory space, translating virtual to physical addresses (whatever virtual and physical mean in that space's context), and checking to see if a given address is valid in that space.
The "stackable" part means that address spaces can be layered, with one address space calling on another to actually read the data. One example in Volatility is the IA32PagedMemory class, which deals with virtual addresses, but relies on having some base address space (in most cases, a FileAddressSpace, which just treats any address as an offset in the underlying file) available to read data from.
In the same way, we can see each hive as its own space, with cell indexes playing the same role as virtual addresses in memory. I have implemented two address spaces within Volatility that demonstrate this: HiveAddressSpace, which uses a virtual memory space like IA32PagedMemory and does cell map translation, and HiveFileAddressSpace, which uses the simpler file-based translation method and works with spaces like the FileAddressSpace.
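To make the layering idea concrete, here is a toy version of the file-based case. It is not the real Volatility code: BufferAddressSpace and SimpleHiveFileSpace are hypothetical stand-ins that only mimic the read(), vtop(), and is_valid_address() interface described above.

```python
class BufferAddressSpace:
    """Hypothetical base space: an address is an offset into a bytes buffer."""

    def __init__(self, data):
        self.data = data

    def is_valid_address(self, addr):
        return 0 <= addr < len(self.data)

    def vtop(self, addr):
        return addr  # nothing to translate at the bottom of the stack

    def read(self, addr, length):
        if not self.is_valid_address(addr) or addr + length > len(self.data):
            return None
        return self.data[addr:addr + length]


class SimpleHiveFileSpace:
    """Hypothetical layered space: addresses are cell indexes."""

    BASE_BLOCK_SIZE = 0x1000

    def __init__(self, base):
        self.base = base  # the space we delegate actual reads to

    def vtop(self, cell_index):
        # Same formula as the on-disk translation above.
        return cell_index + self.BASE_BLOCK_SIZE + 4

    def is_valid_address(self, cell_index):
        return self.base.is_valid_address(self.vtop(cell_index))

    def read(self, cell_index, length):
        return self.base.read(self.vtop(cell_index), length)
```

Code that only calls read(), vtop(), and is_valid_address() does not need to know which kind of space it has been handed, which is the point of the design.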
There are several benefits to this approach. First, all of the interfaces used in Volatility to deal with structures (read_obj, read_value, and so on) will work with the new address spaces we have defined. So, for example, assuming we have defined _CM_KEY_NODE, _CHILD_LIST, and _CM_KEY_VALUE in our vtypes.py, the following code will get us the name of the first value of a given key:
# Number of values attached to the key
val_count = read_obj(hive_space, types,
    ["_CM_KEY_NODE", "ValueList", "Count"], key_addr)
# Cell index of the key's value list
val_list_ptr = read_obj(hive_space, types,
    ["_CM_KEY_NODE", "ValueList", "List"], key_addr)
# First entry in the list: cell index of the first _CM_KEY_VALUE
val_ptr = read_value(hive_space, "pointer",
    val_list_ptr)
# Read the value's name using its stored length
name_len = read_obj(hive_space, types,
    ["_CM_KEY_VALUE", "NameLength"], val_ptr)
name = read_string(hive_space, types,
    ["_CM_KEY_VALUE", "Name"], val_ptr, name_len)
The other benefit to implementing the translation as an address space is that the code above will work equally well with a HiveAddressSpace or a HiveFileAddressSpace -- meaning we can deal with hives in memory exactly as we do on-disk hives. To cite a relevant example, all of the code in CredDump works perfectly with a HiveAddressSpace object in place of the HiveFileAddressSpace. This means we can seamlessly extract password hashes and other credentials directly from memory! Cool!
What Can be Done Now
At this point, we can locate hives in memory, find open keys, and make sense of the cell indexes that point into the raw hive data in memory. But how do we actually traverse the keys and values in the registry? It turns out this is not a big problem; many people have already done work on parsing raw registry hives, and we can apply these techniques to memory as well. Samba's regfio library and Tim Morgan's RegLookup are good places to start.
Aside from looking at open source projects, I have also discovered that a large amount of information on internal registry structures already exists in the public symbols distributed by Microsoft. Here are the types I know of that correspond to hive data structures:
- _CM_KEY_NODE
- _CM_KEY_VALUE
- _CHILD_LIST
- _CM_KEY_INDEX
- _CM_KEY_SECURITY
- _CM_BIG_DATA
Using these types and Volatility's built-in data access functions, it is pretty simple to write code that can traverse the registry hive in memory or on disk. A very simple implementation that can traverse key nodes and values can be found in framework/win32/rawreg.py in the CredDump source.
One word of caution, though. As with all tools that work with memory, we must be able to deal with the fact that the data we're looking for may not be there. This is a very real danger with hive files in memory; not only can portions of memory be paged out as usual, but the part of the hive we're looking for may not have ever been loaded into memory at all! Windows XP and above only map in small views of each hive that correspond to the pieces of data they need to access. So while, in general, every key that has a Key Control Block will be mapped in, for any other keys there is no guarantee that the data will be there. Still, with a little bit of careful programming it's not too hard to avoid getting into trouble.
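One way to be careful is to check validity before every read and to treat a short or missing read as "data not available" rather than as an error. The following is only a sketch of that idea: PartialSpace is a made-up stand-in for an address space with holes in it, and safe_read is a hypothetical helper, not a Volatility function.

```python
class PartialSpace:
    """Hypothetical space where only some address ranges are present."""

    def __init__(self, regions):
        self.regions = regions  # {start_address: bytes}

    def _find(self, addr):
        for start, data in self.regions.items():
            if start <= addr < start + len(data):
                return start, data
        return None

    def is_valid_address(self, addr):
        return self._find(addr) is not None

    def read(self, addr, length):
        hit = self._find(addr)
        if hit is None:
            return None
        start, data = hit
        return data[addr - start:addr - start + length]


def safe_read(space, addr, length):
    """Return exactly `length` bytes at addr, or None if they are not all there."""
    if not space.is_valid_address(addr):
        return None
    data = space.read(addr, length)
    if data is None or len(data) != length:
        return None
    return data
```

A traversal routine built on a helper like this can skip any key or value whose data comes back as None and carry on with the rest of the hive.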
The End
With what we know now about the registry and how it is stored in Windows memory, we should have no trouble extracting all the registry data we can get our hands on from a memory dump. Of course, this is not really the end -- there are still many, many tools that can be developed that examine specific parts of the registry in memory; CredDump is just one small example. However, I believe that all the significant technical challenges involved in accessing registry data in Windows XP memory dumps have been resolved, and so I will leave it to others to make the most of this new capability.