Thursday, August 27, 2015

(Sys)Call Me Maybe: Exploring Malware Syscalls with PANDA

System calls are of great interest to researchers studying malware, because they are the only way that malware can have any effect on the world – writing files to the hard drive, manipulating the registry, sending network packets, and so on all must be done by making a call into the kernel.

In Windows, the system call interface is not publicly documented, but thanks to lots of good reverse engineering efforts we now have full tables of system call names. In addition, by using the Windows debug symbols, we can figure out how many arguments each system call takes (though not yet their actual types).

I recently ran 24,389 malware replays under PANDA and recorded all the system calls made, along with their arguments (just the top-level arguments, without trying to descend into pointer types or dereference handle types). So for each replay, we now have a log file that looks like:

3f9b2340 NtGdiFlush
3f9b2340 NtUserGetMessage 0175feac 00000000 00000000 00000000
3f9b2120 NtCreateEvent 0058f8d8 001f0003 00000000 00000000 00000000
3f9b2120 NtWaitForMultipleObjects 00000002 0058f83c 00000001 00000000 00000000
3f9b2120 NtSetEvent 000002ec 00000000
3f9b2120 NtWaitForSingleObject 000002f0 00000000 0058f89c
3f9b2120 NtReleaseWorkerFactoryWorker 00000050
3f9b2120 NtReleaseMutant 00000098 00000000
3f9b2120 NtWaitForSingleObject 000005a4 00000000 00000000
3f9b2120 NtWaitForMultipleObjects 00000002 00dbf49c 00000001 00000000 00000000
3f9b2120 NtReleaseMutant 00000098 00000000
3f9b2120 NtWaitForMultipleObjects 00000002 00dbf4a8 00000001 00000000 00dbf4c8
3f9b2120 NtWaitForMultipleObjects 00000002 00dbf49c 00000001 00000000 00000000
3f9b2120 NtClearEvent 000002ec
3f9b2120 NtReleaseMutant 00000098 00000000
3f9b2120 NtWaitForMultipleObjects 00000002 00dbf49c 00000001 00000000 00000000
3f9b2120 NtReleaseMutant 000001e8 00000000
3f9b2120 NtWaitForMultipleObjects 00000002 00dbf3b8 00000001 00000000 00000000
3f9b2120 NtReleaseMutant 00000158 00000000
3f9b2120 NtCreateEvent 00dbeed4 001f0003 00000000 00000000 00000000
3f9b2120 NtDuplicateObject ffffffff fffffffe ffffffff 002edf50 00000000 00000000 00000002
3f9b2120 NtTestAlert
...

The first column identifies the process that made the call, using its address space as a unique identifier. The second gives the name of the call, and the remaining columns show the arguments passed to the function.
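
To make the three-part layout concrete, here is a minimal Python sketch that splits one line of the text output shown above into its pieces. The names `SyscallRecord`, `parse_line`, and the field name `asid` are my own labels for illustration and are not part of the data set or of PANDA; the sketch only assumes whitespace-separated hexadecimal columns, as in the excerpt.

```python
from typing import NamedTuple, Optional, Tuple


class SyscallRecord(NamedTuple):
    """One line of the textual syscall log (field names are illustrative)."""
    asid: int               # first column: address-space identifier of the caller
    name: str               # second column: system call name
    args: Tuple[int, ...]   # remaining columns: top-level argument values


def parse_line(line: str) -> Optional[SyscallRecord]:
    """Parse a line such as '3f9b2120 NtSetEvent 000002ec 00000000'.

    Returns None for blank lines or anything that does not fit the layout
    (for example the '...' used to truncate the excerpt above).
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    try:
        asid = int(fields[0], 16)
        args = tuple(int(a, 16) for a in fields[2:])
    except ValueError:
        return None
    return SyscallRecord(asid, fields[1], args)


record = parse_line("3f9b2120 NtSetEvent 000002ec 00000000")
print(record)
```

Calls with no arguments, such as NtGdiFlush in the excerpt, simply come back with an empty `args` tuple.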

As usual, this data can be freely downloaded; the data set is 38GB. Each log file is compressed; you can use the showsc program (included in the tarball) to display an individual log file:

$ ./showsc 32 32bit/008d065f-7f5d-4a86-9995-970509ff3999_syscalls.dat.gz

You can download the data set here:

Interesting Malware System Calls

As a first pass, we can look at what the least commonly used system calls are. These may be interesting because rarely used system calls are more likely to contain bugs; in the context of malware, invoking a vulnerable system call can be a way to achieve privilege escalation.

Here are a few that came out from sorting the list of system calls in the malrec dataset and then searching Google for some of the least common:
  • NtUserMagControl (1 occurrence): One of many functions found by j00ru to cause crashes due to invalid pointer dereferences when called from the context of the CSRSS process
  • NtSetLdtEntries (2 occurrences): Used as an anti-debug trick by some malware
  • NtUserInitTask (3 occurrences): Used as part of an exploit for CVE-2012-2553
  • NtGdiGetNearestPaletteIndex (3 occurrences): Used in an exploit for MS07-017
  • NtQueueApcThreadEx (5 occurrences): Mentioned as a way to get attacker-controlled code into the kernel, allowing one to bypass SMEP
  • NtUserConvertMemHandle (5 occurrences): Used to replace a freed kernel object with attacker data in an exploit for CVE-2015-0058
  • NtGdiEnableEudc (9 occurrences): Used in a privilege escalation exploit where NtGdiEnableEudc assumes a certain registry key is of type REG_SZ without checking, allowing an attacker to overflow a stack buffer (I was unable to find anything about whether this has been patched. Update: Mark Wodrich points out that this is CVE-2010-4398 and it was patched in MS11-011)
  • NtAllocateReserveObject (11 occurrences): Used for a kernel pool spray
  • NtVdmControl (55 occurrences): Used for the famous CVE-2010-0232 bug; Tavis Ormandy won the Pwnie for Best Privilege Escalation Bug in 2010 for this
Of course, we can't say for sure that the replays that execute these calls actually contain exploitation attempts. After all, there are benign ways to use each of the calls, or they wouldn't be in Windows in the first place :) But these are a few that may reward closer examination; if they are in fact exploit attempts, you can then use PANDA's record and replay facility to step through the exploit in as much detail as you like. You can even use PANDA's recently-fixed QEMU gdb stub to go through the exploit instruction by instruction.

You can peruse the full list of system calls and their frequencies here: 32-bit, 64-bit. Let me know if you find any other interesting calls in there :)
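
If you want to reproduce a frequency list like this for a subset of the logs, the counting step is simple. The following Python sketch is not the script I used; it assumes the whitespace-separated text layout shown earlier (address space, call name, arguments), and the helper names `count_syscalls` and `least_common` are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_syscalls(lines: Iterable[str]) -> Counter:
    """Count how often each system call name (second column) appears."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything without at least "<asid> <name>".
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


def least_common(counts: Counter, n: int) -> List[Tuple[str, int]]:
    """Return the n rarest calls, rarest first (ties broken by name)."""
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:n]


# A few lines taken from the log excerpt earlier in this post.
sample = [
    "3f9b2120 NtSetEvent 000002ec 00000000",
    "3f9b2120 NtReleaseMutant 00000098 00000000",
    "3f9b2120 NtReleaseMutant 00000098 00000000",
    "3f9b2120 NtClearEvent 000002ec",
]
for name, occurrences in least_common(count_syscalls(sample), 3):
    print(name, occurrences)
```

Summing the counters from many replays (Counter objects support `+` and `update`) gives a data-set-wide table that can be sorted the same way.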

Updates 8/28/2015

If you want to know which log files have which system calls without processing all of them, I have created an index that lists the unique calls for each replay:
Also, Reddit user trevlix wondered whether the lack of pointer dereferencing was inherent to PANDA or something I'd just left out. My response:

Yes, it is possible to do that. I just wasn't able to because I didn't have access to full system call prototypes. E.g., to follow pointers for something like NtCreateFile, you need to know that its full prototype is
NTSTATUS NtCreateFile(
  _Out_    PHANDLE            FileHandle,
  _In_     ACCESS_MASK        DesiredAccess,
  _In_     POBJECT_ATTRIBUTES ObjectAttributes,
  _Out_    PIO_STATUS_BLOCK   IoStatusBlock,
  _In_opt_ PLARGE_INTEGER     AllocationSize,
  _In_     ULONG              FileAttributes,
  _In_     ULONG              ShareAccess,
  _In_     ULONG              CreateDisposition,
  _In_     ULONG              CreateOptions,
  _In_     PVOID              EaBuffer,
  _In_     ULONG              EaLength
);
You furthermore have to know how big an OBJECT_ATTRIBUTES struct is, so that when you dereference the pointer you know how many bytes to read and store in the log.
If you wanted to collect extra information about any of the logs posted, it's possible since they are full-system traces and can be replayed :) Supposing you have a syscall trace file like 0a1a1a77-d4f1-43e0-bc14-4f34f7d96820_syscalls.dat.gz, you can use the UUID to find it on malrec and download the log file:
Then you'd just unpack that log (scripts/rrunpack.py in the PANDA directory) and replay it with a PANDA plugin that understands how to dereference the various pointers involved. For reference, you can see the PANDA plugin I originally used to gather the syscall traces:
And you can see on lines 108 and 119 where you'd have to add in code to read the dereferenced values.
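
To illustrate the point about struct sizes, here is a small Python sketch of what decoding a dereferenced OBJECT_ATTRIBUTES could look like once a plugin has read the raw bytes out of guest memory. It assumes a 32-bit guest, where every field of the documented OBJECT_ATTRIBUTES layout is 4 bytes wide, and it does not use any PANDA API: the class name `ObjectAttributes32`, the helper `decode_object_attributes`, and the sample bytes are all made up for illustration.

```python
import ctypes
import struct


class ObjectAttributes32(ctypes.LittleEndianStructure):
    """OBJECT_ATTRIBUTES as laid out in a 32-bit guest (pointers are 4 bytes)."""
    _fields_ = [
        ("Length", ctypes.c_uint32),                    # ULONG
        ("RootDirectory", ctypes.c_uint32),             # HANDLE
        ("ObjectName", ctypes.c_uint32),                # PUNICODE_STRING (guest pointer)
        ("Attributes", ctypes.c_uint32),                # ULONG
        ("SecurityDescriptor", ctypes.c_uint32),        # PVOID (guest pointer)
        ("SecurityQualityOfService", ctypes.c_uint32),  # PVOID (guest pointer)
    ]


def decode_object_attributes(raw: bytes) -> ObjectAttributes32:
    """Interpret bytes read from guest memory as an OBJECT_ATTRIBUTES struct."""
    size = ctypes.sizeof(ObjectAttributes32)
    if len(raw) < size:
        raise ValueError(f"need {size} bytes, got {len(raw)}")
    return ObjectAttributes32.from_buffer_copy(raw[:size])


# Number of bytes to read when following the ObjectAttributes pointer.
print(ctypes.sizeof(ObjectAttributes32))

# Made-up bytes standing in for a guest memory read.
raw = struct.pack("<6I", 24, 0, 0x0058F8D8, 0x40, 0, 0)
oa = decode_object_attributes(raw)
print(hex(oa.ObjectName))
```

Note that ObjectName is itself a pointer to a UNICODE_STRING, so getting at the actual file name takes two more reads: one for the UNICODE_STRING struct and one for the buffer it points to.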
