Thursday, July 3, 2014

Breaking Spotify DRM with PANDA

Disclaimer: Although I think DRM is both stupid and evil, I don't advocate pirating music. Therefore, this post will stop short of providing a turnkey solution for ripping Spotify music, but it will fully describe the theory behind the technique and its implementation in PANDA. Don't be evil.

Update 6/6/2014: The following post assumes you know what PANDA is (a platform for dynamic analysis based on QEMU). If you want to know more, check out my introductory post on PANDA.

This past weekend I spoke at REcon, a conference on reverse engineering held every year in Montreal. I had a fantastic time there getting to meet other people interested in problems of memory analysis, reverse engineering, and dynamic analysis. One of the topics of my REcon talk was how to use PANDA to break Spotify DRM, and since the video from the talk won't be posted for a while, I thought I'd write up a post showing how we can use PANDA and statistics to pull out unencrypted OGGs from Spotify.

Gathering Data


The first step is to gather some data. We want to know what function inside Spotify is doing the actual decryption of the songs, so that we can then hook it and pull out the decrypted (but not decompressed) audio file. So to start with, we'll take a recording of Spotify playing a song; we can then apply whatever analysis we want to the replay. Working with a replay rather than a live system will also make our job considerably easier – no need to worry that we're going to slow things down enough to trip anti-debugging measures or network timeouts. I've prepared a record/repay log of Spotify playing 30 seconds of a song, which you can use to follow along with what comes next. The recording is 12 billion instructions, which gives us a lot of data to work with!

Just for fun, here's a movie of that replay, generated by taking screenshots throughout the replay and then stitching them into a video:



Some Theory


The next challenge is to figure out how we can identify the function that takes in encrypted data and outputs decrypted data. For this we turn to the excellent work of Ruoyu Wang, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna [1]. Their clever insight was that when you look at the distribution of bytes in encrypted vs. compressed streams, the byte entropy of the two is very similar, but compressed streams don't look very random. To illustrate this, let's look at the histograms for an encrypted mp3 file, and its decrypted version. First, encrypted:


Now the same file, decrypted:


You can clearly see that the one on the bottom looks significantly less "random" – or more precisely, the distribution of bytes is not very uniform. However, if we compute the byte entropy of each, they are both very close to the theoretical maximum of 8 bits per byte – the mp3 has 7.968480 bits of entropy per byte, whereas the encrypted file has 7.999981 bits per byte.

We can make this intuition more precise by turning to statistics. The Pearson chi-squared test (χ2) lets us compute a value for how much an observed distribution deviates from some ideal distribution. In this case, we expect the bytes in an encrypted file to be uniformly random, so we can compare with the uniform distribution by computing:


Here, Oi is the observed frequency of each byte, and Ei is the expected frequency, which for a uniform byte distribution with n samples will be (1/256)*n.

Similarly, the entropy of some ovserved data can be computed as:


Where p(xi) is the observed frequency of each byte value in the data.

Based on the work of Wang et al., if we find a function that reads a lot of high-entropy, highly random data, and writes a lot of high-entropy, non-random data, that's likely to be our guy!

Enter the PANDA


But enough theory. How do we actually gather the data we need in PANDA? We will want some way of gathering, for each function, statistics on the contents of buffers read and written by each function in the replay. As it happens, PANDA has a plugin called unigrams that will get us the data we want.

The
unigrams plugin works by tracking every memory read and write made by the system. When it sees a read or write, it looks up the current process context (i.e., CR3 on x86), program counter, and the callsite of the parent function (this last is done with the help of the callstack_instr plugin). Together, these three pieces of information allow us to put the individual memory access in context and separate out memory accesses made in different program contexts into coherent streams of data. So to gather the raw data we want, we can just run:

x86_64-softmmu/qemu-system-x86_64 -m 1024 -replay spotify \
  -panda-plugin x86_64-softmmu/panda_plugins/panda_callstack_instr.so \
  -panda-plugin x86_64-softmmu/panda_plugins/panda_unigrams.so

This produces two files, unigram_mem_read_report.bin and unigram_mem_write_report.bin. The format of these files isn't terribly interesting, but they can be parsed using the Python code found in the unigram_hist.py script. Essentially, it consists of many, many rows of data that have the (callsite, program counter, CR3) triple followed by an array of 256 integers giving the number of times each byte was read or written at that point in the code.

Armed with this data, we want to now go through each callsite and look for those that meet the following criteria:

  1. The function both reads and writes a lot of data, in roughly equal amounts.
  2. The byte entropy of the data read is high, and its χ2 value (deviation from random) is low.
  3. The byte entropy of the data written is high, and its χ2 value is high.
This is precisely what the find_drm.py script does. We can run it like so:

./find_drm.py unigram_mem_read_report.bin unigram_mem_write_report.bin

Among its output, we find the following promising candidate:


(00719b84 3f1ac2e0): 3 x 1 combinations
  Read sizes:  44033, 701761, 701761
  Write sizes: 701761
  Read rand:  2.238299, 258.176922, 263.599258
  Write rand: 142018.776009
  Best input/output ratio (0 is best possible): 0.0

This function read two buffers of size 701,761 bytes and wrote one of size 701,761 bytes – given that we played 30 seconds of the song, this looks just about right. The randomness of the input buffers was quite high (recall that in the χ2 test, high numbers mean the data observed is less likely to be random), but the output buffer was not very random.


Dumping the Data


So how can we confirm our guess? Well, the easiest thing is to simply dump out the data seen at that point. If we go back up to the beginning of the output of the script, we have a list of all the (callsite, program counter, CR3) identifiers for reads and writes that matched our criteria. Looking through the writes for our candidate callsite (00719b84), we find it here:


(00719b84 0042e2ed 3f1ac2e0): 701761 bytes

We can now use another PANDA plugin, tapdump, to dump out all the data flowing through that point in the program. First we create a text file named tap_points.txt in the QEMU directory, and put in it:


00719b84 0042e2ed 3f1ac2e0

Next we run the replay again with the tapdump plugin enabled.

x86_64-softmmu/qemu-system-x86_64 -m 1024 -replay spotify \
  -panda-plugin x86_64-softmmu/panda_plugins/panda_callstack_instr.so \
  -panda-plugin x86_64-softmmu/panda_plugins/panda_tapdump.so

This produces two files, read_tap_buffers.txt.gz and write_tap_buffers.txt.gz, which contain the data read and written at the specified points. If you examine this with zless, you'll see lots of lines of addresses, followed by a single byte value. Separating out each field onto its own line and annotating, these are: 

0000000082678e78 [Caller 13]
000000008260dcc3 [Caller 12]
[...]
000000000071a1a5 [Caller 2]
0000000000719b84 [Caller 1]
000000000042e2ed [PC]
000000003f1ac2e0 [Address space]
000000000b256570 [Write address]
       269882976 [Index]
              4f [Data]

The extra callstack information is included so that, if necessary, more calling context can be used to pull out just the data we're interested in. In our case, however, just one level turns out to be enough. Finally, we want to turn this text file into a binary stream. In the scripts directory, there is a script called split_taps.py which will go through a gzipped tapdump output file and separate out each distinct stream found in the file (based on our usual identifier of (callsite, program counter, CR3)).

So now we can run this on the writes seen at our candidate for the decryption function:

./split_taps.py write_tap_buffers.txt.gz spotify

And obtain spotify.0000000000719b84.000000000042e2ed.000000003f1ac2e0.dat, which contains the binary data written at program counter 0x0042e2ed, called from callsite 0x00719b84, inside of the process with CR3 0x3f1ac2e0. So, is this audio we seek?

$ file spotify.0000000000719b84.000000000042e2ed.000000003f1ac2e0.dat 

spotify.0000000000719b84.000000000042e2ed.000000003f1ac2e0.dat: Ogg data

This looks good! Of course, the proof of the pudding is in the eating, and the proof of the audio is in the listening, so do...

$ cvlc spotify.0000000000719b84.000000000042e2ed.000000003f1ac2e0.dat

And you should hear a rather familiar tune :)


Concluding Thoughts


As I mentioned in the disclaimer, this by itself is just the starting point for what you would need to really break Spotify's DRM. It doesn't give you a way to obtain the key for each song and decrypt it wholesale. Instead, you would have to place a hook in the function identified by this process and pull it out as it's played, which limits it to realtime decryption (and Spotify's packing and anti-debugging may make it hard to place the hook in the first place!). Although I can certainly imagine more efficient processes, I think for now this is a nice balance between enabling piracy and showing off the power of PANDA.

If you now want to get a better understanding of the function we found inside Spotify, you can create a memory dump, extract the unpacked Spotify binary (which is packed with Themida) using Volatility, and the load it up in IDA and go to 0x0042e2ed, which is the location where decrypted data is written out.


Postscript


One may wonder what happens when the function that contains 0x0042e2ed is called by others. As it turns out, this appears to be a generic decryption function that is used for other media throughout Spotify, including album art! It is left as an exercise to the reader to dump and examine the rest of the data that this function decrypts.


References


[1] Steal This Movie: Automatically Bypassing DRM Protection in Streaming Media Services. Wang, R., Shoshitaishvili, Y., Kruegel, C., and Vigna, G. USENIX Security Symposium, Washington, D.C., 2013.

Tuesday, January 28, 2014

PANDA, Reproducibility, and Open Science

tl;dr: PANDA now supports detached replays (you don't need the underlying VM image to run a replay), and they can be shared at a new site called PANDA Share. Hooray for reproducibility!

One of the most inspiring developments of the past few years has been the push for open science, the movement to ensure that scientific publications, data, and software are freely available to all. In computer science, a big part of this has been a trend towards making software and experimental data available once a paper has been published, so that others can verify experiments and "stand on the shoulders of giants" by extending the software. There have also been initiatives aimed at making sure that the results of experiments in computer science can be replicated.

In the latest release of PANDA, our Platform for Architecture-Neutral Dynamic Analysis, we've taken an important step in ensuring that experiments in dynamic analysis can be freely shared and replicated: as of commit 9139261d70, PANDA creates and loads standalone record/replay logs. This means that you can create a recording of an execution and then share it with others, and they will be able to precisely duplicate the same execution on their own machine, down to the last instruction. Any of PANDA's plugins can be applied to such executions, allowing new analyses to be run on existing, shared executions.

What does this enable? To start with, this makes it possible to share experimental data from research in dynamic analysis. In our paper Tappan Zee (North) Bridge, we performed many experiments that showed how to find useful points to hook in an OS; however, because these were based on executions that were tied to virtual machine disk images, we weren't able to share the data necessary to exactly reproduce our experiments (since that would require sharing a Windows VM with proprietary software). Now, however, we can simply share the detached recordings for the TZB experiments, allowing anyone to verify, for example, that our plugins can find SSL master secrets in IE8 on Windows. We also hope that collections of interesting recordings can form the basis of new benchmarks for dynamic analysis, allowing different implementations and algorithms to be directly compared by running them against a standard set of executions.

Aside from the benefits to reproducibility of dynamic analyses, we hope that this will also permit the creation and sharing of interesting executions that can then be studied by the whole community. For example, we are releasing today a recording of the FBI-authored shellcode that was recently used to identify Tor users connecting to sites hosted by Freedom Hosting. This means that anyone can re-run the recording and analyze every instruction executed by the shellcode to confirm for themselves the information that has appeared in public writeups.

To provide a central location for sharing interesting executions, we have created a site called PANDA Share where PANDA recordings can be uploaded. Each recording comes with a short description and the command line for PANDA needed to reproduce the execution. Right now, the repository contains the recordings of our Tappan Zee Bridge experiments, and the FBI shellcode recording. We are planning to add many more soon, and hope that others will share their own!

Friday, October 4, 2013

Prebuilt VM for PANDA Now Available

I have just created a prebuilt Virtualbox VM for testing PANDA. It's a current Debian 7.1 install with the latest (as of 10/4/2013) version of PANDA and prerequisites installed. The username and password for the VM are "panda:panda", with root password "panda".

Also included is a Debian i386 QCOW2 image (created by Aurelien Jarno) that can be used to test PANDA. Once you have the VM booted and you're logged in, you can cd into the panda/qemu directory and do:

panda@pandavm:~/panda/qemu$ x86_64-softmmu/qemu-system-x86_64 \
-m 256 -hda ~/qcow/debian_squeeze_i386_standard.qcow2 -monitor stdio


This will start up an instance of PANDA and boot the Debian image. From there you can create recordings and replay them with PANDA's various plugins; see the documentation for more details.

Hopefully this will make it easier for people to get started with PANDA!

Monday, September 30, 2013

Announcing PANDA: A Platform for Architecture-Neutral Dynamic Analysis

I'm pleased to announce the initial release of a new open source dynamic analysis platform built on QEMU, named PANDA (Platform for Architecture-Neutral Dynamic Analysis). It has a number of features that combine to make it a uniquely powerful platform for analyzing software as it executes:

  • Record and Replay: PANDA is capable of recording the non-deterministic inputs during a whole-system execution and later deterministically replaying them. This means that heavyweight analyses that would be too slow to run on a live execution can be decoupled to run on the replayed execution instead. We recently used this in our 2013 ACM CCS paper to monitor every memory access made by an OS and applications, which would not have been feasible without record and replay. Record and replay is currently supported for i386, x86_64, and ARM, with more architectures planned. For more details see the record and replay documentation.
  • Android Support: Thanks to excellent work by Josh Hodosh, PANDA can act as an Android emulator, running modern versions of Android. See the Android documentation for more details.
  • Plugin Architecture: Plugins can be written in C and C++. PANDA supports callbacks for many types of event within QEMU, making it easy to write an analysis plugin; for example, a simple system call tracer is ~60 lines of code. Check out the plugin documentation for more information.
  • LLVM Execution: Borrowed from S2E, this execution mode translates guest code to LLVM and then JIT compiles it to native code; this means that plugins can analyze and transform the LLVM IR rather than working directly on native code. Unique to PANDA is the ability to also translate QEMU's helper functions (which are implemented in C and cover operations too complex to be handled in QEMU's native IR) to LLVM, meaning analyses in PANDA can be complete. This was recently used to implement architecture-neutral dynamic taint analysis.
  • Modern QEMU: PANDA is based on QEMU 1.0.1, with some additional fixes and enhancements backported. Unlike platforms such as BitBlaze/TEMU, which use QEMU 0.9.1, this allows PANDA to support modern OSes such as Windows 8.
If you want to get started, check out the project on GitHub, and read some of the documentation:
Thanks to all the people who have contributed to making PANDA a reality over the past year, including:
  • Josh Hodosh
  • Ryan Whelan
  • Tim Leek
  • Michael Zhivich
  • Patrick Hulin
  • Anthony Eden
  • Sam Coe
  • Nathan VanBenschoten

Tuesday, February 7, 2012

Virtuoso – Initial Code Release

I've just gotten word that the Virtuoso source code has been approved by the sponsor for public release, so I've uploaded version 1.0 to the Virtuoso Google Code site! Thanks to Tim Leek at MIT Lincoln Laboratory for seeing this project through the lengthy release review process!

Also on Google Code, you can find an installation guide and a walkthrough to get you started.

Check it out, and generate some memory analysis tools! If you run into trouble, you can shoot me an email and I'll do my best to help out, but keep in mind that this is a research project, and so there are still lots of rough edges. Enjoy!

Tuesday, September 6, 2011

What I Did on My Summer Vacation

Over the summer I worked at Microsoft Research, which has a fantastically smart bunch of people working on really cool and interesting problems. I just noticed that they've posted the video of my end-of-internship talk, Monitoring Untrusted Modern Applications with Collective Record and Replay. Please take a look if you're curious about what it might look like to try and monitor mobile apps in the wild with low overhead!

Saturday, May 28, 2011

Paper and Slides Available for "Virtuoso: Narrowing the Semantic Gap in Virtual Machine Introspection"

I've recently returned from Oakland, CA, where the 25 IEEE Symposium on Security and Privacy was held. There were a lot of excellent talks, and it was great to catch up with others in the security community. Now that the conference is over, I'm happy to release the paper and slides of our work, "Virtuoso: Narrowing the Semantic Gap in Virtual Machine Introspection", which I have described in an earlier post.

The slides contain some animations, and so I've made them available in three formats:
You can also get a copy of the full paper here. I'm also hoping to have the source ready for release soon; when it is available, you'll be able to find it on Google Code under the name Virtuoso.

Once again, thanks to my most excellent co-authors at MIT Lincoln Labs and Georgia Tech for helping me see this project through!