Breaking Spotify DRM with PANDA

Disclaimer: Although I think DRM is both stupid and evil, I don't advocate pirating music. Therefore, this post will stop short of providing a turnkey solution for ripping Spotify music, but it will fully describe the theory behind the technique and its implementation in PANDA. Don't be evil.

Update 6/6/2014: The following post assumes you know what PANDA is (a platform for dynamic analysis based on QEMU). If you want to know more, check out my introductory post on PANDA.

This past weekend I spoke at REcon, a conference on reverse engineering held every year in Montreal. I had a fantastic time there getting to meet other people interested in problems of memory analysis, reverse engineering, and dynamic analysis. One of the topics of my REcon talk was how to use PANDA to break Spotify DRM, and since the video from the talk won't be posted for a while, I thought I'd write up a post showing how we can use PANDA and statistics to pull out unencrypted OGGs from Spotify.

Gathering Data


The first step is to gather some data. We want to know what function inside Spotify is doing the actual decryption of the songs, so that we can then hook it and pull out the decrypted (but not decompressed) audio file. So to start with, we'll take a recording of Spotify playing a song; we can then apply whatever analysis we want to the replay. Working with a replay rather than a live system will also make our job considerably easier – no need to worry that we're going to slow things down enough to trip anti-debugging measures or network timeouts. I've prepared a record/repay log of Spotify playing 30 seconds of a song, which you can use to follow along with what comes next. The recording is 12 billion instructions, which gives us a lot of data to work with!

Just for fun, here's a movie of that replay, generated by taking screenshots throughout the replay and then stitching them into a video:



Some Theory


The next challenge is to figure out how we can identify the function that takes in encrypted data and outputs decrypted data. For this we turn to the excellent work of Ruoyu Wang, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna [1]. Their clever insight was that when you look at the distribution of bytes in encrypted vs. compressed streams, the byte entropy of the two is very similar, but compressed streams don't look very random. To illustrate this, let's look at the histograms for an encrypted mp3 file, and its decrypted version. First, encrypted:


Now the same file, decrypted:


You can clearly see that the one on the bottom looks significantly less "random" – or more precisely, the distribution of bytes is not very uniform. However, if we compute the byte entropy of each, they are both very close to the theoretical maximum of 8 bits per byte – the mp3 has 7.968480 bits of entropy per byte, whereas the encrypted file has 7.999981 bits per byte.

We can make this intuition more precise by turning to statistics. The Pearson chi-squared test (χ2) lets us compute a value for how much an observed distribution deviates from some ideal distribution. In this case, we expect the bytes in an encrypted file to be uniformly random, so we can compare with the uniform distribution by computing:


Here, Oi is the observed frequency of each byte, and Ei is the expected frequency, which for a uniform byte distribution with n samples will be (1/256)*n.

Similarly, the entropy of some ovserved data can be computed as:


Where p(xi) is the observed frequency of each byte value in the data.

Based on the work of Wang et al., if we find a function that reads a lot of high-entropy, highly random data, and writes a lot of high-entropy, non-random data, that's likely to be our guy!

Enter the PANDA


But enough theory. How do we actually gather the data we need in PANDA? We will want some way of gathering, for each function, statistics on the contents of buffers read and written by each function in the replay. As it happens, PANDA has a plugin called unigrams that will get us the data we want.

The
unigrams plugin works by tracking every memory read and write made by the system. When it sees a read or write, it looks up the current process context (i.e., CR3 on x86), program counter, and the callsite of the parent function (this last is done with the help of the callstack_instr plugin). Together, these three pieces of information allow us to put the individual memory access in context and separate out memory accesses made in different program contexts into coherent streams of data. So to gather the raw data we want, we can just run:

x86_64-softmmu/qemu-system-x86_64 -m 1024 -replay spotify \
  -panda-plugin x86_64-softmmu/panda_plugins/panda_callstack_instr.so \
  -panda-plugin x86_64-softmmu/panda_plugins/panda_unigrams.so

This produces two files, unigram_mem_read_report.bin and unigram_mem_write_report.bin. The format of these files isn't terribly interesting, but they can be parsed using the Python code found in the unigram_hist.py script. Essentially, it consists of many, many rows of data that have the (callsite, program counter, CR3) triple followed by an array of 256 integers giving the number of times each byte was read or written at that point in the code.

Armed with this data, we want to now go through each callsite and look for those that meet the following criteria:

  1. The function both reads and writes a lot of data, in roughly equal amounts.
  2. The byte entropy of the data read is high, and its χ2 value (deviation from random) is low.
  3. The byte entropy of the data written is high, and its χ2 value is high.
This is precisely what the find_drm.py script does. We can run it like so:

./find_drm.py unigram_mem_read_report.bin unigram_mem_write_report.bin

Among its output, we find the following promising candidate:


(00719b84 3f1ac2e0): 3 x 1 combinations
  Read sizes:  44033, 701761, 701761
  Write sizes: 701761
  Read rand:  2.238299, 258.176922, 263.599258
  Write rand: 142018.776009
  Best input/output ratio (0 is best possible): 0.0

This function read two buffers of size 701,761 bytes and wrote one of size 701,761 bytes – given that we played 30 seconds of the song, this looks just about right. The randomness of the input buffers was quite high (recall that in the χ2 test, high numbers mean the data observed is less likely to be random), but the output buffer was not very random.


Dumping the Data


So how can we confirm our guess? Well, the easiest thing is to simply dump out the data seen at that point. If we go back up to the beginning of the output of the script, we have a list of all the (callsite, program counter, CR3) identifiers for reads and writes that matched our criteria. Looking through the writes for our candidate callsite (00719b84), we find it here:


(00719b84 0042e2ed 3f1ac2e0): 701761 bytes

We can now use another PANDA plugin, tapdump, to dump out all the data flowing through that point in the program. First we create a text file named tap_points.txt in the QEMU directory, and put in it:


00719b84 0042e2ed 3f1ac2e0

Next we run the replay again with the tapdump plugin enabled.

x86_64-softmmu/qemu-system-x86_64 -m 1024 -replay spotify \
  -panda-plugin x86_64-softmmu/panda_plugins/panda_callstack_instr.so \
  -panda-plugin x86_64-softmmu/panda_plugins/panda_tapdump.so

This produces two files, read_tap_buffers.txt.gz and write_tap_buffers.txt.gz, which contain the data read and written at the specified points. If you examine this with zless, you'll see lots of lines of addresses, followed by a single byte value. Separating out each field onto its own line and annotating, these are: 

0000000082678e78 [Caller 13]
000000008260dcc3 [Caller 12]
[...]
000000000071a1a5 [Caller 2]
0000000000719b84 [Caller 1]
000000000042e2ed [PC]
000000003f1ac2e0 [Address space]
000000000b256570 [Write address]
       269882976 [Index]
              4f [Data]

The extra callstack information is included so that, if necessary, more calling context can be used to pull out just the data we're interested in. In our case, however, just one level turns out to be enough. Finally, we want to turn this text file into a binary stream. In the scripts directory, there is a script called split_taps.py which will go through a gzipped tapdump output file and separate out each distinct stream found in the file (based on our usual identifier of (callsite, program counter, CR3)).

So now we can run this on the writes seen at our candidate for the decryption function:

./split_taps.py write_tap_buffers.txt.gz spotify

And obtain spotify.0000000000719b84.000000000042e2ed.000000003f1ac2e0.dat, which contains the binary data written at program counter 0x0042e2ed, called from callsite 0x00719b84, inside of the process with CR3 0x3f1ac2e0. So, is this audio we seek?

$ file spotify.0000000000719b84.000000000042e2ed.000000003f1ac2e0.dat 

spotify.0000000000719b84.000000000042e2ed.000000003f1ac2e0.dat: Ogg data

This looks good! Of course, the proof of the pudding is in the eating, and the proof of the audio is in the listening, so do...

$ cvlc spotify.0000000000719b84.000000000042e2ed.000000003f1ac2e0.dat

And you should hear a rather familiar tune :)


Concluding Thoughts


As I mentioned in the disclaimer, this by itself is just the starting point for what you would need to really break Spotify's DRM. It doesn't give you a way to obtain the key for each song and decrypt it wholesale. Instead, you would have to place a hook in the function identified by this process and pull it out as it's played, which limits it to realtime decryption (and Spotify's packing and anti-debugging may make it hard to place the hook in the first place!). Although I can certainly imagine more efficient processes, I think for now this is a nice balance between enabling piracy and showing off the power of PANDA.

If you now want to get a better understanding of the function we found inside Spotify, you can create a memory dump, extract the unpacked Spotify binary (which is packed with Themida) using Volatility, and the load it up in IDA and go to 0x0042e2ed, which is the location where decrypted data is written out.


Postscript


One may wonder what happens when the function that contains 0x0042e2ed is called by others. As it turns out, this appears to be a generic decryption function that is used for other media throughout Spotify, including album art! It is left as an exercise to the reader to dump and examine the rest of the data that this function decrypts.


References


[1] Steal This Movie: Automatically Bypassing DRM Protection in Streaming Media Services. Wang, R., Shoshitaishvili, Y., Kruegel, C., and Vigna, G. USENIX Security Symposium, Washington, D.C., 2013.

Comments

chx said…
Eh, anyone wanting to pirate songs can do it much easier, ever heard of BitTorrent :) ? Very interesting low level hackery, congrats.
Anonymous said…
This is like fisting, fun but counterproductive.
antek said…
Great stuff. I really liked the method you've used to locate the decryption function.
zeno said…
Does it still work?

Popular posts from this blog

Someone’s Been Messing With My Subnormals!

Decrypting LSA Secrets

SysKey and the SAM