Illegal Numbers
Op-EdCosmic Cutie
This image was illegal to display. Here's why.
In April 2007, the Advanced Access Content System (AACS) License Administrator and the Motion Picture Association (MPA) took legal action against those who published a particular number. This sixteen-byte prime was the encryption key used in HD DVDs and Blu-ray discs. When the cease and desist letters began to be sent to institutions which published this key, there was outrage. People do not take kindly to being told that there are numbers one cannot speak of. The whole idea of "owning" a number or making a number "illegal" seemed absurd. Predictably, the number began to be spread more and more by people who wanted to make it known to the AACS that their heavy-handed response to leaked keys was unacceptable, leading to the number in question becoming hugely popular. The number became a part of popular culture, finding itself in songs, poems, music videos and T-shirts. The number was compressed down to hexadecimal form and used to generate colours in the RGB scale, resulting in the creation of a "free speech flag" (pictured).
Enough beating around the bush.
This is the infamous number, in its entirety:
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
(written in hexadecimal notation)
Hardly looks sinister, does it? Makes one wonder how the AACS could justify trying to censor it.
One of the cornerstones of digital computing is that states are discrete. Consequently, the data used in a digital system is also discrete. When one says that something is discrete, one implies that the thing can be totally represented by a binary string of finite length. This is a foundational part of discrete mathematics. Now consider this interesting fact: every binary string of a finite length has a unique number associated with it. It is easy to see why this is the case. All one has to do is group the strings based on length, arrange the groups in increasing order of length and arrange the strings within each group in increasing order of the value it represents.
0 , 1 ,
00 , 01 , 10 , 11 ,
000 , 001 , 010 , 011 , 100 , 101 , 110 , 111 ...
Now let the number corresponding to the r th string be r. We have just created a one-to-one map between the set of binary strings and the set of natural numbers. This may seem like a trivial result in an esoteric branch of mathematics. But the implications of this are enormous. Every book written, every picture taken, every sound recorded... anything that can be stored on a computer is stored as a binary string. And every binary string is just a natural number. The sum total of all the information generated by humanity is contained in the set of natural numbers. N is the apotheosis of our species.
"All things are Number."
Now, you may still think, "Of course, N can contain every file we have created or will create because it is infinite. What is so revolutionary here?!". Note how N contains every file, including the proprietary ones like x09F9... as well as the highly illegal ones. This leads to an interesting question. If N contains illegal information, isn't N itself illegal? After all, I cannot insert an illegal document into a recipe for cake and claim that the recipe is still legal because it is not strictly the same as the illegal document it contains. It stands to reason then that a file which contains illegal content is itself illegal. N, which contains every illegal file, should surely then be illegal.
This is where it gets interesting. Consider an n-bit file which is illegal to transmit. Now suppose that I create two new n-bit files, such that each of the files individually appears to be a random n-bit binary string but, when combined, produces the illegal file. How would I do this? Good question. (Skip the rest of the paragraph if you are more interested in the philosophical implications than the mathematics). I start by generating an actual random binary string of length n. Nothing illegal about that, right? (Whether it is possible to generate a truly random number is a can of worms I do not wish to open here). I now perform a bitwise XOR between the illegal and random files. The file thus produced, when XOR'ed bitwise with the random file, gives back the illegal file. By the properties of XOR:
A XOR A = 0
0 XOR A = A
A XOR (B XOR C) = (A XOR B) XOR C
thus:
B XOR (A XOR B) = B
B is a bit from the illegal file, and A is a bit from the random file. The neat part of the whole operation is that the random file is indistinguishable from the third file. The third file is produced by a random process, same as the second file. Thus one cannot say which file was generated randomly and which one was generated using XOR. This is because the bitwise XOR of a bit B with another random bit is like flipping a coin and inverting B if the coin lands heads. Thus the resulting bit is also random. When applied to the file as a whole, we see that the illegal file is broken into two files which both appear to be randomly generated.
Now assume I transmit one such file. Would I be in violation of the law? Surely I cannot be, since all I did was transmit random noise. Now I send the other file as well. Transmitting this file cannot be illegal either since this file is just random noise. But by transmitting both at the same time, I break the law. Let's take this a step further. If the two files generated were each broken again to form four total files, they could still be used to reconstruct the illegal file. In fact, I can repeat the breaking process an arbitrary number of times without losing any information about the illegal file. Now imagine I do this a very, very large number of times. What I am left with is every possible n-bit file. Thus if the illegality of a file is invariant under this operation, the illegality of a single file makes every file of the same size illegal. On the other hand, if the illegality of such a file does not persist when the file is broken into two using XOR, the law becomes toothless since it is trivially easy to legally transmit illegal files.
Here is another scheme I can use. Say I want to transmit an illegal number p. All I have to do is locate the pth prime number and transmit it. Since the primes are ordered, the value of p can be calculated from the pth prime number. To prevent this, it must be the case that for every illegal number p, the pth prime number is also illegal. The OEIS (Online Encyclopedia of Integer Sequences) might be in trouble then, since they list a lot of primes.
But there is nothing special about prime numbers per se. I could transmit the pth Fibonacci number, or the pth Catalan number, or p2, or 2p or indeed, any x such that f(x)=p. For any x, there exists a function f such that f(x)=p. This would mean every number is illegal.
But wait. Surely, transmitting x alone is not enough. I would need to transmit information about how to compute f, too. Maybe it is only (f,x) which is illegal to transmit, and f and x are both individually legal to transmit. This leads to another issue. What if I fix either x or f to be something standard and then transmit only the other value? If someone were to say, x is illegal to transmit because f(x)=p, I could defend myself by saying that they were choosing a specific f after the fact to make x illegal. After all, there are lots of standard functions.
For any natural number n, there must exist a binary string which contains as its substrings all possible binary strings of length n. This is trivially true since every such binary string of length n can be concatenated. For example, let n be 2. The strings are 00 , 01 , 10 , 11. Via concatenation, we get 00011011, which contains all binary strings of length 2.
B(n) = {s:s∈{0,1}*, ∀x∈{0,1}ⁿ x is a substring of s}
where
{0,1}ⁿ = set of all binary strings of length n
{0,1}* = set of all binary strings of finite length
Let s(n) be the first string when the elements of B(n) are ordered. s(n) must exist (since B(n) cannot be empty) and is unique. Thus given n, s(n) can be computed. Now all I need to do to specify an n-bit string x is to specify:
- n
- the starting index k in s(n) where x can be found.
Thus transmitting (n,k) becomes sufficient to transmit an illegal file of n-bits. n does not carry much information; all the information is carried in k. So the rational response seems to be to make k also illegal.
But enough mathematics. Let's conduct a very simple thought experiment. Imagine a monkey on a typewriter. This monkey is immortal, needs no food or water, and types random characters at an insanely high rate. Given enough time, the monkey will generate an illegal file purely due to luck. The monkey has broken the law while having no understanding of files or binary strings, or illegality. To take this further, the monkey is not strictly necessary. A simple machine which generates random numbers will do the trick. But such machines are all around us... the eddies in a pool, the swirling of the wind, the random nonsense scribbled into answer sheets during exams, even atoms colliding with each other can be used to generate random numbers. If a soulless unthinking automaton, placed in a solitary box to pick out random numbers, can break the law, what does it say about the law?
Maybe it doesn't say much since laws are not meant to be unambiguous, for if they were, we wouldn't need courts to determine the edge cases. Those who intend to transmit illegal files are punished, while those who generate random numbers are not. This seems like a pragmatic approach. It seems reasonable too, until we dig a little deeper and see that the line between what is protected as free speech and what is an illegal file can be blurred.
I now bring you the curious tale of DeCSS, a program written to rip DVDs (essentially copy them without permission). It first surfaced in the late 90's and spread quickly among the Linux community since open-source operating systems could not play DVDs, which were encrypted. (Why exactly open-source operating systems could not play DVDs is another very interesting story, involving shadowy cabals in smoke-filled rooms plotting to "protect their copyright"). Predictably, the usual suspects at the MPA were not happy with what was happening. Jon Lech Johansen, one of the three developers of DeCSS, had his home raided by the Norwegian police and was put on trial. The other two developers had the good sense to remain anonymous. For all intents and purposes, DeCSS.exe was an illegal file.
Enter a hacker named Seth Schoen, who proceeded to turn DeCSS.exe into a 465-stanza haiku which gave a perfect description of the algorithm used by DeCSS.
"(I abandon my
exclusive rights to make or
perform copies ofthis work, U. S. Code
Title Seventeen, section
One Hundred and Six.)Muse! When we learned to
count, little did we know all
the things we could dosome day by shuffling
those numbers: Pythagoras
said "All is number"long before he saw
computers and their effects,
or what they could doby computation,
naive and mechanical
fast arithmetic.It changed the world, it
changed our consciousness and lives
to have such fast mathavailable to
us and anyone who cared
to learn programming.Now help me, Muse, for
I wish to tell a piece of
controversial math,for which the lawyers
of DVD CCA
don't forbear to sue:that they alone should
know or have the right to teach
these skills and these rules.(Do they understand
the content, or is it just
the effects they see?)And all mathematics
is full of stories (just read
Eric Temple Bell);and CSS is
no exception to this rule.
Sing, Muse, decryptiononce secret, as all
knowledge, once unknown: how to
decrypt DVDs."
Censoring a 465-stanza haiku is considerably harder than censoring an executable file since censoring art results in much more backlash than censoring a binary string. Even though the poem is mathematically equivalent to a binary string, the emotions associated with a poem are simply not found in a binary string. Poems speak to the soul (in this case, to the compiler as well) in a way that numbers simply do not. Besides giving Schoen a place in the annals of hacker history, the haiku showed the world that source code must be protected by free speech laws, even if (and especially when) the algorithm it describes is a way to break the law.