Patching Splay Encryption to Weaken Chosen Plaintext Attacks

Douglas W. Jones
University of Iowa Department of Computer Science

The Weakness

To attack a splay-tree based encryption program, all that is needed is a string that erases the state of the tree set by the key and sets the tree to a known state. On an 8-bit alphabet, the string

aaaaaaaa

will do this for the letter a. The first occurance of a shortens the code for a from a possible (but unlikely) 256 bits to 128 bits, and each repetition of the letter shortens the code by a factor of two. Thus, after 8 repetitions, the code for a is guaranteed to be 1 bit.

If we include 8 repetitions of each possible letter, for example, with the string starting

	aaaaaaaabbbbbbbbccccccccddddddddeeeeeeee

we will gain control of the state of the entire splay tree.

Injecting Random Characters

One way to defeat the above attack is to disrupt the controlled plaintext. For example, consider injecting a random character into the string every 8 characters. The controlled plaintext suggested above would then be compressed as:

        aaaaaaaaVbbbbbbbbWccccccccXddddddddYeeeeeeeeZ

Where VWXYZ is a random string. While the code for a after the string aaaaaaaaV is only slightly disrupted (it will be a 2 bit code with 1 bit unknown), the effects of the disruptions are cumulative, and the attacker must work out the details of the random string VWXYZ in order to complete the attack.

Of course, the random string must either be genuinely random or it must be produced by a cryptographically secure pseudorandom number generator. Simple pseudorandom strings won't suffice!

Modern computers are fast and inexpensive enough that an attacker could easily afford a large number of machines running in parallel. Dragos Ruiu has proposed a simple change to this scheme that greatly increases the difficulty the attacker would face. Instead of injecting the random noise at fixed intervals, the noise is injected at pseudorandom intervals. The sender and receiver must use the same pseudorandom sequence for the intervals so that the receiver can discard the noise, and the seed for this sequence can be part of or can be derived from the key.

Run Length Encoding

Another way to make the controlled plaintext attack outlined above more difficult is to eliminate repetition from the source string. This can be done, for example, by using run-length encoding, so the attack string given above would be converted to the following prior to compression:

a8b8c8d8e8

Of course, we don't want to use characters from within the plaintext alphabet to represent repeat counts; instead, we extend the splay tree by adding extra leaves that are used for repeat counts. Thus, for example, on a tree used to encode an 8-bit alphabet, we would use leaf numbers above 256 to encode repeat counts.

Simple run-length encoding does not cure the problem! Consider, for example, the attack string

	abababababababab

This string reveals almost as much about a and b as the string

	aaaaaaaabbbbbbbb

It is tempting to include a mechanism to apply repeat counts to substrings, so that this new attack would be coded as follows

	(ab)8

Unfortunately, this does not entirely solve the problem. Consider, for example, this controlled plaintext

	aVaWaXaYaZa

Here, even if VWXYZ is an entirely random string, each repetition of the letter a will shorten the code for a by a factor of 2, and each random insert will lengthen the code for a by at most 1. Thus, in the end, we get a 2-bit code for a instead of a 1-bit code for a, but the structure of the tree is still revealed! These random insertions, however, will quickly scramble the code for a, so that, by the time the code for b is revealed, the code for a will be at least 8 bits long.

If the user is allowed to control the intervening characters VWXYZ, the entire tree can be revealed!

An Idea

Combining run-length encoding with sufficiently frequent random insertions of random data may be sufficient to disrupt controlled-plaintext attacks. Given the limited amount that is known about splay-tree based codes, however, it would be unwise to rely on the security of such a combination!

I wish to thank Dragos Ruiu (dr@kyx.net) for permission to incorporate his proposals from our discussion of May 3 and 4 2001 into this document.