Patents on compression of random data

Patent 5,533,051 on compression of random data

The US patent office no longer grants patents on perpetual motion machines, but has recently granted at least two patents on a mathematically impossible process: compression of truly random data. This document is an analysis of patent 5,533,051; see below for an analysis of another patent on the same subject.

  Patent 5,533,051 "Method for Data Compression"
  Filed Mar. 12, 1993; granted Jul. 2, 1996
  Inventor David C. James; Assignee: The James Group

The first line of the patent abstract is:

Methods for compressing data including methods for compressing highly randomized data are disclosed.

Page 3, line 34 of the patent states:

A second aspect of the present invention which further enhances its ability to achieve high compression percentages, is its ability to be applied to data recursively. Specifically, the methods of the present invention are able to make multiple passes over a file, each time further compressing the file. Thus, a series of recursions are repeated until the desired compression level is achieved.

Page 27, line 18 of the patent states that the claimed method can compress without loss all files by at least one bit:

the direct bit encode method of the present invention is effective for reducing an input string by one bit regardless of the bit pattern of the input string.

A very simple counting argument shows that this is mathematically impossible:

Assume that the program can compress without loss all files of size N bits or more. Compress with this program all the 2^N files which have exactly N bits. All compressed files have at most N-1 bits, so there are at most (2^N)-1 different compressed files [2^(N-1) files of size N-1, 2^(N-2) of size N-2, and so on, down to 1 file of size 0]. So at least two different input files must compress to the same output file. Hence the compression program cannot be lossless.

For more information about the counting argument, see section 9.2 of the FAQ for newsgroup comp.compression.

If the method were indeed able to shrink any file by at least one bit, applying it recursively would shrink gigabytes down to a few bits.

The patent contains evasive arguments to justify the impossible claims:

Page 12, line 22:

Of course, this does not take into account any overhead registers or other "house-keeping" type information which must be tracked. However such overhead tends to be negligible when processing the large quantities of data typically encountered in data compression applications.

Page 27, line 17:

Thus, one skilled in the art can see that by keeping the appropriate counters, the direct bit encode method of the present invention is effective for reducing an input string by one bit regardless of the bit pattern of the input string. Although a certain amount of "loss" is necessary in keeping and maintaining various counters and registers, for files which are sufficiently large, this overhead is insignificant compared to the savings obtained by the direct bit encode method.

The flaw in these arguments is that the the "house-keeping" type information is not negligible. If it is properly taken it into account, it cancels any gains made elsewhere when attempting to compress random data.

The patent contains even more evasive arguments:

Page 22, line 31:

It is commonly stated that perfectly entropic data streams cannot be compressed. This misbelief is in part based on the sobering fact that for a large set of entropic data, calculating the number of possible bit pattern combinations is unfathomable. For example, if 100 ones and 100 zeros are randomly distributed in a block 200 bits long, there are
200C100 = 9.055 10^58
combinations possible. The numbers are clearly unmanageable and hence the inception that perfectly entropic data streams cannot be compressed. The key to the present compression method under discussion is that it makes no attempt to deal with such large amounts of data and simply operates on smaller portions.

The actual claims of the patent are harmless since they only describe methods which cannot work (they actually expand random data instead of compressing it). For example, claims 6 and 7 are:

6. A method of compressing a stream of binary data, comprising the steps of:

parsing n-bits from said stream of binary data;
determining the value of said parsed n-bits;
based on the results of step B, coding said values of said n-bits in at least one of a first, second, and third target string, wherein coding said value includes generating a plurality of code strings and correlating said value with one of said code strings and dividing said correlated code string variable length codes and dividing at least some of said into at least first and second segments, and assigning at least one of said correlated code string segments to at least one of said first, second, and third target strings, wherein at least one of said plurality of codes is not greater than n-1 bits long.

7. The method of compressing a stream of binary data of claim 6, wherein n=2.

Making abstraction of the legalese, claim 7 says in short that you can compress an arbitrary sequence of two bits down to one bit.

It took three years to the patent office to ascertain the validity of such a patent. A person with basic knowledge in mathematics and data compression can find the flaws immediately upon first reading.

Check here for an analysis of patent 5,488,364 on the same subject.

Jean-loup Gailly

Back to the gzip home page
back to Jean-loup's page