Patent 5,533,051 "Method for Data Compression" Filed Mar. 12, 1993; granted Jul. 2, 1996 Inventor David C. James; Assignee: The James Group

The first line of the patent abstract is:

Methods for compressing data including methods for compressing highly randomized data are disclosed.Page 3, line 34 of the patent states:

A second aspect of the present invention which further enhances its ability to achieve high compression percentages, is its ability to be applied to data recursively. Specifically, the methods of the present invention are able to make multiple passes over a file, each time further compressing the file. Thus, a series of recursions are repeated until the desired compression level is achieved.Page 27, line 18 of the patent states that the claimed method can compress without loss

the direct bit encode method of the present invention is effective for reducing an input string by one bit regardless of the bit pattern of the input string.A very simple counting argument shows that this is mathematically impossible:

Assume that the program can compress without loss all files of size N bits or more. Compress with this program all the 2^N files which have exactly N bits. All compressed files have at most N-1 bits, so there are at most (2^N)-1 different compressed files [2^(N-1) files of size N-1, 2^(N-2) of size N-2, and so on, down to 1 file of size 0]. So at least two different input files must compress to the same output file. Hence the compression program cannot be lossless.For more information about the counting argument, see section 9.2 of the FAQ for newsgroup comp.compression.

If the method were indeed able to shrink any file by at least one bit, applying it recursively would shrink gigabytes down to a few bits.

The patent contains evasive arguments to justify the impossible claims:

Page 12, line 22:

Of course, this does not take into account any overhead registers or other "house-keeping" type information which must be tracked. However such overhead tends to be negligible when processing the large quantities of data typically encountered in data compression applications.Page 27, line 17:

Thus, one skilled in the art can see that by keeping the appropriate counters, the direct bit encode method of the present invention is effective for reducing an input string by one bit regardless of the bit pattern of the input string. Although a certain amount of "loss" is necessary in keeping and maintaining various counters and registers, for files which are sufficiently large, this overhead is insignificant compared to the savings obtained by the direct bit encode method.The flaw in these arguments is that the the "house-keeping" type information is

The patent contains even more evasive arguments:

Page 22, line 31:

It is commonly stated that perfectly entropic data streams cannot be compressed. This misbelief is in part based on the sobering fact that for a large set of entropic data, calculating the number of possible bit pattern combinations is unfathomable. For example, if 100 ones and 100 zeros are randomly distributed in a block 200 bits long, there areThe actual claims of the patent are harmless since they only describe methods which cannot work (they actually expand random data instead of compressing it). For example, claims 6 and 7 are:200C100 = 9.055 10^58combinations possible. The numbers are clearly unmanageable and hence the inception that perfectly entropic data streams cannot be compressed. The key to the present compression method under discussion is that it makes no attempt to deal with such large amounts of data and simply operates on smaller portions.

6. A method of compressing a stream of binary data, comprising the steps of:Making abstraction of the legalese, claim 7 says in short that you can compress an arbitrary sequence of two bits down to one bit.

- parsing n-bits from said stream of binary data;
- determining the value of said parsed n-bits;
- based on the results of step B, coding said values of said n-bits in at least one of a first, second, and third target string, wherein coding said value includes generating a plurality of code strings and correlating said value with one of said code strings and dividing said correlated code string variable length codes and dividing at least some of said into at least first and second segments, and assigning at least one of said correlated code string segments to at least one of said first, second, and third target strings, wherein at least one of said plurality of codes is not greater than n-1 bits long.
7. The method of compressing a stream of binary data of claim 6, wherein n=2.

It took three years to the patent office to ascertain the validity of such a patent. A person with basic knowledge in mathematics and data compression can find the flaws immediately upon first reading.

Check here for an analysis of patent 5,488,364 on the same subject.

Jean-loup Gailly