Patents on compression of random data

Patent 5,488,364 on compression of random data

This document is an analysis of patent 5,488,364 on the compression of random data. Read first the analysis of patent 5,533,051 on the same subject if you have not yet done so.

  Patent 5,488,364 "Recursive data compression".
  Filed Feb. 28, 1994; granted Jan. 30, 1996
  Inventor Michael L. Cole

The full text of the patent is available here (uncompressed or compressed). Thanks to Gregory Aharonian for providing this text.

The patented method is claimed to be able to compress random data, and thus to be applicable recursively to already compressed data. However it merely transforms a random stream to another almost equally random stream:

step 29 determines a "bit-reversal key": all data will be xored with this key. Random input data is thus mapped to random data by the xor operation.
in step 44, find pattern p so that data & p == p most often, excluding p == 0. (This generally results in a p with only one bit set.) Use this p as "keyword".
at step 66, move the bits selected by the keyword to a first block, other bits to a second block. Both sequences of bits were random, so both blocks are still random. The systematic bit inversion also maps random data to random data.

The patent summary states:

Averaged over a statistically significant sample of random input data, the first block will have a percentage of adjacent bits of the "1" state that is high relative to that of the original input data, and the second block will have a percentage of adjacent bits of the "0" state that is high relative to that of the original input data.

This is the flaw. For random input data, the first block will have by construction a simple majority of "1" bits, but not a high percentage of "1" bits. The very small reduction in the entropy of each block is cancelled by the bits required to specify the "bit-reversal key" and the "keyword".

The patent claims themselves are harmless. They describe a compression method which will actually slightly expand random data instead of compressing it. It is therefore useless for recursive compression.

Jean-loup Gailly

Back to the gzip home page
back to Jean-loup's page