optimalsolver · 4 years ago
Data compression is apparently equivalent to general intelligence:

http://mattmahoney.net/dc/rationale.html

...which gives way to the theory of mind known as compressionism:

http://ceur-ws.org/Vol-1419/paper0045.pdf

joe_the_user · 4 years ago
Perfect compression is equivalent to finding the Kolmogorov complexity of a string. That's incomputable, so perfect compression is beyond any algorithm, even a "generally intelligent" one. Generality in compression might increase an algorithm's effectiveness, but there's no proof that this generality is equivalent to generality in dealing with the world humans deal with.
pessimizer · 4 years ago
Not "perfect compression" but any compression. Perfect compression would be equivalent to perfectly digested experiences, i.e. that anything that could be deduced from a collection of sensations had been deduced. The link above explains it well.

> In 2000, Hutter proved that finding the optimal behavior of a rational agent is equivalent to compressing its observations. Essentially he proved Occam's Razor, the simplest answer is usually the correct answer. The proof applies to any goal-seeking agent in any unknown environment which can be simulated by a computer. For example, an agent could be a person or computer whose goal is to accumulate money or some other reward signal in an unknown world.

> Formally, the agent and environment are a pair of interacting Turing machines that pass messages back and forth at each time step. In addition, the environment outputs a reward signal (a number) to the agent, and the agent's goal is to maximize the accumulated reward. What Hutter proved is that the optimal behavior of an agent is to guess that the environment is controlled by the shortest program that is consistent with all of the interaction observed so far.

> The problem of finding this program is known as AIXI. AIXI implies a solution to optimal algorithmic compression: coding a string as the shortest program that outputs it. The problem is worth studying because it is hard. The general problem is not computable, although Hutter proved that if we assume time bounds t and space bounds l on the environment, then this restricted problem, known as AIXItl, can be solved in O(t·2^l) time.

edit: I mean, it makes intuitive sense. To have perfect compression of a string would be to have extracted all of the order out of it, every possible pattern, and every possible pattern of patterns etc. It's obviously impossible to know when you've gotten to that point.
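To make the "shortest program" idea concrete, here is a toy sketch of my own (not AIXI): it brute-forces a tiny repeat-expression language looking for the shortest description that reproduces a string. Restricting the language makes the search finite, but the result is only an upper bound on the true Kolmogorov complexity, which is uncomputable in general.

```python
import itertools

# Toy "shortest program" search over a tiny language: a literal repeated
# some number of times. The target string and the cost model are made up
# for illustration.
TARGET = "abababababab"

def programs(max_lit_len):
    """Enumerate candidate expressions of the form <literal> * <count>."""
    for lit_len in range(1, max_lit_len + 1):
        for lit in itertools.product("ab", repeat=lit_len):
            for count in range(1, len(TARGET) + 1):
                yield ("".join(lit), count)

def run(prog):
    lit, count = prog
    return lit * count

best = None
for prog in programs(3):
    if run(prog) == TARGET:
        cost = len(prog[0]) + 1  # literal characters plus one repeat count
        if best is None or cost < best[0]:
            best = (cost, prog)

print(best)  # (3, ('ab', 6)): "ab" repeated 6 times, at cost 3
```

The 12-character string collapses to a 3-unit description, but nothing in the search can certify that no shorter description exists outside the toy language — which is exactly the point above.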

joquarky · 4 years ago
Is there a way to do a simulated annealing approach that lets me put as much compute as I want into it for some marginal gains?
Dylan16807 · 4 years ago
That's a factor to some extent, but if you're sticking with lossless then the better your compression gets the more your remaining data is dominated by noise.
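A quick way to see this point (my own sketch, using Python's zlib): patterned input keeps shrinking, while noise-like input doesn't compress at all.

```python
import os
import zlib

structured = b"the quick brown fox " * 500  # 10,000 bytes of pure pattern
noise = os.urandom(10_000)                  # 10,000 bytes of noise

print(len(zlib.compress(structured, 9)))  # tiny: the pattern compresses away
print(len(zlib.compress(noise, 9)))       # ~10,000+: nothing left to remove
```

Once a compressor has extracted the structure, what remains is statistically indistinguishable from the random case, so extra compute yields little.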
gfody · 4 years ago
that's why you need two brains, one to losslessly compress through symbolic expressiveness and another to lossily compress by generating noise like the incompressible noise
tomcam · 4 years ago
Username particularly appropriate
natly · 4 years ago
Not mentioned here is Fabrice Bellard's record-breaking (at an enormous compute cost) compressor https://bellard.org/nncp/.
lucb1e · 4 years ago
What does "ratio (bpb)" mean? I'd guess bytes-per-byte or something, like how many bytes of original you get for each byte of compression, but it doesn't work out: the original size is 1e9 bytes, compressed (rounded) 3.2e8, so that's a ratio of 3.1 (1e9/3.2e8=3.09989). The program size amounts to a rounding error on that figure. The bpb value given is 2.58, nowhere near 3.1.

Edit: the paper defines it as "bits per input byte". What kinda measure is that, it's like "how well did it compress as compared to a factor 8", why 8?!

jltsiren · 4 years ago
The bit is the most fundamental unit of information. A base-e unit might be more elegant from a certain mathematical perspective, but the connections to formal logic and the ease of implementation make the base-2 bit the natural choice. At least when talking about things like information, entropy, and compression.

Bytes, on the other hand, are entirely arbitrary. At some point, the industry converged to using groups of 8 bits as the primary semantically meaningful unit smaller than a word. Probably because people at that time thought that having 256 distinct characters would be more or less the right choice. And because groups of power-of-2 bits are convenient on hardware level.

Entropy is usually expressed as bits per symbol (or bits per character), because that's what you get when you sum -P(c) log P(c) over all symbols c. People who are used to that convention often extend it to representing compression ratios. Using bits per byte is rare, because bytes are rarely semantically meaningful.
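For concreteness, the convention above as code (a minimal sketch; this is the order-0 entropy of a byte string):

```python
import math
from collections import Counter

def bits_per_symbol(data: bytes) -> float:
    """Order-0 entropy: -sum over symbols c of P(c) * log2 P(c)."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

print(bits_per_symbol(b"aaaa"))            # one symbol: 0 bits each
print(bits_per_symbol(b"abab"))            # two equiprobable symbols: 1 bit each
print(bits_per_symbol(bytes(range(256))))  # all 256 byte values: 8 bits each
```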

isaacimagine · 4 years ago
Given 8 bits (one byte) of input, how many compressed bits does it take to represent them, on average? 2.58, which gives a plain ratio of:

8 / 2.58 ≈ 3.1
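Spelled out as arithmetic (the compressed size here is a made-up figure chosen to work out to exactly 2.58 bpb; the real enwik9 numbers differ slightly):

```python
original_bytes = 1_000_000_000    # enwik9, as in the comment above
compressed_bytes = 322_500_000    # hypothetical size that yields 2.58 bpb

bpb = compressed_bytes * 8 / original_bytes  # compressed bits per input byte
ratio = original_bytes / compressed_bytes    # plain compression ratio

print(round(bpb, 2), round(ratio, 2))  # 2.58 3.1
```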

dgacmu · 4 years ago
It's a common way to represent entropy (the information content). One could measure bits per x for any x, of course, but bits per character (nee byte) is quite common and goes back to Shannon.
ctur · 4 years ago
This seems pretty incomplete if it doesn't include anything about brotli, lz4, or zstd, or Finite State Entropy in general. It's basically missing the past decade or more.

There's a lot of good history here, but just be aware it is missing the state of the art and thus shouldn't be used for deciding anything about algorithms or approaches to use today.

_delirium · 4 years ago
> It's basically missing the past decade or more.

The page history [1] shows that the article was mostly written in December 2011, with only minor edits since then, so that sounds about right.

[1] https://ethw.org/w/index.php?title=History_of_Lossless_Data_...

lucb1e · 4 years ago
Does that mean this is also outdated? I'm not sure if zstd, brotli, etc. are PAQ-like

> on the bleeding edge of archival software are the PAQ* formats

For "a history of" I don't mind if they omit the latest decade, which is still subject to interpretation as to how to structure the text and which key moments/inventions to emphasize, but if they make statements pertaining to the present, that's a different story.

felixhandte · 4 years ago
No widely deployed compressors are PAQ-like. The speed-up anticipated by the OP never materialized.
pixelpoet · 4 years ago
The story of Phil Katz (the PK in PKZIP) is a fascinating and very sad one: https://web.archive.org/web/20000829071343/http://www2.jsonl...
motohagiography · 4 years ago
Is the theoretical limit to compression basically what Kolmogorov complexity is, or is there another Shannon/information-theory theorem that defines the maximum possible lossless compression in terms of symbols over a channel, where the channel represents the algorithm? ("Compression theorem" refers to something else more abstract, I think.)

Intuitively it seems like there would have to be one for lossless compression, along with a proof that a corresponding theorem for the limits of lossy compression is either impossible or an open/hard problem.

joe_the_user · 4 years ago
Yes, the Kolmogorov complexity of a string X is the minimum length of a program/string that, when fed into an efficient universal computer, will yield back the string X. The Kolmogorov complexity of a string is also undecidable - it's essentially equivalent to the halting problem.

"Best Lossy Compression" is ambiguous - how much loss is acceptable isn't defined in the phrase.

dhdc · 4 years ago
The theorem you are looking for is Shannon's Source Coding Theorem[1]. It basically states that no encoding scheme can losslessly compress data beyond the given data's Shannon entropy.

[1]: https://en.m.wikipedia.org/wiki/Shannon's_source_coding_theo...
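The impossibility side of the theorem has a simple counting argument behind it (a sketch of the pigeonhole reasoning, not the theorem itself): there are more length-n bit strings than strictly shorter ones, so no lossless scheme can shrink every input.

```python
n = 8
inputs = 2 ** n                          # all bit strings of length n
shorter = sum(2 ** k for k in range(n))  # all strictly shorter bit strings

print(inputs, shorter)  # 256 255: at least one input cannot shrink
```

Entropy then sharpens this from "some input can't shrink" to a precise per-symbol floor on the average compressed length.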

motohagiography · 4 years ago
Amazing, thank you, it had to be something like that. It makes much more sense as an encoding limitation than as something as abstract as the "shortest possible program" via Kolmogorov complexity. I was wondering whether KC might just be an inconsistently defined thought experiment compared to the concrete ideas of information theory, and if it's not, how much it would necessarily have in common with the coding theorem, since it seems they're talking about the same thing.
bumper_crop · 4 years ago
Very timely; I found out yesterday that the UPX program (an EXE-specific compressor from many years ago) was made by the same people who made LZO. I had this realization that there's a lineage of people who write compression tools. In a similar vein, the Rob Pike post from yesterday mentioned that Ken Thompson had made a custom audio compressor for the Plan 9 release. He also co-created the UTF-8 encoding. I love seeing how these people keep refining the state of the art.
Jerrrry · 4 years ago
The thoughts around information theory, entropy, compression, emergent behavior, and the intersectional limits of all of them are quite reminiscent and memetic to those who are interested.

Complexity is fuckin cool.

mtn_nerd · 4 years ago
How could they not include the middle-out algorithm?
dang · 4 years ago
Discussed at the time:

History of Lossless Data Compression Algorithms - https://news.ycombinator.com/item?id=8088842 - July 2014 (44 comments)