ctur · 2 years ago
What a great historical summary. Compression has moved on now but having grown up marveling at PKZip and maximizing usable space on very early computers, as well as compression in modems (v42bis ftw!), this field has always seemed magical.

These days it is generally better to prefer Zstandard to zlib/gzip for many reasons. And if you need a seekable format, consider squashfs as a reasonable choice. These stand on the shoulders of the giants of zlib and zip, but do indeed stand much higher in the modern world.

michaelrpeskin · 2 years ago
I had forgotten about modem compression. Back in the BBS days when you had to upload files to get new files, you usually had a ratio (20 bytes download for every byte you uploaded). I would always use the PKZIP no compression option for the archive to upload because Z-Modem would take care of compression over the wire. So I didn't burn my daily time limit by uploading a large file and I got more credit for my download ratios.

I was a silly kid.

EvanAnderson · 2 years ago
That's really clever and likely would have gone unnoticed by a lot of sysops!
stavros · 2 years ago
That sounds like it can be fooled by making a zip bomb that will compress down to a few KB (by the modem), but will be many MB uncompressed. Sounds great for your ratio, and will upload in a few seconds.
lxgr · 2 years ago
> These days it generally is better to prefer Zstandard to zlib/gzip for many reasons.

I'd agree for new applications, but just like MP3, .gz files (and by extension .tar.gz/.tgz) and zlib streams will probably be around for a long time for compatibility reasons.

pvorb · 2 years ago
I think zlib/gzip still has its place these days. It's still a decent choice for most use cases. If you don't know what usage patterns your program will see, zlib still might be a good choice. Plus, it's supported virtually everywhere, which makes it interesting for long-term storage. Often, using one of the modern alternatives is not worth the hassle.
emmelaich · 2 years ago
Fun fact: in a sense, gzip can have multiple files, but not in a specially useful way ...

    $ echo meow >cat
    $ echo woof > dog
    $ gzip cat
    $ gzip dog
    $ cat cat.gz dog.gz >animals.gz
    $ gunzip animals.gz
    $ cat animals
    meow
    woof

koolba · 2 years ago
> ... but not in a specially useful way ...

It can be very useful: https://github.com/google/crfs#introducing-stargz

DigiDigiorno · 2 years ago
It is specially useful, it is not especially/generally useful lol

It could be a typo, though I think when we say something "isn't specially/specifically/particularly useful" we mean "of all the features, this particular one is not that useful", not that the feature isn't useful for specific things.

ericpauley · 2 years ago
Imo all file formats should be concatenable when possible. Thankfully Zstandard purposefully supports this too, which is a huge boon for combining files.

Fun fact, tar files are also (semi-)concatenable; you'll just need to pass `-i` when extracting. This also means compressed (using gz/zstd) tarfiles are also (semi-)concatenable!
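A minimal sketch of that tar trick (GNU tar assumed): concatenated gzip streams decompress as one continuous stream, and `-i` / `--ignore-zeros` tells tar to read past the zero-filled end-of-archive blocks of the first member.

```shell
# Two separate compressed tarballs...
echo meow > cat.txt
echo woof > dog.txt
tar -czf a.tar.gz cat.txt
tar -czf b.tar.gz dog.txt
rm cat.txt dog.txt

# ...concatenated byte-for-byte into a single file.
cat a.tar.gz b.tar.gz > both.tar.gz

# --ignore-zeros (-i) skips the end-of-archive padding of the
# first tarball, so members from both archives are extracted.
tar -xzf both.tar.gz --ignore-zeros
```

Without `--ignore-zeros`, tar would stop at the first archive's terminator and you'd only get `cat.txt` back.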

billyhoffman · 2 years ago
WARC files (used by the Internet Archive to power the Wayback Machine, among others) use this trick too, giving a compressed file format that is seekable to individual HTTP request/response records.
lxgr · 2 years ago
Wow, that's surprising (at least to me)!

Is there a limit in the default gunzip implementation? I'm aware of the concept of ZIP/tar bombs, but I wouldn't have expected gunzip to ever produce more than one output file, at least when invoked without options.

tedunangst · 2 years ago
It only produces one output. It's just a stream of data.
abhibeckert · 2 years ago
The limit is it doesn't do filenames or other metadata — it's limited to contents.
cout · 2 years ago
Interesting -- I did not realize that the zip format supports lzma, bzip2, and zstd. What software supports those compression methods? Can Windows Explorer read zip files produced with those compression methods?

(I have been using 7zip for about 15 years to produce archive files that have an index and can quickly extract a single file and can use multiple cores for compression, but I would love to have an alternative, if one exists).

ForkMeOnTinder · 2 years ago
7zip has a dropdown called "Compression method" in the "Add to Archive" dialog that lets you choose.
pixl97 · 2 years ago
Until Windows 11, no; Windows' built-in zip support only seems to deal with COMPRESS/DEFLATE zip files.
melagonster · 2 years ago
For people who first read this: the sweet part is in the comments :)
dcow · 2 years ago
What’s even more sad is that the SO community has since consequently destroyed SO as the home for this type of info. This post would now be considered off topic as it’s “not a good format for a Q&A site”. You’d never see it happen today. Truly sad.
barrkel · 2 years ago
Thing is, it could only be that way in its early days, when the vanguard of users came to it from word of mouth, from following Joel Spolsky or Coding Horror or their joint podcast. The audience is much bigger now and with the long tail of people, the number willing to put effort into good questions is too low, and on-topicness is a simple quality bar which can improve the signal to noise ratio.
zxt_tzx · 2 years ago
Relatedly, I have seen a graph showing SO traffic dipping by ~30%, if I'm not wrong (and the corresponding hot takes attributing that to the rise of LLMs).

I know most people are pessimistic that LLMs will lead to SO and the web in general being overrun by hallucinated content and an AI-training-on-AI ouroboros, but I wonder if it might instead allow curious people to query an endlessly patient AI assistant about exactly this kind of information. (A custom GPT perhaps?)

twic · 2 years ago
A rephrasing of this might be on-topic on retrocomputing: https://retrocomputing.stackexchange.com/q/3083/21450

But almost nobody reads that.

BeetleB · 2 years ago
This is somewhat revisionist. They would mark stuff like this as off topic even in the early days.
miyuru · 2 years ago
His Stack Exchange profile is a gold mine itself.

https://stackexchange.com/users/1136690/mark-adler#top-answe...

stavros · 2 years ago
Hah, imagine asking Mark Adler for gzip history references.
dustypotato · 2 years ago
Found this hilarious:

> This post is packed with so much history and information that I feel like some citations need be added

> I am the reference

(extracted a part of the conversation)

tyingq · 2 years ago
Maybe a spoiler, but the "I" in "I am the reference" is Mark Adler:

https://en.wikipedia.org/wiki/Mark_Adler

signaru · 2 years ago
It's awesome how he is active on stack overflow for almost anything DEFLATE related. I once tried stuffing deflate compressed vector graphics into PDFs. Among other things, it turns out an Adler-32 checksum is necessary for compliance (some newer PDF viewers will ignore its absence though).
gmgmgmgmgm · 2 years ago
That's disallowed on Wikipedia. There, you must reference some "source". That "source" doesn't need to be reliable or correct, it just needs to be some random website that's not the actual person. Primary sources are disallowed.
bombela · 2 years ago
I learned this when I tried correcting the Wikipedia page on Docker. I literally wrote the first prototype. But this wasn't enough of a source for Wikipedia. And to this day the English page is still not truthful (interestingly enough, the French version is closer to the truth).
kibwen · 2 years ago
And that's for good reason. Encyclopedias are supposed to be tertiary sources, not primary sources. Having an explicit cited reference makes it easier to judge the veracity of a statement compared to digging through the page history to figure out if a line was added by a person who happens to be an expert.
whalesalad · 2 years ago
Reminds me of when I was inadvertently arguing here on HN with the inventor of the actor model about what actors are
demondemidi · 2 years ago
That sounds like something I’d do too. If that makes you feel better.
chupasaurus · 2 years ago
Oh, I remember that conversation, it was fun.
FartyMcFarter · 2 years ago
"I'm the one who knocks".
matheusmoreira · 2 years ago
"I am the hype."
HexDecOctBin · 2 years ago
Is there an archive format that supports appending diffs of an existing file, so that multiple versions of the same file are stored? PKZIP has a proprietary extension (supposedly), but I couldn't find any open version of that.

(I was thinking of creating a version control system whose .git directory equivalent is basically an archive file that can easily be emailed, etc.)

pizza · 2 years ago
New versions of zstd allow you to produce patches using the trained dictionary feature