I will add a section to the README. This is a good question that other people might have too!
Unfortunately, piping a seekable file for decompression via stdin isn't possible. Decompressing a seekable file requires reading the seek table first (which is usually at the end of the file) and then seeking to the desired frame position, so zeekstd needs to be able to seek within the file.
If you want to decompress the complete file, you can use the regular zstd tool: "cat seekable.zst | zstd -d"
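To illustrate why a pipe is not enough, here is a minimal Python sketch that reads a seek table from the end of a file. It is not zeekstd's code. The layout (a skippable frame with 8-byte or 12-byte entries and a 9-byte footer) and the magic numbers are my reading of the zstd seekable format description, and the name `read_seek_table` is made up for this example.

```python
import io
import struct

# Constants as I understand them from the zstd seekable format description.
SKIPPABLE_MAGIC = 0x184D2A5E  # magic of the skippable frame holding the seek table
SEEKABLE_MAGIC = 0x8F92EAB1   # last four bytes of a seekable file
FOOTER_SIZE = 9               # number_of_frames (4) + descriptor (1) + magic (4)


def read_seek_table(f):
    """Return [(compressed_size, decompressed_size), ...], one pair per frame.

    `f` must be a seekable binary file object. A pipe such as
    sys.stdin.buffer fails on the very first seek.
    """
    # The footer sits at the very end, so it has to be read first.
    f.seek(-FOOTER_SIZE, io.SEEK_END)
    num_frames, descriptor, magic = struct.unpack("<IBI", f.read(FOOTER_SIZE))
    if magic != SEEKABLE_MAGIC:
        raise ValueError("no seek table footer found")

    # Bit 7 of the descriptor says whether each entry carries a checksum.
    entry_size = 12 if descriptor & 0x80 else 8
    table_size = 8 + num_frames * entry_size + FOOTER_SIZE

    # Jump back to the start of the skippable frame and check its header.
    f.seek(-table_size, io.SEEK_END)
    skippable_magic, _frame_size = struct.unpack("<II", f.read(8))
    if skippable_magic != SKIPPABLE_MAGIC:
        raise ValueError("seek table is not wrapped in the expected skippable frame")

    entries = []
    for _ in range(num_frames):
        raw = f.read(entry_size)
        compressed_size, decompressed_size = struct.unpack_from("<II", raw)
        entries.append((compressed_size, decompressed_size))
    return entries
```

Both seeks are relative to the end of the file, which is exactly what stdin cannot offer.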
This is basically the only function I use from zstd_seekable, so it would be nice to have that in zeekstd as well.
The decompress function in zstd-seekable starts decompression at the beginning of the frame to which the offset belongs and discards data until the offset is reached. It also just stops decompression at the specified offset. Zeekstd uses complete frames as the smallest possible decompression unit, as only the checksum data of a complete frame can be verified.
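The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under the assumption that the decompressed size of every frame is known from the seek table. `locate` is a hypothetical helper, not a function from zstd-seekable or zeekstd.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(frame_sizes, offset):
    """Map a decompressed byte offset to (frame_index, bytes_to_discard).

    `frame_sizes` holds the decompressed size of each frame, in order.
    """
    ends = list(accumulate(frame_sizes))  # decompressed end offset of each frame
    total = ends[-1] if ends else 0
    if not 0 <= offset < total:
        raise ValueError("offset is outside the decompressed data")
    index = bisect_right(ends, offset)    # first frame whose end lies past offset
    frame_start = ends[index] - frame_sizes[index]
    return index, offset - frame_start
```

An offset-based decoder in the style described above would start at frame `frame_index` and throw away the first `bytes_to_discard` bytes of output. A frame-granular tool would instead hand back the whole frame and leave any trimming to the caller.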
The way that’s handled in the bgzip/gzip world is with an external index file (.gzi) with compressed/uncompressed offsets. The index could be auto-computed, but would still require reading the header for each frame.
I vastly prefer the idea of having the index as part of the file. Sadly, gzip doesn’t have the concept of a skippable frame, so that would break naive decompressors. I’m still not sure the file size savings would be big enough to switch over to zstd, but I like the approach.
https://github.com/facebook/zstd?tab=readme-ov-file#the-case...
Has zstd actually standardized the seekable version? Last I checked (which was quite a while ago) it had not been declared a standard, so I was reluctant to write a filter for nbdkit, even though it's very much a requested feature.
Why zeek, BTW? Is it a play on "zstd" and "seek"? My employer is also the custodian of the zeek project (https://zeek.org), so I was confused for a second.
Yes, the name is a combination of zstd and seek. Funnily enough, I wanted to name it just zeek at first, before I knew that the name was already taken, so I switched to zeekstd. You're not the first person to ask me if there is any relation to zeek, and I understand how that is misleading. In hindsight the name is a little unfortunate.
1. Huge compression window (like 100+MB, so "chunking" won't work)
2. Random seeking into compressed payload
Anyone know of any projects that can provide both of these at once?
[1] https://github.com/rorosen/zeekstd/blob/main/cli/src/compres...