rorosen commented on Show HN: Zeekstd – Rust Implementation of the ZSTD Seekable Format   github.com/rorosen/zeekst... · Posted by u/rorosen
wyager · 8 months ago
I have a project where I want two properties which are not inherently contradictory, but don't seem to be available together:

1. Huge compression window (like 100+MB, so "chunking" won't work)

2. Random seeking into compressed payload

Anyone know of any projects that can provide both of these at once?

rorosen · 8 months ago
If seeking to frames is good enough for random seeking, this can already be done with zeekstd. The CLI sets a custom window log (ZSTD_c_windowLog) on the compression context when creating binary patches[1]. I regularly use it with a window size above 1G.

[1] https://github.com/rorosen/zeekstd/blob/main/cli/src/compres...
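
To make the window log concrete: the window size is 2^windowLog bytes, so a target window size maps to the smallest log that covers it. This is only a sketch of that arithmetic, not code from zeekstd. The function name is hypothetical, and the bounds are the ZSTD_WINDOWLOG_MIN (10) and 64-bit ZSTD_WINDOWLOG_MAX (31) values from zstd.h.

```rust
/// Window log bounds for 64-bit builds of zstd (assumed from zstd.h).
const WINDOW_LOG_MIN: u32 = 10;
const WINDOW_LOG_MAX: u32 = 31;

/// Hypothetical helper: smallest window log whose window (2^log bytes)
/// covers `window_bytes`. Returns None if the request exceeds the maximum.
fn window_log_for(window_bytes: u64) -> Option<u32> {
    // Round up to a power of two; its trailing zero count is the log2.
    let pow2 = window_bytes.max(1).checked_next_power_of_two()?;
    let log = pow2.trailing_zeros();
    if log > WINDOW_LOG_MAX {
        None
    } else {
        // Never go below the smallest window zstd accepts.
        Some(log.max(WINDOW_LOG_MIN))
    }
}
```

With these assumptions, a 100 MiB window needs a window log of 27 (128 MiB), and anything above 1 GiB needs 31.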

threeducks · 8 months ago
Assuming that frames come at a cost, how much larger are the seekable zstd files? Perhaps as a graph based on frame size and for different kinds of data (text, binaries, ...).
rorosen · 8 months ago
It depends on the frame size you choose. Every frame requires a few bytes of additional metadata; how much exactly depends on other compression settings (e.g. frame checksums, which are 4 bytes each, are only present if enabled). I just tested with a 1G file and compression level 3: zstd compresses it to 559M, and zeekstd with a 2M frame size to 565M. If I increase the frame size to 4M, zeekstd yields 562M.

I will add a section to the readme, this is a good question that other people might have too!
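
For anyone who wants to turn those numbers into relative overhead, here is a small sketch of the arithmetic. It assumes "1G", "2M" and "4M" mean GiB and MiB, and that the reported sizes are rounded, so the results are approximate. The function names are made up for illustration.

```rust
/// Size increase of the seekable file over the plain zstd file, in percent.
fn overhead_percent(plain: f64, seekable: f64) -> f64 {
    (seekable - plain) / plain * 100.0
}

/// Number of frames needed for `total` bytes at `frame_size` bytes per frame,
/// rounded up so that a partial last frame is counted.
fn frame_count(total: u64, frame_size: u64) -> u64 {
    (total + frame_size - 1) / frame_size
}

/// Average extra bytes per frame, given the total size difference.
fn extra_bytes_per_frame(extra_bytes: u64, frames: u64) -> u64 {
    extra_bytes / frames
}
```

Under those assumptions the 2M frame size gives 512 frames and about 1.07% overhead, and the 4M frame size gives 256 frames and about 0.54%. Both work out to roughly 12 KiB of extra output per frame, which is more than a few bytes of metadata. A likely reason is that each frame is compressed independently and cannot reference data from earlier frames, but that is an inference from the rounded numbers, not something measured here.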

b0a04gl · 8 months ago
How do you handle cases where the seek table itself gets truncated or corrupted? Do you fall back to scanning for frame boundaries or just error out? I'm wondering if there's room to embed a minimal redundant index at the tail too, for safety.
rorosen · 8 months ago
Zeekstd will just error when the seek table is corrupted. Scanning for frame boundaries should also be possible, though it isn't very efficient. If you don't need the seek table, you can just write it to /dev/null or not write it at all when using the lib.
DesiLurker · 8 months ago
Great, I can use it to pipe large logfiles and store them for later retrieval. Is there something like zcat as well?
rorosen · 8 months ago
You can decompress a complete file with "zeekstd d seekable.zst".

Unfortunately, piping a seekable file into zeekstd for decompression via stdin isn't possible. Decompressing a seekable file requires reading the seek table first (which is usually at the end of the file) and then seeking to the desired frame position, so zeekstd needs to be able to seek within the file.

If you want to decompress the complete file, you can use the regular zstd tool: "cat seekable.zst | zstd -d"
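
To illustrate why a pipe is not enough, here is a rough sketch of reading the seek table footer. It is not zeekstd's implementation. It assumes the seek table is appended to the file and follows the footer layout from the seekable format description in the zstd repo: a 4-byte little-endian frame count, a 1-byte descriptor whose top bit is the checksum flag, and the 4-byte magic number 0x8F92EAB1. The `Footer` type and `read_footer` function are hypothetical names.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Magic number that terminates the seek table footer (assumed from the spec).
const SEEKABLE_MAGIC: u32 = 0x8F92_EAB1;
/// Footer length: frame count (4) + descriptor (1) + magic (4).
const FOOTER_LEN: i64 = 9;

/// Hypothetical view of the parsed footer.
#[derive(Debug, PartialEq)]
struct Footer {
    num_frames: u32,
    has_checksums: bool,
}

/// Read the footer from the end of `src`. The `Seek` bound is the point:
/// stdin from a pipe cannot jump to the end and back.
fn read_footer<R: Read + Seek>(src: &mut R) -> io::Result<Footer> {
    src.seek(SeekFrom::End(-FOOTER_LEN))?;
    let mut buf = [0u8; 9];
    src.read_exact(&mut buf)?;

    // The last four bytes must be the seekable magic number.
    let magic = u32::from_le_bytes([buf[5], buf[6], buf[7], buf[8]]);
    if magic != SEEKABLE_MAGIC {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "seek table footer not found",
        ));
    }

    Ok(Footer {
        num_frames: u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]),
        // Bit 7 of the descriptor says whether entries carry checksums.
        has_checksums: buf[4] & 0x80 != 0,
    })
}
```

A reader would then use the frame count to locate the start of the seek table, read the per-frame sizes, and seek to the frame it needs.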

tyilo · 8 months ago
Correct me if I'm wrong, but it doesn't seem like you provide the equivalent of Seekable::decompress in zstd_seekable which decompresses at a specific offset, without having to calculate which frame(s) to decompress.

This is basically the only function I use from zstd_seekable, so it would be nice to have that in zeekstd as well.

rorosen · 8 months ago
From what I can see zstd-seekable is more closely aligned to the C functions in the zstd repo.

The decompress function in zstd-seekable starts decompression at the beginning of the frame to which the offset belongs and discards data until the offset is reached. It also just stops decompression at the specified offset. Zeekstd uses complete frames as the smallest possible decompression unit, as only the checksum data of a complete frame can be verified.
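
The frame lookup described above can be sketched in a few lines if you want to do it on top of frame-wise decompression. This is neither zeekstd's nor zstd-seekable's API. `locate` is a hypothetical helper, and it assumes you already have the decompressed size of every frame from the seek table.

```rust
/// Find the frame that contains `offset` in the decompressed stream.
/// Returns (frame index, number of bytes to discard at the start of that
/// frame), or None if the offset lies past the end of the data.
fn locate(frame_sizes: &[u64], offset: u64) -> Option<(usize, u64)> {
    // Decompressed offset at which the current frame starts.
    let mut start = 0u64;
    for (index, &size) in frame_sizes.iter().enumerate() {
        if offset < start + size {
            return Some((index, offset - start));
        }
        start += size;
    }
    None
}
```

A caller would decompress the returned frame completely, so its checksum can still be verified, and then drop the leading bytes. For many frames, a binary search over cumulative offsets would be the obvious refinement.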

mbreese · 8 months ago
Got it. That’s incredibly helpful. Thank you!

The way that’s handled in the bgzip/gzip world is with an external index file (.gzi) with compressed/uncompressed offsets. The index could be auto-computed, but would still require reading the header for each frame.

I vastly prefer the idea of having the index as part of the file. Sadly, gzip doesn’t have the concept of a skippable frame, so that would break naive decompressors. I’m still not sure the file size savings would be big enough to switch over to zstd, but I like the approach.

rorosen · 8 months ago
Writing the seek table to an external file is also possible with zeekstd, although the initial spec of the seekable format doesn't allow this.
ikawe · 8 months ago
Custom dictionaries are a feature of vanilla (non-seekable) zstd. As I understand it, all seekable-zstd are valid zstd, so it should be possible?

https://github.com/facebook/zstd?tab=readme-ov-file#the-case...

rorosen · 8 months ago
Yes, dictionaries should be totally possible. However, I've never tried them to be honest because I usually only compress big files. They can be set on the (de)compression contexts the same way as with regular zstd.
rwmj · 8 months ago
Seekable formats also allow random reads which lets you do trickery like booting qemu VMs from remotely hosted, compressed files (over HTTPS). We do this already for xz: https://libguestfs.org/nbdkit-xz-filter.1.html https://rwmj.wordpress.com/2018/11/23/nbdkit-xz-curl/

Has zstd actually standardized the seekable version? Last I checked (which was quite a while ago) it had not been declared a standard, so I was reluctant to write a filter for nbdkit, even though it's very much a requested feature.

rorosen · 8 months ago
It's not standardized as far as I know.
simeonmiteff · 8 months ago
This is very cool. Nice work! At my day job, I have been using a Go library[1] to build tools that require seekable zstd, but felt a bit uncomfortable with the lack of broader support for the format.

Why zeek, BTW? Is it a play on "zstd" and "seek"? My employer is also the custodian of the zeek project (https://zeek.org), so I was confused for a second.

[1] https://github.com/SaveTheRbtz/zstd-seekable-format-go

rorosen · 8 months ago
Thanks! I was also surprised that there are very few tools to work with the seekable format. I could imagine that at least some people have a use-case for it.

Yes, the name is a combination of zstd and seek. Funnily enough, I wanted to name it just zeek first before I knew that it already exists, so I switched to zeekstd. You're not the first person asking me if there is any relation to zeek and I understand how that is misleading. In hindsight the name is a little unfortunate.

u/rorosen · Karma: 82 · Cake day: June 15, 2025