felixhandte commented on OpenZL: An open source format-aware compression framework   engineering.fb.com/2025/1... · Posted by u/terrelln
squirrellous · 5 months ago
One of the mentioned examples sounds like the compressor is taking advantage of the SDDL by treating row-oriented data as stripes of column-oriented data, and then compressing that. This makes me curious - for data that’s already column-oriented like Parquet, what’s the advantage of OpenZL over zstd?
felixhandte · 5 months ago
SDDL (and the front-end task of reshaping data in general) is only one component of OpenZL. Once you have the streams, you can do all sorts of transformations to them that Zstd doesn't.
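The reshaping step discussed in this exchange (turning row-oriented records into per-field streams before compressing) can be sketched roughly as follows. This is not OpenZL's API or SDDL: it uses Python's standard `zlib` as a stand-in backend, and the record layout and all function names are hypothetical, chosen only for illustration.

```python
import struct
import zlib

# Hypothetical fixed-width record layout: uint32 id, uint16 status, float64 reading.
FIELD_FORMATS = ("I", "H", "d")
RECORD = struct.Struct("<" + "".join(FIELD_FORMATS))


def rows_to_columns(blob: bytes) -> list:
    """Split row-oriented records into one contiguous byte stream per field."""
    rows = list(RECORD.iter_unpack(blob))
    n = len(rows)
    return [
        struct.pack(f"<{n}{fmt}", *(row[i] for row in rows))
        for i, fmt in enumerate(FIELD_FORMATS)
    ]


def columns_to_rows(columns: list) -> bytes:
    """Inverse of rows_to_columns: interleave the field streams back into records."""
    n = len(columns[0]) // struct.calcsize("<" + FIELD_FORMATS[0])
    unpacked = [
        struct.unpack(f"<{n}{fmt}", col)
        for fmt, col in zip(FIELD_FORMATS, columns)
    ]
    return b"".join(RECORD.pack(*values) for values in zip(*unpacked))


def compress_columnar(blob: bytes) -> list:
    """Compress each field stream independently (zlib stands in for the backend)."""
    return [zlib.compress(col) for col in rows_to_columns(blob)]


def decompress_columnar(compressed: list) -> bytes:
    """Decompress every field stream and rebuild the original row-oriented bytes."""
    return columns_to_rows([zlib.decompress(c) for c in compressed])


if __name__ == "__main__":
    blob = b"".join(RECORD.pack(i, i % 3, i * 0.5) for i in range(1000))
    restored = decompress_columnar(compress_columnar(blob))
    print(restored == blob)
```

Grouping like values together often helps a general-purpose compressor, but the gain depends on the data. Once the streams are separated, each one can also be given its own transformation, which is the point made in the reply above.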
felixhandte commented on OpenZL: An open source format-aware compression framework   engineering.fb.com/2025/1... · Posted by u/terrelln
adrianmonk · 5 months ago
This is great stuff!

Any plans to make it so one format can reference another format? Sometimes data of one type occurs within another format, especially with archive files, media container files, and disk images.

So, for example, suppose someone adds a JSON format to OpenZL. Then someone else adds a tar format. While parsing a tar file, if it contains foo.json, there could be some way of saying to OpenZL, "The next 1234 bytes are in the JSON format." (Maybe OpenZL's frames would allow making context shifts like this?)

A related thing that would also be nice is non-contiguous data. Some formats include another format but break up the inner data into blocks. For example, a network capture of a TCP stream would include TCP/IP headers, but the payloads of all the packets together constitute another stream of data in a certain format. (This might get memory intensive, though, since there's multiplexing, so you may need to maintain many streams/contexts.)

felixhandte · 5 months ago
The OpenZL core supports arbitrary composition of graphs. So you can do this now via the compressor construction APIs. We just have to figure out how to make it easy to do.
felixhandte commented on OpenZL: An open source format-aware compression framework   engineering.fb.com/2025/1... · Posted by u/terrelln
bede · 5 months ago
For BAM this could be a good place to start: https://www.htslib.org/benchmarks/CRAM.html

Happy to discuss further

felixhandte · 5 months ago
felixhandte commented on OpenZL: An open source format-aware compression framework   engineering.fb.com/2025/1... · Posted by u/terrelln
felixhandte · 5 months ago
It was really hard to resist spilling the beans about OpenZL on this recent HN post about compressing genomic sequence data [0]. It's a great example of the really simple transformations you can perform on data that can unlock significant compression improvements. OpenZL can perform that transformation internally (quite easily with SDDL!).

[0] https://news.ycombinator.com/item?id=45223827

felixhandte · 5 months ago
Update: let's continue discussing genomic sequence compression on https://github.com/facebook/openzl/issues/76.
felixhandte commented on OpenZL: An open source format-aware compression framework   engineering.fb.com/2025/1... · Posted by u/terrelln
Havoc · 5 months ago
That looks great

Are the compression speed charts all like-for-like in terms of what is hw accelerated vs not?

felixhandte · 5 months ago
Yes. None of the algorithms under test used any hardware acceleration in the benchmarks we ran.
felixhandte commented on OpenZL: An open source format-aware compression framework   engineering.fb.com/2025/1... · Posted by u/terrelln
viraptor · 5 months ago
I wonder, given the docs, how well could AI translate imhex and Kaitai descriptions into SDDL. We could get a few good schemas quickly that way.
felixhandte · 5 months ago
Ooh, thanks for mentioning these! I wasn't aware of the existence of these tools but yes it seems very possible that you could transform these other spec formats into SDDL descriptions. I'll check them out.
felixhandte commented on OpenZL: An open source format-aware compression framework   engineering.fb.com/2025/1... · Posted by u/terrelln
michalsustr · 5 months ago
Are you thinking about adding stream support? I.e., something along the lines of i) build up an efficient vocabulary up front for the whole data and then ii) compress by chunks, so it can be decompressed by chunks as well. This is important for seeking in data and stream processing.
felixhandte · 5 months ago
Yes, definitely! Chunking support is currently in development. Streaming and seeking and so on are features we will certainly pursue as we mature towards an eventual v1.0.0.
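The two-step scheme described in the question (a shared vocabulary built up front, then independently decodable chunks) can be sketched with Python's standard `zlib` and its preset-dictionary support. This only illustrates the general idea and says nothing about how OpenZL's chunking will work; the function names are hypothetical, and the "dictionary builder" is a deliberately naive stand-in for real dictionary training.

```python
import zlib


def build_dictionary(samples, max_size=32 * 1024):
    """Naive stand-in for dictionary training: concatenate samples, keep the tail."""
    return b"".join(samples)[-max_size:]


def compress_chunks(data: bytes, chunk_size: int, zdict: bytes) -> list:
    """Compress fixed-size chunks independently, each primed with the shared dictionary."""
    chunks = []
    for start in range(0, len(data), chunk_size):
        comp = zlib.compressobj(zdict=zdict)
        chunks.append(comp.compress(data[start:start + chunk_size]) + comp.flush())
    return chunks


def decompress_chunk(chunk: bytes, zdict: bytes) -> bytes:
    """Decompress one chunk on its own; only the shared dictionary is needed."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(chunk) + decomp.flush()


if __name__ == "__main__":
    data = b"".join(b'{"user": %d, "event": "click"}\n' % i for i in range(500))
    zdict = build_dictionary([data[:2000]])
    chunks = compress_chunks(data, 1024, zdict)
    # Seek: decode only the fourth chunk without touching the others.
    print(decompress_chunk(chunks[3], zdict) == data[3 * 1024:4 * 1024])
```

Because every chunk starts from the same dictionary and carries no state from its neighbors, a reader can seek by decoding only the chunks it needs. The cost is that matches cannot cross chunk boundaries.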
felixhandte commented on OpenZL: A novel data compression framework   github.com/facebook/openz... · Posted by u/felixhandte
dang · 5 months ago
Comments moved to https://news.ycombinator.com/item?id=45492803, which was posted a bit earlier. I hope that's ok. Congratulations on the release!
felixhandte · 5 months ago
Thanks @dang!
felixhandte commented on OpenZL: An open source format-aware compression framework   engineering.fb.com/2025/1... · Posted by u/terrelln
kingstnap · 5 months ago
Wow this sounds nuts. I want to try this on some large CSVs later today.
felixhandte · 5 months ago
Let us know how it goes!

We developed OpenZL initially for our own consumption at Meta. More recently we've been putting a lot of effort into making this a usable tool for people who, you know, didn't develop OpenZL. Your feedback is welcome!

u/felixhandte

Karma: 426 · Cake day: April 8, 2014
About
Software Engineer working on Data Compression at Facebook.

[ my public key: https://keybase.io/felix; my proof: https://keybase.io/felix/sigs/mjJ1GvUGKTRmYImkgCu8Z04zn4AetQ7MsGOGRCSMh-A ]