Readit News logoReadit News
mbitsnbites commented on Show HN: A luma dependent chroma compression algorithm (image compression)   bitsnbites.eu/a-spatial-d... · Posted by u/mbitsnbites
mbitsnbites · a month ago
Just a note for the posterity: The continuation of the project is called Bitfrost CC and lives here: https://codeberg.org/mbitsnbites/bitfrostcc
mbitsnbites commented on Show HN: A luma dependent chroma compression algorithm (image compression)   bitsnbites.eu/a-spatial-d... · Posted by u/mbitsnbites
ffworld · a month ago
You talk about an algorithm. However, you never visited any of the practical implementations of Chroma from Luma, which is used in Production.

It is part of the AV1/2 video codec; for instance, it has been widely adopted too since 2018. https://arxiv.org/pdf/1711.03951

So do IETF early draft of the idea. https://datatracker.ietf.org/doc/draft-midtskogen-netvc-chro...

Give a read of the work if not:)

mbitsnbites · a month ago
Thanks for the references! After writing the blog I was looking for such references.
mbitsnbites commented on Show HN: A luma dependent chroma compression algorithm (image compression)   bitsnbites.eu/a-spatial-d... · Posted by u/mbitsnbites
fleabitdev · a month ago
You've rediscovered a state-of-the-art technique, currently used by JPEG XL, AV1, and the HEVC range extensions. It's called "chroma from luma" or "cross-component prediction".

This technique has a weakness: the most interesting and high-entropy data shared between the luma and chroma planes is their edge geometry. To suppress block artefacts near edges, you need to code an approximation of the edge contours. This is the purpose of your quadtree structure.

In a codec which compresses both luma and chroma, you can re-use the luma quadtree as a chroma quadtree, but the quadtree itself is not the main cost here. For each block touched by a particular edge, you're redundantly coding that edge's chroma slope value, `(chroma_inside - chroma_outside) / (luma_inside - luma_outside)`. Small blocks can tolerate a lower-precision slope, but it's a general rule that coding many imprecise values is more expensive than coding a few precise values, so this strategy costs a lot of bits.

JPEG XL compensates for this problem by representing the local chroma-from-luma slope as a low-resolution 2D image, which is then recursively compressed as a lossless JPEG XL image. This is similar to your idea of using PNG-like compression (delta prediction, followed by DEFLATE).

Of course, since you're capable of rediscovering the state of the art, you're also capable of improving on it :-)

One idea would be to write a function which, given a block of luma pixels, can detect when the block contains two discrete luma shades (e.g. "30% of these pixels have a luminance value close to 0.8, 65% have a luminance value close to 0.5, and the remaining 5% seem to be anti-aliased edge pixels"). If you run an identical shade-detection algorithm in both the encoder and decoder, you can then code chroma information separately for each side of the edge. Because this would reduce edge artefacts, it might enable you to make your quadtree leaf nodes much larger, reducing your overall data rate.

mbitsnbites · a month ago
Thanks for the feedback, and the interesting ideas. It's good to know that I was on to something and not completely off :-)

I'm mostly doing this for learning purposes, but a hidden agenda is to create a low-latency codec that can be used in conjunction with other codecs that deal primarily with luma information. AV1 and friends are usually too heavy in those settings, so I try to keep things simple.

mbitsnbites commented on Show HN: A luma dependent chroma compression algorithm (image compression)   bitsnbites.eu/a-spatial-d... · Posted by u/mbitsnbites
schobi · a month ago
Image and video compression has become a field that is painfully hard to enter. State of the art is complex and exhaustive, the functionality of reference encoders and comments/versions among them is really a lot.

We are well beyond where a dedicated individual can try an idea, show that it is better and expect that others can pick it up (e.g. in standardization). It is not sufficient to run a few dozen images and judge by yourself, you are expected to demonstrate the benefit integrated into the latest reference encoders and need a sponsor to join standardization efforts.

For educational purpose? Sure - do whatever you want - but any discussion "is it novel" or "is it useful for others" is moot, unfortunately.

mbitsnbites · a month ago
I truly get that. That's also one of the reasons why I started from scratch once I got the idea, rather than researching all the available papers and implementations etc (because the latter is quite overwhelming, while the former took me about a week of spare time hacks).

My scope is also a bit unusual, I think, because one of the applications I'm thinking about is to "augment" luma-only codecs with chroma. One such codec is https://gitlab.com/llic/llic

But most of all, I wanted to learn.

mbitsnbites commented on Zed now predicts your next edit with Zeta, our new open model   zed.dev/blog/edit-predict... · Posted by u/ahamez
levzzz · a year ago
yeah, i'd like to be able to run it locally. it should fit well onto my 12gb gpu
mbitsnbites · a year ago
The model is based on Qwen2.5-Coder-7b it seems. I currently run some quantized variant of Qwen2.5-Coder-7b locally with llama.cpp and it fits nicely in the 8GB VRAM of my Radeon 7600 (with excellent performance BTW), so it looks like it should be perfectly possible.

I would also only use Zeta locally.

mbitsnbites commented on Reciprocal Approximation with 1 Subtraction    · Posted by u/mbitsnbites
dahart · a year ago
Ah very nice, I was close with using max error - 0.05051 is the same number I got. Pretty sure 0x7ef311c2 came up for me at least a few times as I was fiddling with parameters. Is this using minimum good bits as the deciding criteria, or is it the best overall number using one of the averages and also 1-3 NR steps? Did you limit the input range, or use all finite floats? Having the min/avg error in bits is nice, it’s more intuitive than relative error.

I like the FMA simulation, that’s smart; I didn’t think about it. I did my search in Python. I don’t have it in front of me right now, and off the top of my head I’m not even sure whether my NR steps are in Python precision or fp32. :P My posts in this thread were with NR turned off, I wanted to find the best raw approximation and noticed I got a different magic number when using refinement. It really is an amazing trick, right? Even knowing how it works it still looks like magic when plotting the result.

Thanks for the update!

BTW I was also fiddling with another possible trick that is specific to reciprocal. I suspect you can simply negate all the bits except the sign and get something that’s a decent starting point for Newton iters, though it’s a much worse approximation than the subtraction. So maybe (x ^ 0x7fffffff). Not sure if negating the mantissa helps or if it’s better to negate only the exponent. I haven’t had time to analyze it properly yet, and I don’t know of any cases where it would be preferred, but I still think it’s another interesting/cute observation about how fp32 bits are stored.

mbitsnbites · a year ago
When measuring the errors I exhaustively iterate over all possible floats in the range [1, 2), by enumerating all IEEE 754 single precision representations in that range. That's "only" 2^23 numbers, so perfectly doable.

My selection criteria was abit complex, but something like this:

1. Maximize number of accurate bits in the approximation.

2. Same in NR step 1, then NR step 2 etc.

3. Minimize the max error in the approximation, and then the avg ertor in the approximation.

4. Same for NR step 1, 2, ...

mbitsnbites commented on Reciprocal Approximation with 1 Subtraction    · Posted by u/mbitsnbites
mbitsnbites · a year ago
Your suggestion got me intrigued. I have a program that does an exhaustive check for maximum and average error, so I'll give your numbers a spin.
mbitsnbites · a year ago
Given my search criteria, the optimal magic number turns out to be: 0x7ef311c2

  Initial approximation:
    Good bits min: 4
    Good bits avg: 5.242649912834
    Error max: 0.0505102872849 (4.30728 bits)
    Error avg: 0.0327344845327 (4.93304 bits)

  1 NR step:
    Good bits min: 8
    Good bits avg: 10.642581939697
    Error max: 0.00255139507338 (8.61450 bits)
    Error avg: 0.00132373889641 (9.56117 bits)

  2 NR steps:
    Good bits min: 17
    Good bits avg: 19.922843217850
    Error max: 6.62494557693e-06 (17.20366 bits)
    Error avg: 2.62858584054e-06 (18.53728 bits)

  3 NR steps:
    Good bits min: 23
    Good bits avg: 23.674004554749
    Error max: 1.19249960972e-07 (22.99951 bits)
    Error avg: 3.44158509521e-08 (24.79235 bits)
Here, "good bits" is 24 minus the number of trailing non-zero-bits in the integer difference between the approximation and the correct value, looking at the IEEE 754 binary representation (if that makes sense).

Also, for the NR steps I used double precision for the inner (2.0 - x * y) part, then rounded to single precision, to simulate FMA, but single precision for the outer multiplication.

mbitsnbites commented on Reciprocal Approximation with 1 Subtraction    · Posted by u/mbitsnbites
mitthrowaway2 · a year ago
I wonder to what extent the dedicated hardware is essentially implementing the same steps but at the transistor level.
mbitsnbites · a year ago
The big cores do. They essentially pump division through something like an FMA (fused multiply-add) unit, possibly the same unit that is used for multiplication and addition. That's for the Newton-Raphson steps, or Goldschmidt steps.

In hardware it's much easier to do a LUT-based approximation for the initial estimate rather than the subtraction trick, though.

It's common for CPUs to give 6-8 accurate bits in the approximation. x86 gives 13 accurate bits. Back in 1975, the Cray 1 gave 30 (!) accurate bits in the first approximation, and it didn't even have a division instruction (everything about that machine was big and fast).

u/mbitsnbites

KarmaCake day289March 21, 2018View Original