erwannmillon · 2 years ago
Btw, I did this in pixel space for simplicity, cool animations, and compute cost. It would be really interesting to do this as an LDM (though of course you can't really do the LAB color space trick there, unless you train an AE specifically for that color space).
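For reference, the LAB setup in pixel space boils down to something like this (a minimal sketch using skimage; the helper names are mine): keep the known L lightness channel as conditioning and diffuse only the a/b chroma channels.

```python
# Minimal sketch of the LAB split for pixel-space colorization diffusion.
import numpy as np
from skimage import color

def split_lab(rgb):
    """rgb: HxWx3 float array in [0, 1]."""
    lab = color.rgb2lab(rgb)
    L = lab[..., :1] / 100.0    # conditioning signal, roughly [0, 1]
    ab = lab[..., 1:] / 128.0   # diffusion target, roughly [-1, 1]
    return L, ab

def merge_lab(L, ab):
    lab = np.concatenate([L * 100.0, ab * 128.0], axis=-1)
    return color.lab2rgb(lab)   # back to RGB in [0, 1]
```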

I was really interested in how color is represented in latent space and ran some experiments with VQGAN+CLIP. You can actually do a (not great) colorization of an image by encoding it with VQGAN and using a prompt like "a colorful image of a woman".
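The rough recipe, for anyone who wants a starting point (a sketch of the general VQGAN+CLIP technique, not the exact code used here; it assumes a taming-transformers VQGAN loaded as `vqgan`, OpenAI's clip package, and a greyscale input `img` repeated across three channels, with CLIP's mean/std normalization omitted for brevity):

```python
import torch
import torch.nn.functional as F
import clip

device = "cuda"
clip_model, _ = clip.load("ViT-B/32", device=device)
text = clip.tokenize(["a colorful image of a woman"]).to(device)
text_feat = clip_model.encode_text(text).float().detach()

# img: 1x3xHxW greyscale image repeated across RGB channels, in [-1, 1]
z = vqgan.encode(img)[0].detach().requires_grad_(True)  # continuous latents
opt = torch.optim.Adam([z], lr=0.05)

for _ in range(200):
    out = vqgan.decode(z)                           # 1x3xHxW, roughly [-1, 1]
    crop = F.interpolate((out + 1) / 2, size=224)   # resize to CLIP's input size
    img_feat = clip_model.encode_image(crop).float()
    loss = -F.cosine_similarity(img_feat, text_feat).mean()
    opt.zero_grad(); loss.backward(); opt.step()

result = vqgan.decode(z).clamp(-1, 1)
```

Adding a loss that keeps the output's luminance close to the input helps it stay a colorization instead of drifting into free generation.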

Would be fun to experiment with if anyone wants to try; I'd love to see any results if someone builds on this.

carbocation · 2 years ago
> I did this in pixel space for simplicity, cool animations, and compute costs

A slight nitpick, but wouldn't doing diffusion in latent space be cheaper?

erwannmillon · 2 years ago
Depends. Given the low res, the 3x64x64 pixel-space image is smaller than the latents you'd get from encoding a higher-res image with models like VQGAN or the Stable Diffusion VAE at their native resolutions.
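Back-of-envelope, assuming the Stable Diffusion VAE's 8x spatial downsampling and 4 latent channels at its native 512px input:

```python
# Element counts: low-res pixel space vs. SD-VAE latents.
pixel_space = 3 * 64 * 64           # 64x64 RGB image -> 12,288 values
sd_latents = 4 * (512 // 8) ** 2    # 512x512 input -> 4x64x64 -> 16,384 values
```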

It's easier to get a sense of what's going wrong with a pixel space model though. With latent space, there's always the question of how color is represented in latent space / how entangled it is with other structure / semantics.

Starting in pixel space removed a lot of variables from the equation, but latent diffusion is the obvious next step

ShamelessC · 2 years ago
Not necessarily if you don’t already have a pretrained autoencoder.
xigency · 2 years ago
Question: how long did it take to train this model, and what hardware did you use?
erwannmillon · 2 years ago
Took a lot of failed experiments; the model would keep converging to greyscale / sepia images. Think one of the ways I fixed it was by adding a greyscale encoder to the arch and using its output embedding as additional conditioning. Can't remember if I only added it to the UNet input or injected it during various stages of the UNet down pass.
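A hedged sketch of the UNet-input variant (the module names and signatures are illustrative stand-ins, not the actual architecture):

```python
import torch
import torch.nn as nn

class GrayEncoder(nn.Module):
    """Illustrative greyscale encoder producing a spatial conditioning map."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, gray):   # gray: Bx1xHxW greyscale condition
        return self.net(gray)  # BxdimxHxW feature map, same spatial size

def denoise(unet, x_t, t, gray_feats):
    # x_t: Bx3xHxW noisy color image; condition by channel concatenation,
    # so the UNet's first conv must accept 3 + dim input channels.
    return unet(torch.cat([x_t, gray_feats], dim=1), t)
```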
data-ottawa · 2 years ago
Off topic: watching the gifs has an absolutely 90s sci-fi movie effect; it's funny how the tech just wound up looking like that.
erwannmillon · 2 years ago
hahaha it reminded me of some "zoom and enhance" stuff when I was making the animations
barrkel · 2 years ago
It reminded me of the days of antenna pass-through VCR players, where you had to tune into your VCR's broadcast signal when you couldn't use SCART.
nerdponx · 2 years ago
Looks like something you'd see in an X Files episode.
asciimov · 2 years ago
I’m not a fan of B&W colorization. Often the colors are wrong: either outright color errors (like the choices for clothing or cars) or failures to take lighting conditions into account (late-day shadows but midday brightness).

Then there is the issue of B&W movies. Using this kind of tech might not give pleasing results, since the colors used for sets and outfits were chosen to work well for film contrast and not for story accuracy. That “blue” dress might really be green. (Please, just leave B&W movies the way they are.)

imoverclocked · 2 years ago
I think keeping the art as it was produced is important but there is also a good history of modifying art to produce new art too. In the digital age, we aren’t losing the original art so it seems even stranger to be against modification of the “original.”

However, just applying a simple filter (or single transform without effort) definitely feels derivative to me.

SiempreViernes · 2 years ago
Additionally, colorisations very commonly present themselves as showing a "more true" version of how things looked and not as creative art projects.
qiqitori · 2 years ago
Maybe you're used to looking at B&W stuff and effortlessly figuring out what the scene is depicting, but for me at least it's very hard. Adding a little color makes it much easier. In that regard, it doesn't matter to me if the colors are wrong.

(Perhaps it just takes some getting used to. Back when I read a black and white comic for the first time (as a child), I had a hard time figuring out things at first but got used to it at some point.)

brianpan · 2 years ago
I think the point being made is that movies were made for the B&W end result, not shot as color scenes that just happened to be captured on B&W film.

For instance, fake blood in B&W was often produced with black liquid. Colorizing it correctly just doesn't make sense. Or a green or blue dress can be chosen because of the way it looks on film, not because it's supposed to BE a green or blue dress.

jojobas · 2 years ago
It's not as if the B&W movies or pictures are being taken away; it's just a fun exercise for NNs to play with.
hedora · 2 years ago
Due to the magic of the DMCA’s anti-circumvention clauses, the B&W movie can be taken away.

The last time I checked, “the source is public domain” is not a valid defense against the pro-DRM parts of that law.

solumunus · 2 years ago
I don’t see why it matters if the blue dress was really green. The result is either an enhanced experience or it isn’t; if it is, then minor inaccuracies don’t seem relevant.
Cthulhu_ · 2 years ago
If there's a source that a blue dress was green, then that could be taken into consideration for recoloring, but as you said, it's to enhance the experience, not to be 100% accurate.
tgv · 2 years ago
Quite often, colorized pics and movies have people wearing blueish clothing, which is fairly unbelievable. It's a gimmick that produces a not-quite-right effect in pursuit of a goal it isn't suited for. Because what is it that colorizations try to achieve? To make people think "Oh, so that's how it looked back then"? Then there shouldn't be errors in the image. And if it's to make the pictures more relatable, or whatever hand-wavy arguments are being thrown around, then non-colorized pictures become even less relatable, in effect alienating people from recent history (if you believe such arguments).

I'd like to make one exception, though, for They Shall Not Grow Old. That was impressive.

zamadatix · 2 years ago
I think colorization with some effort put in can be pretty decent. E.g. I prefer the 2007 colorization of It's a Wonderful Life to the original. It's never perfect but I don't think that's a prerequisite to being better. Some will always disagree though.

Just about every completely automated colorized video tends to be pretty bad though. Particularly the low-effort YouTube "8k colorized interpolated" channels that just pump videos out without caring whether the result is actually any good.

blululu · 2 years ago
Yeah it's cool tech but I really don't appreciate how it is just straight up deceitful and spreading misinformation. A lot of hues are underdetermined and the result is more or less arbitrary in a historical context. If one were to research and fine-tune the model such that ambiguous shades are historically accurate I would be less annoyed by the sense that these images are just spreading misinformation. Compare this with Sergey Prokudin-Gorsky's photos of the Russian Empire or autochromes of Paris in 1910 which are actual windows into a lost world.

*for works of fiction these issues vanish, but for any historical or documentary photographs/films, I really hate that I am being lied to.

bavell · 2 years ago
I suppose you must be driven mad by astrophotography.
buildbot · 2 years ago
Does it work on arbitrary image sizes?

One of the nice features of the somewhat old Deoldify colorizer is support for any resolution. It actually does better than Photoshop's colorization: https://blog.maxg.io/colorizing-infrared-images-with-photosh...

Edit - technically, I suppose, the way Deoldify works is by rendering the color at a low resolution and then applying it to the higher-resolution image using OpenCV. I think the same sub-sampling approach could work here...
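A sketch of that sub-sampling idea (`colorize_64px` is a hypothetical low-res model call, and using the grey input directly as the L channel is an approximation):

```python
# Predict color at low resolution, upsample only the chroma, and
# recombine with the original full-resolution luminance.
import cv2
import numpy as np

def colorize_highres(gray):                      # gray: HxW uint8
    small_rgb = colorize_64px(cv2.resize(gray, (64, 64)))  # hypothetical model
    lab = cv2.cvtColor(small_rgb, cv2.COLOR_RGB2LAB)
    h, w = gray.shape
    ab = cv2.resize(lab[..., 1:], (w, h), interpolation=cv2.INTER_CUBIC)
    out = np.dstack([gray, ab])                  # full-res luma + upsampled chroma
    return cv2.cvtColor(out, cv2.COLOR_LAB2RGB)
```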

erwannmillon · 2 years ago
Technically yes, the encoder and UNet are convolutional and support arbitrary input sizes, but the model was trained at 64x64px because of compute limitations. You could probably resume the training from the 64x64 checkpoint and train at a higher resolution.

But like most diffusion models, it doesn't generalize very well to resolutions outside of its training data.
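Resuming at a higher resolution would look roughly like this (all names here are illustrative, not the repo's actual entry points):

```python
# Since the UNet is fully convolutional, the 64px weights load unchanged
# and simply see larger feature maps during the higher-res fine-tune.
import torch

model = UNet()                                    # same architecture as before
model.load_state_dict(torch.load("ckpt_64px.pt"))
for batch in dataloader_128:                      # same data, resized to 128x128
    train_step(model, batch)                      # continue the diffusion training
```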

snvzz · 2 years ago
All the examples are portraits of people.

I have to wonder whether it works well with anything else.

erwannmillon · 2 years ago
Trained on CelebA, so no, but you could for sure train this on a more varied dataset
Eisenstein · 2 years ago
Would it be as simple as feeding it a bunch of decolorized images along with the originals?
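(Essentially, yes: colorization training pairs are typically generated on the fly by desaturating color images, so any color image collection works as training data. A minimal sketch with torchvision; the dataset path is illustrative.)

```python
import torch
from torchvision import datasets, transforms
import torchvision.transforms.functional as TF

base = datasets.ImageFolder("photos/", transform=transforms.Compose([
    transforms.Resize(64), transforms.CenterCrop(64), transforms.ToTensor(),
]))

class ColorizationPairs(torch.utils.data.Dataset):
    """Yields (greyscale condition, color target) pairs derived on the fly."""
    def __init__(self, base):
        self.base = base
    def __len__(self):
        return len(self.base)
    def __getitem__(self, i):
        color, _ = self.base[i]              # ignore the class label
        gray = TF.rgb_to_grayscale(color)    # 1xHxW conditioning input
        return gray, color
```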
realusername · 2 years ago
Is there anything out there right now with diffusion models to improve poor VHS coloring? The coloring does exist, so I wouldn't want to replace a red shirt with a blue shirt, for example; it's just not very accurate.
quickthrower2 · 2 years ago
That is so 2023 and so 1993 at the same time!