Readit News
Teslazar · 3 years ago
Is there a way to have current AI tools maintain consistency when generating multiple images of a specific creature or object? For example, multiple images of 'Dr. Venom' need to look like the same character, and likewise for multiple images of the same spaceship.
ftufek · 3 years ago
Yes, right now you have 3 options:

- DreamBooth: ~15-20 minutes of finetuning, but it generally produces high-quality and diverse outputs if trained properly,

- textual inversion: you essentially find a new "word" in the embedding space that describes the object/person; this can produce good results, but it's generally less effective than DreamBooth,

- LoRA finetuning[1]: similar to DreamBooth, but you finetune low-rank weight deltas to achieve the look; it's faster than DreamBooth and the output files are much smaller.

1: https://github.com/cloneofsimo/lora
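
A rough sketch of how a DreamBooth-finetuned checkpoint (or LoRA deltas trained with the linked repo) might be used at inference time with Hugging Face diffusers; the directory names and the rare token "sks" are hypothetical choices made during finetuning:

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical DreamBooth output directory containing the finetuned weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "./dreambooth-dr-venom", torch_dtype=torch.float16
).to("cuda")

# Alternatively, keep the base model and layer LoRA weight deltas on top:
# pipe.load_lora_weights("./lora-dr-venom")  # hypothetical LoRA output dir

# Prompt with the rare token that was bound to the subject during finetuning.
image = pipe("a portrait of sks person as a comic book villain").images[0]
image.save("dr_venom.png")
```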

wokwokwok · 3 years ago
> Is there a way to have current AI tools maintain consistency when generating multiple images of a specific creature or object?

...but none of these can maintain consistency.

All they can do is generate the same 'concept'. For example, 'pictures of Batman' will always generate pictures that are recognizably Batman.

However, good luck generating comic panels; there is nothing (that I'm aware of) that will let you keep consistency across images; every panel will have a subtly different Batman, with a different background, different props, different lighting, etc.

The image-to-image (and depth-to-image) pipelines will let you generate structurally consistent outputs (e.g. here is a bed, here is a building), but the results will still be completely distinct in detail and lack consistency.
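
To make the image-to-image point concrete, a minimal sketch with the diffusers img2img pipeline; the model ID, file names, and strength value are example choices:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("previous_frame.png").convert("RGB")

# Lower strength stays closer to the input's structure; higher strength
# reinterprets more, so details (face, props, lighting) still drift.
out = pipe(
    prompt="batman standing on a rooftop at night",
    image=init_image,
    strength=0.5,
    guidance_scale=7.5,
).images[0]
out.save("next_frame.png")
```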

This is why all animations using this tech have that 'hand-drawn jitter' to them: it's basically not possible (currently) to say "an image of Batman in a new pose, but otherwise like this previous frame".

So... to the OP's question:

Recognizable outputs? Yes, sure; you've always been able to generate 'a picture of a dog'.

New outputs? Yeah! You can now train it for something like 'a picture of Renata Glasc the Chem-Baroness'.

Consistency across outputs? No, not really. Not at all.

mmahemoff · 3 years ago
This is the next frontier for AI art, as it will let you build a series, graphic novel, or even video with consistent objects. There are techniques like textual inversion that let you associate a label with an object, but they rely on having multiple images of that object already, so they won't work for an image you just generated. To get around that, some people have tried using other tools to generate multiple images of a synthetic subject, e.g. Deep Nostalgia, which can animate a static portrait photo.

So in theory you pick one image from the AI generator, create variants of it with separate image tools, then build a fine-tuned model based on some cherry-picked variants.

I think this will get easier as AI image tools focus more on depth and 3D modelling.

The “aiactors” subreddit has some interesting experiments along these lines.

michaelbuckbee · 3 years ago
Check out this video by Corridor Crew: they're able to use Stable Diffusion to consistently transfer the style of an animated film (Spider-Verse) onto real-world shots.

https://youtu.be/QBWVHCYZ_Zs

astrange · 3 years ago
The concept of “similar” is AI-complete (i.e., only you know what seems acceptably similar to you), so basically, no.

You can force a model to generate nearly the same actual pixels with DreamBooth, which can be interesting for putting people’s faces in a picture, but otherwise I’d call it overfitting.

enchiridion · 3 years ago
Is AI-complete an actual complexity class? Genuinely curious, I’ve never heard of it.
washadjeffmad · 3 years ago
Two key parameters in the Stable Diffusion webui are denoising strength (how far the output may diverge from the source) and CFG scale (adherence to the prompt). img2img does what it sounds like, and inpainting allows masked modifications to a base image with a great deal of control and variability.

I'd recommend giving it a shot if you have an Nvidia GPU with ≥4GB VRAM.

Edit: There are also full model training and hypernetworks, but they require a body of source material, keywording, and significantly more time and compute, so I haven't attempted either.
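
For reference, a hedged sketch of the same knobs outside the webui, using diffusers' inpainting pipeline (guidance_scale plays the role of the CFG scale); the model ID, file names, and values are assumptions:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

base = Image.open("base_image.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")  # white = region to repaint

result = pipe(
    prompt="a spaceship with a glowing red engine",
    image=base,
    mask_image=mask,
    guidance_scale=7.5,  # adherence to the prompt, like the webui's CFG scale
).images[0]
result.save("inpainted.png")
```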

CoffeePython · 3 years ago
Seems like there is some way. There's a startup[1] that I've been seeing around on twitter[2] which makes it easy to create in-game assets that are style-consistent. Haven't tried it yet but it looks promising!

[1] - https://www.scenario.gg/

[2] - https://twitter.com/Beekzor/status/1608862875862589441?s=20

djur · 3 years ago
Textual inversion can kind of do this, but I haven't been impressed by examples I've seen. It seems more suited to "Shrek as a lawnmower" than "Shrek reading a book".
pkdpic · 3 years ago
Hugging Face has everything you need to get started with Stable Diffusion textual inversion training here. It's awesome to get it running, but as others have said, it has limitations if you're trying to produce multiple images for a narrative, etc.

https://huggingface.co/docs/diffusers/training/text_inversio...
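
A minimal sketch of loading a learned textual-inversion embedding at inference time with diffusers; the output directory and the placeholder token "<dr-venom>" are hypothetical results of a training run like the one the docs describe:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the learned embedding and bind it to a placeholder token.
pipe.load_textual_inversion("./textual_inversion_output", token="<dr-venom>")

image = pipe("a comic panel of <dr-venom> reading a book").images[0]
image.save("dr_venom_panel.png")
```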

lilgreenland · 3 years ago
Midjourney lets you upload a reference image. For a portrait of someone's face, this consistently produces the same person.
Kerbonut · 3 years ago
I would recommend generating a depth map of the source material and then generating off the resulting depth map. That will keep the structure the same so things don't pop in and out. Then use the suggestions of DreamBooth or textual inversion to get the colors etc. right.
tniemi · 3 years ago
You can teach the AI a new item/character. https://www.youtube.com/watch?v=W4Mcuh38wyM
smusamashah · 3 years ago
There is a way that someone on Reddit recently discovered works great.

In the AUTOMATIC1111 UI you can alternate between prompts, e.g. "Closeup portrait of (elon musk | Jeff bezos | bill gates)". The final image will be a face that looks like all three. See this: https://i.redd.it/8uq52mnausu91.png

Now do the same with two people but swap the gender. The female version of the example I gave won't look like anyone you know, and it will remain consistent.

It kind of works.

theCrowing · 3 years ago
You can work with embeddings.
kache_ · 3 years ago
Yeah, look up textual inversion
hnbad · 3 years ago
This is a great example of AI image generation being unable to generate "art", instead just replicating a naive approximation of what it was trained on. There's no coherence or consistency between the images, and while they all look "shiny", they also look incredibly dull and generic.

It's a cool exercise but using this for a real-world project would eliminate any attempt at producing an artistic "voice". AI image generation excels only at generating stock art and placeholder content.

washadjeffmad · 3 years ago
That's more an issue with prompts and training. For as functional as it is, the established models are all very preliminary, and we're in a Cambrian explosion of sorts w.r.t. tooling.

Also, one model doesn't speak for them all. I have the problem that my results are often too consistent with certain models, largely because of prompt complexity, lack of wildcarding, etc.

siraben · 3 years ago
I was trying this recently with the Sierra Christmas Card from 1986![0] The images I generated are at [1]; I was tweaking the model parameters with different denoising and CFG scales. When you get the parameters just right, you can preserve the composition of the input image very well while still adding a lot of detail. This isn't a completely automatic process, though: with Stable Diffusion you have to provide the right prompt, otherwise the generation isn't guided correctly, so this approach works better for aesthetics and style transfer than regular image super-resolution such as ESRGAN.

[0] https://archive.org/details/sierra-christmas-card-1986

[1] https://i.imgur.com/WxD05gX.jpeg

Agentlien · 3 years ago
If you're using stable diffusion 2.0 or later you can use its depth-to-image mode[0] to create variations of an image which respect its composition without having to keep your parameters within a narrow range.

[0] https://github.com/Stability-AI/stablediffusion#image-modifi...
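
A minimal sketch of the depth-to-image mode with diffusers' SD 2.0 depth pipeline (it estimates a depth map from the input image internally, so the output follows the original composition); the prompt, file name, and strength are example values:

```python
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from PIL import Image

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("source_scene.png").convert("RGB")

out = pipe(
    prompt="a cozy winter cabin at night, detailed digital painting",
    image=init_image,
    strength=0.7,  # more new detail, while the depth map keeps the layout
).images[0]
out.save("depth_guided.png")
```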

siraben · 3 years ago
I still struggle with SD 2.0 prompting because many of the tricks (greg, artstation) don't work anymore. Have people had success with it, or do I have to use custom models?
LudwigNagasena · 3 years ago
That’s just yet another exercise in prompt engineering. I expected something more interesting.
carrolldunham · 3 years ago
The title excited me: maybe someone had succeeded in making new art that looks like the old pre-renders, maybe a convincing imitation of the scanline-render look. Instead it was a vapid article about tossing pixel art into img2img and getting some tenuously related junk.
Jiro · 3 years ago
He didn't even do that. He used text descriptions; the original pixel art wasn't involved.
layer8 · 3 years ago
I guess that explains why the results were only vaguely related to the original pictures.

vyrotek · 3 years ago
This would be super fun to use with "Return to Zork" clips.
shdon · 3 years ago
Is it just me, or was anybody else reminded of the Carmageddon cover art after seeing the image they ended up using for Dr. Venom?
js8 · 3 years ago
I have a question. Stable Diffusion is based on gradually processing noise into a coherent image by training a denoiser. Would it be possible to feed a low-fidelity image (such as pixel art, or a pixelated image) directly into the denoiser step and get a higher-fidelity image that matches the original?
mschulkind · 3 years ago
Yes, Stable Diffusion has an img2img mode that does essentially that.
sigmar · 3 years ago
I think what you are describing is called super-resolution https://ai.googleblog.com/2021/07/high-fidelity-image-genera...
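
As a hedged sketch of the diffusion-based super-resolution route (as opposed to img2img), using diffusers' 4x upscaler; the input file name and prompt are placeholders:

```python
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("pixel_art_scene.png").convert("RGB")

# The prompt guides what detail gets added; an empty prompt also works.
upscaled = pipe(prompt="retro adventure game scene, detailed", image=low_res).images[0]
upscaled.save("upscaled_4x.png")
```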
charcircuit · 3 years ago
It's noise in a latent space, which is different from noise in the image that the latent vector represents.