https://github.com/hlky/stable-diffusion
It supports both txt2img and img2img. (Not affiliated.)
Edit: Incidentally, I tried running it on a CPU. It is possible, but it took 3 minutes instead of 10 seconds to produce an image. It also required me to hack up the script in a really gross way. Perhaps there is a script somewhere that properly supports this.
https://github.com/magnusviri/stable-diffusion
I do runs at 384px by 384px, with a batch size of 1. The sampling method has almost no impact on memory. Using k_euler with 30 steps renders an image in 10 to 20 seconds. The biggest things that affect rendering speed are the step count and the resolution, so 512x512 with C 50 using ddim is much slower than 256x256 with C 25 using k_euler.
The sampling methods mostly run in similar times, but k_euler can produce viable output at lower C values, which effectively makes it faster than the rest.
Don't add gfpgan in the same pipeline, as it takes more VRAM.
I'm running it on Windows 10 with the latest drivers. I set the Python process to Realtime priority in Task Manager (makes a slight difference!). Have not tried it on Linux.
I'm thinking about getting a 3090 so that I can make higher resolution images.
Gfpgan runs much faster for me, 5 seconds per picture.
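For comparison, a minimal sketch of the same knobs (resolution, steps, batch size) through Hugging Face's diffusers library rather than the fork's CLI; the model id, fp16 setting and guidance value here are assumptions, not anything from the comment above:

    import torch
    from diffusers import StableDiffusionPipeline

    # Assumed model id; the gated repo requires accepting the license on huggingface.co first.
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    # 384x384, 30 steps, one image per prompt -- roughly the settings described above.
    result = pipe(
        "a watercolor fox in a snowy forest",
        height=384, width=384,
        num_inference_steps=30,
        guidance_scale=7.5,        # classifier-free guidance scale
        num_images_per_prompt=1,   # batch size
    )
    result.images[0].save("fox.png")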
Nah, that's normal. It's why GPUs are the usual thing for AI. Any crap, old, weak GPU with 4 GB of memory would run circles around a CPU.
It's often easier to actually get models to run on CPU, due to simpler install configs and more available memory. It's just painful to get a result out of it. Which might be part of why the install stays simple: it's not even worth optimizing for.
How do you set up the model? The instructions only say "Download the model checkpoint. (e.g. from huggingface)", but I can't find instructions there on how to find a ckpt file, nor exactly which file I should look for.
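For anyone else stuck here: the original weights live in the CompVis/stable-diffusion-v-1-4-original repo on Hugging Face, and the file most forks want is sd-v1-4.ckpt. A rough sketch of fetching it with the huggingface_hub library (the repo is gated, so this assumes you've accepted the model license on the website and logged in, or pass a token):

    from huggingface_hub import hf_hub_download

    # Downloads the checkpoint into the local HF cache and returns its path.
    # Requires having accepted the license and run `huggingface-cli login`
    # (or passing token="hf_...").
    ckpt_path = hf_hub_download(
        repo_id="CompVis/stable-diffusion-v-1-4-original",
        filename="sd-v1-4.ckpt",
    )
    print(ckpt_path)  # point the fork's --ckpt flag / config at this file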
Takes 3 minutes (for a prompt resulting in a set of 4 images) on my 1080 as well. Really astonished that it takes GP about the same time using just a CPU. Seems like the older generation of GPUs isn't much better than CPUs in regards to ML stuff.
To add another data point, my GTX 1080 takes ~60 sec to generate a pair of 500x500 images using txt2img. Haven't tried img2img yet, as the UI package I went with is a bit buggy with it.
Not sure of exact pricing, but look for a used Maxwell (GeForce 900 series) Nvidia GPU, I'd bet. A Quadro M2000 with 4 GB of RAM was about 100 on eBay a short bit ago.
https://old.reddit.com/r/StableDiffusion/comments/wy7oa5/img...
https://old.reddit.com/r/StableDiffusion/comments/wyq04v/usi...
https://old.reddit.com/r/StableDiffusion/comments/wzlmty/its...
You can find the announcement tweet here: https://twitter.com/mishig25/status/1563226161924407298?s=20...
old.reddit is truly horrible on mobile. Once you click on an image you can't go back. Off topic, but what is the other alternative UI called that people sometimes use?
I've been playing with this for a few hours. It's slow going -- you really need a fast GPU with a lot of RAM to make this very usable.
I ended up paying the $10 for Google Colab Pro and that's how I've been using this. Maybe I'll figure out how to get this working on my old 1080 Ti to see if it's faster.
Anyway, for the one that I'm using which has a web UI, you can use this Colab link. It's pretty great! https://colab.research.google.com/drive/1KeNq05lji7p-WDS2BL-...
What I really wish is that the img2img tool could be used to take a txt2img output and then "refine" it further. As it is, the img2img tool doesn't seem particularly great.
People on Reddit are talking about "I just generate 100 images and pick the best one"... but this is incredibly slow on the P100 GPU that Google has me on. Does this just require a monster GPU like a 3080/3090 in order to get any decent results?
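On the "refine a txt2img output" idea: that is roughly what img2img with a low-to-medium strength does. A rough diffusers sketch, not the Colab UI's actual code; it assumes a diffusers version with the .components helper, and older releases name the input init_image instead of image:

    from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

    prompt = "a castle on a cliff, detailed oil painting"
    txt2img = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")
    draft = txt2img(prompt, num_inference_steps=30).images[0]

    # Reuse the same weights for img2img and feed the draft back in.
    # strength controls how much gets re-noised: low keeps the layout,
    # high behaves almost like a fresh txt2img run.
    img2img = StableDiffusionImg2ImgPipeline(**txt2img.components)
    refined = img2img(prompt=prompt, image=draft, strength=0.4, guidance_scale=7.5).images[0]
    refined.save("refined.png")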
Also how slow is your p100? I'm usually getting around 3 it/s. Maybe it's just because I'm used to disco diffusion where a single image took over an hour, but this is ungodly fast to me
FWIW I'm using an old GTX 1080 Ti to play around; it takes about 21 seconds per image. You can make it go even faster by lowering the timesteps from the default 50 (--ddim_steps), though both lowering and raising that value can result in quite different first-iteration images (though they tend to be similar), and it seems to guarantee totally different further-iteration images (as counted by --n_iter). I'm with you on the feeling that it's hard to control, whether in refinement or in other ways, but I suspect that'll get a lot better in the next couple of years (if not weeks, or dare I say days).
You're probably using the default PLMS sampler with 50 steps. There are better samplers; the best seem to be Euler (more predictable with regard to the number of steps) and Euler ancestral (gives more variation). Both typically need far fewer steps to converge, which speeds up generation.
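If you're using diffusers rather than the fork scripts, swapping samplers looks roughly like this; the scheduler class name assumes a diffusers release that ships the Euler schedulers:

    from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")

    # Swap the default scheduler for Euler ancestral; it usually converges
    # in noticeably fewer steps than the stock 50.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    image = pipe("a lighthouse at dawn", num_inference_steps=20).images[0]
    image.save("lighthouse.png")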
HuggingFace is a company that mainly builds open-source libraries and platforms to support open-source ML projects. They started out with their famous Transformers library and have many other libraries, including the diffusers one that this application is actually using. They also have a model/dataset hub and an interactive application platform known as "Spaces". Their goal is to be the "GitHub of machine learning".
Their business model is basically supporting enterprise and private use-cases. For example, getting expert support for using these libraries, or hosting models and datasets privately. You can see more information about the pricing here:
https://huggingface.co/pricing
They reached a $2 billion valuation after a recent round of funding so overall they're probably pretty flush with cash lol
https://news.ycombinator.com/item?id=32634139
In that thread people are complaining about the price of getting their images done. So part of it may actually be exchanging money for a service. I bet a good chunk of it is investor cash though.
I've been trying to get some sensible images out of my descriptions, but I fail miserably.
In this case I had the prompt "cow chewing bone" and got 4 squares representing the two pairs of feet, the body and the head. None of them cared about chewing on a bone.
With DALL·E 2 I tried to get an image of a little girl building sandcastles and a monster threatening her:
"little scared girl building a sandcastle and a big angry monster is looking at her."
"little scared girl building a sandcastle six damaged sandcastles are to her side. a big angry monster is threatening her. it is dark." https://imgur.com/a/f5FFKOi
"little scared girl building a sandcastle with six damaged sandcastles to her side and a big angry monster threatening her"
Is there some kind of structure the sentences should follow?
Yes, check out examples on lexica or use a prompt builder to help, like promptmania.
Also, most of the good ones you see online are cherry-picked from hundreds of runs, so set your batch size to 1000 and go to bed! After that, people tend to run some of the good results through img2img, also with a lot of variations produced from a single image. Finally, some people run them at higher resolutions if they have enough VRAM, as smaller resolutions can distort or generate rubbish. For the messed-up faces, they run the results through gfpgan a few times to get prettier faces. Other than that, it is pure luck (using random seeds) to figure out what works and what doesn't. Use the 2 sites above to help you improve your prompts.
(meant in the context of stable diffusion)
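A minimal sketch of that "big batch overnight, keep the seeds" workflow, written against diffusers (the fork CLIs expose the same idea through seed and iteration-count flags); the prompt and counts are just placeholders:

    import random
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a little girl building a sandcastle, a huge monster looming behind her"

    for i in range(200):  # or 1000, if you really are going to bed
        seed = random.randrange(2**32)
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(prompt, num_inference_steps=30, generator=generator).images[0]
        # Keep the seed in the filename so the good ones can be reproduced
        # or pushed through img2img / gfpgan later.
        image.save(f"out_{i:04d}_seed_{seed}.png")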
Just know that if you let it run overnight often, you will see it on your electricity bill. My GTX 1660 runs at max while rendering, which is 125 W. Leaving it running overnight can easily eat 2 to 6 kWh, depending on your system (e.g. a system drawing 300 W for 10 hours is 3 kWh).
I managed to get one that was correct with "A little scared girl is building a sandcastle, while a monster is looking at her. Award-winning photograph.", but I couldn't figure out a phrasing where it wouldn't, most of the time, get confused into thinking that the sandcastle is the monster, or that the girl is the monster.
DALL·E is bad at being instructed to have an exact count of items in the picture. Ask for 6 kittens and you get 7, and each kitten will be much more "wrong" than a picture of a single kitten.
DALL·E is also bad at positional prompts. Ask for something to be in the top right-hand corner and it will appear bottom centre.