- Automatic1111 outpainting works well, but you need to enable the outpainting script; I'd recommend Outpainting mk2. What the author did was just "resize with fill", which doesn't run any diffusion on the outpainted sections.
- There are much better resizing workflows; at a minimum I would recommend the "SD upscale" script. You can also get great results by resizing the image to high resolution (4-8k) using Lanczos, then using inpainting to manually diffuse sections of the image at a much higher resolution with prompt control. In this case "SD upscale" is fine, but the inpaint-based upscale works well for complex compositions.
- When training, I would typically recommend keeping the background. This makes for a more versatile fine-tuned model.
- You can get a lot more control over the final output by using ControlNet. This is especially great if you have illustration skills, but it's also great for generating variations in a different style while keeping the composition and details. In this case you could have taken a portrait photo of the subject and used ControlNet to adjust the style (no fine-tuning required).
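The Lanczos step in the resize workflow above is just a conventional resample that never touches the diffusion model, which is why it costs no VRAM. A minimal Pillow sketch (sizes are illustrative):

```python
from PIL import Image

def lanczos_upscale(img, factor=4):
    """Plain Lanczos resample: no diffusion model involved, so no VRAM cost."""
    w, h = img.size
    return img.resize((w * factor, h * factor), Image.LANCZOS)

# Demo on a synthetic 1024x1024 "render"; in practice you'd Image.open()
# an SD output, upscale it, then inpaint regions of the big image.
render = Image.new("RGB", (1024, 1024), "gray")
big = lanczos_upscale(render)
print(big.size)  # (4096, 4096)
```

The diffusion-based detail then comes from inpainting passes over the enlarged image, not from the resample itself.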
> However you can get great results by resizing the image to high-res (4-8k) using lanczos then using inpainting to manually diffuse the image at a much higher resolution with prompt control.
Diffuse an 8k image? Isn't it going to take much, much more VRAM tho?
For what it's worth if you actually want to get help on the state of the art on this stuff the best place to ask is the 4chan /g/ /sdg/ threads, and you can absolutely diffuse images that large using TiledVAE and Mixture of Diffusers or Multidiffusion, both of which are part of the Tiled Diffusion plugin for auto1111.
That confused me at first too. You aren't diffusing the 8k image.
You are upsampling, then inpainting the sections that need it. So if you take your 8K image and inpaint a 1024x1024 section, that works well with normal VRAM usage. In Auto1111, you need to select the "Only masked" inpaint area to do that.
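To make the VRAM point concrete, here is a sketch (not A1111's actual code) of the masked-area mechanic: only a crop around the mask is processed at model resolution and then pasted back, so canvas size barely matters. `inpaint_region` and `edit_fn` are hypothetical names, with `edit_fn` standing in for the diffusion call:

```python
from PIL import Image

def inpaint_region(big, box, edit_fn, work=1024):
    """Crop `box` out of a large canvas, hand just that crop to `edit_fn`
    at `work` x `work` resolution, then paste the result back. Only the
    crop ever reaches the model, so an 8K canvas doesn't blow up VRAM."""
    x0, y0, x1, y1 = box
    crop = big.crop(box).resize((work, work), Image.LANCZOS)
    edited = edit_fn(crop)  # stand-in for the actual diffusion/inpaint call
    big.paste(edited.resize((x1 - x0, y1 - y0), Image.LANCZOS), (x0, y0))
    return big

canvas = Image.new("RGB", (7680, 4320), "white")  # an "8K" canvas
result = inpaint_region(canvas, (1000, 1000, 1512, 1512), lambda im: im)
print(result.size)
```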
To clarify: when things are upscaled like that, it typically means running img2img on sections in a grid pattern that together make up the full picture, so it doesn't overuse VRAM.
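The grid pattern can be sketched as a list of overlapping tile boxes, roughly what SD-Upscale-style scripts iterate over. Tile and overlap sizes are illustrative, and the canvas is assumed to be at least one tile in each dimension:

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Overlapping tile coordinates covering a canvas. Scripts like SD
    Upscale run img2img on one such tile at a time and blend the overlaps,
    so VRAM use depends on the tile size, not the full image size."""
    step = tile - overlap
    xs = range(0, max(width - overlap, 1), step)
    ys = range(0, max(height - overlap, 1), step)
    boxes = []
    for y in ys:
        for x in xs:
            x0, y0 = min(x, width - tile), min(y, height - tile)  # clamp to edges
            boxes.append((x0, y0, x0 + tile, y0 + tile))
    return boxes

boxes = tile_boxes(2048, 2048)
print(len(boxes))  # 25 tiles for a 2048x2048 canvas
```

The overlap exists so seams between tiles can be blended away.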
Good luck! I have some workflow videos on YouTube: https://youtube.com/pjgalbraith. But I haven't had a chance to show off all the latest techniques yet.
There's a great deal of pushback against AI art from the wider online art community at the moment, a lot of which is motivated by a sense of unfairness: if you're not going to put in the time and effort, why do you deserve to create such high-quality imagery?
(I do not share this opinion myself, but it's something I've seen a lot)
This is another great counter-example showing how much work it takes to get the best, deliberate results out of these tools.
> a lot of which is motivated by a sense of unfairness
This is not something I've seen even once in any sort of criticism of "AI art", and elsewhere on the internet I'm largely in an anti-AI-art bubble.
Most legitimate pushback I've seen has been more about the non-consensual training of models. Many artists don't want their work sucked up into the "AI Borg Model" and then regurgitated by someone else, stripping the artist of consent, credit, and compensation.
I've found it rare that those dead-set against AI art actually concede it has value once you take copyright out of the equation; bringing up Adobe Firefly instead just pivots the conversation to other, considerably weaker arguments:
"Using stock art is just further appropriation": silly, considering the intent and licensing of stock artwork are clearly understood by all parties to turn works into commodities for commercial exploitation.
"The old ways are best; the new ways are bad and take the soul out of the creation process and the resulting works": also unconvincing, considering that most of the people saying this use radically different, digitized, heavily time-optimized art workflows compared to the industry norm of even 30 years ago.
Not that I don't see the problems. The potential for job losses, as optimized workflows require less work and therefore fewer workers, is a real risk, but one that exists regardless of copyright enforcement against AI models. The problems commercialized AI art workflows cause may even be exacerbated by enforcing copyright on training data, since that would hand a monopoly on all higher-quality generative models to already-entrenched multinational intellectual-property rightsholders. I think a lot of artists forget that copyright isn't so much for them as it is for the Disneys of the world.
I absolutely have seen it. A lot. It's dressed up as Luddism, more often expressed as "you shouldn't be able to have those results because I spent years honing my craft" which may or may not be followed by "...and if we allow this, those years were wasted and I'm out of a job, along with millions of others".
SD base models can't really be used to imitate other artists' styles reliably, because the datasets they were trained on are a huge mess and caption accuracy is all over the place. For example, Joan Cornella's work and Cyanide & Happiness comics are in LAION-5B, but if you prompt SD to make art in their style you'll get something completely different. Try prompting for a "minigun": you will also get something weird.
To copy another artist's style reliably, you have to make a LoRA yourself. That involves a lot of manual work, and it can't really be automated if you want good results.
Artists can opt out of future SD base models (which doesn't matter), but they can't opt out of someone making a LoRA of their work (which actually works).
>> a lot of which is motivated by a sense of unfairness
> This is not something I've seen once in any sort of criticism of "AI art"
I've actually seen this a lot.
In my view, it's not coming from professional artists working in the field. Their concern is more that people are ripping off their style, or that AI is making their efforts unnecessary (e.g. lots of people who made a living copying the style of particular anime & cartoons for fans no longer have a purpose, since AI can do that given enough source material).
Non-professional artists, on the other hand, are still learning and have put a lot of time into their craft and it hasn't paid off yet. They seem to be annoyed that other people are getting results (via AI), without actually having to learn the mechanics of art.
AI basically lets your generic art history major produce lots and lots of pieces, because they can describe artwork well enough and know where to find good samples for the AI. The only thing stopping them was mere mechanical inability, not knowledge of the art space.
Is this part actually coming from artists? What’s the suggested amount (be it upper-quadrillion dollars per second or $0.25/use)?
I think compensation as a condition just assumes it’s implied that financial gain is artists’ motive and that they actually live off that income. Rather, I see a lot of vocal opposition to AI image generators from people who aren’t drawing for profit at all.
So, is the money going to solve it, or is it a wrong assumption, or is it that it will have to be settled by lump sums?
>Most legitimate pushback I've seen has been more on the non-consensual training of models
Look at the pushback to Adobe’s model.
“Non consent of model input” is just a tool they’re using in the hopes of destroying the tech. Plenty of companies have datasets of these same people’s work where the T&C permits training.
The narrative will switch once you can no longer use the “stealing/consent” argument. They won’t suddenly become fine with this tech just because the dataset consented.
Unfortunately it's become a meme among AI art haters that AI art is "just typing text into a text box", despite that being far from the truth, particularly if you want to get specific results, as this blog post demonstrates.
Some modern AI art workflows often require more effort than actually illustrating using conventional media. And this blog post doesn't even get into ControlNet.
> Some modern AI art workflows often require more effort than actually illustrating using conventional media. And this blog post doesn't even get into ControlNet.
Indeed. Another criticism I can somewhat see the idea behind is that the barrier to entry is very different from, for example, drawing. To draw, you need pen and paper, and you can basically start. To start with Stable Diffusion et al., you need either a) paid access to a service, b) money to purchase moderately powerful hardware, or c) money to rent moderately powerful hardware. One way or another, if you want to practice AI-generated art, you need more money than a pen and paper cost.
From what I read on the internet, people assume AI-generated art is a difficult question legally speaking. Some literally assume artists complain only because they are outcompeted.
I disagree - I think that AI generative art is an easy case of copyright infringement and an easy win for a bunch of good lawyers.
That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak. I really don't see what's difficult about that case. I think the internet assumes a bit too quickly that it's a difficult question and a grey area, when maybe it just isn't.
It's noteworthy that Adobe did things differently than the others, and the way they did things goes in the direction I'm describing here. Maybe it's just confirmation bias.
> I disagree - I think that AI generative art is an easy case of copyright infringement and an easy win for a bunch of good lawyers.
> That’s because you can’t find an artist for a generated picture other than the ones in the training set.
First, that’s clearly not true when you are using ControlNet with the input being human generated, or even img2img with a human generated image, but second and more importantly…
> If you can’t find a new artist, then the picture belongs to the old ones, so to speak.
That’s not how copyright law works. The clearest example (not particularly germane to the computer generation case, but clearly illustrative of the fact that “can’t find another artist” is far from dispositive) is Fair Use noncommercial timeshifting of an existing work: it is extremely clear there is no artist but that of the original work, and yet it is not copyright infringement.
> I really dont see what’s difficult with that case.
You’ve basically invented a rule of thumb out of thin air, and observed that it would not be a difficult case if your rule of thumb was how copyright law works.
Your observation seems correct to that extent; the problem is that it has nothing to do with copyright law.
> I think the internet assume a bit to quickly it’s a difficult question and a grey area when maybe it just isn’t.
IP law experts have said that the Fair Use argument is hard to resolve.
Assuming the lawsuits currently ongoing aren’t settled, we’ll know when they are resolved what the answer is.
It’s not as simple as that, though, because the algorithm does learn by itself and mostly just uses the training data to score itself against; it doesn’t directly copy it, as some people seem to think. It can end up learning to copy things if it sees them enough times, though.
“you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak”
I don’t think that’s valid on its own as a way to completely discount considering how directly it’s using the data. As an extreme example, what if I averaged all the colours in the training data together and used the resulting colour as the seed for some randomly generated fractal or something? You could apply the same arguments - there is no artist except the original ones in the training set - and yet I don’t think any reasonable person would say that the result obviously belongs to every single copyright owner from the training set
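That thought experiment is easy to make literal. A toy sketch (everything here is hypothetical; the "training set" is just a handful of pixels):

```python
import random

def average_colour(pixels):
    """Mean RGB over every pixel taken from the 'training set'."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

# Stand-ins for a training set, each image reduced to a single pixel.
training_pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
avg = average_colour(training_pixels)  # (127.5, 127.5, 63.75)

# Seed an otherwise unrelated generator with that single averaged value;
# every training image "contributed", yet none is recognisable in the output.
rng = random.Random(sum(avg))
points = [(rng.random(), rng.random()) for _ in range(1000)]
```

Every input influenced the output, but by the time it reaches `points` the contribution of any individual work is unrecoverable, which is the intuition the comment above is pointing at.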
Normally (outside the specific context of AI-generated art) the relation is not "work¹ → past author" but "work → large amount of past experience". (¹"work" in the sense of product, output, etc.)
If the generative AI is badly programmed, it will copy the style of Smith. If properly programmed, it will "take into account" the style of Smith. There is a difference between learning and copying. Your tool can copy - if you do it properly, it can learn.
All artists work in a way "post consideration of a finite number of past artists in their training set".
But this person’s dog isn’t in the training set, so why should some artist be credited for a picture they never drew? Not a single person has drawn his dog before, now there is a drawing of his dog, and you want to credit someone who had no input to the creative process here?
> That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak.
It doesn't belong to the "old ones"; it is at best a derivative work. And even writing a prompt, as trivial as it might seem, makes you an artist. There are modern artists exhibiting random shit as art, and you may or may not like it, but they are legally artists, and it is their work.
The question is about fair use, that is, whether you are allowed to use pictures in the dataset without permission. It is a tricky question. On one extreme, you can't do anything without infringing some kind of copyright: used the same color as I did? I'll sue you. On the other extreme, you essentially abolish intellectual property. Copying another artist's style in your own work is usually fair use, and that's essentially what generative AI does, so I guess that's how it will go, but it will most likely depend on how judges and legislators see the thing, and different countries will probably have different ideas.
>That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak
We have some countries where it is explicitly legal to train AI models on copyrighted data without consent, and precedent in the US that makes this a plausible outcome there as well.
Could you explain what portion of copyright law you believe would cover this argument? I'm not a lawyer, but have a passing familiarity with US copyright law, and in it, at least, I do not know of anything that would support the idea you're proposing here. How would you even assign copyright to the "old" artists? How are you going to determine what percentage of any given generation was influenced by artists X, Y, Z?
> AI generative art is an easy case of copyright infringement...
Agreed. An AI model trained on an artist's work without permission is IP infringement and this should be widely understood. Unfortunately, because the technology is new people do not understand this. When Photoshop was new, there was a similar misunderstanding. People could take an artist's work, run it through Photoshop, and then not compensate the artist. It took some time for that to sort out.
I agree. This is a clear-cut case of copyright infringement, as is all art. After all, people painting images have only seen paintings other people painted.
The only problem to that, and a big one, is that there’s no way to trace back to the image in the dataset from a final output of AI.
It’s a static mapping, so you’d think it should surely be possible, but NN frameworks aren’t designed that way. That is blocking it from happening (and also enabling the “AI is just learning, humans are the same” fallacy).
The shruggingface submission is very interesting and very instructive.
Nonetheless, it would be odd, and a weak argument, to direct criticism at not spending adequate «time and effort» (as if it made sense to renounce tools and work through unnecessary fatigue and wasted time). More proper criticism could run in the direction of "you can produce pleasing graphics but you may not know what you are doing".
This said, I'd say that Stable Diffusion is a milestone of a tool, incredible to have (though difficult to control). I'd also say that the results of the latest Midjourney (though quite resistant to control) are at "speechless" level. (Noting in case some had not yet checked.)
> More proper criticism could be in the direction of "you can produce pleasing graphics but you may not know what you are doing".
I don't get this. If one "can produce pleasing graphics," how does that not equal knowing what they're doing? I only see this as being true in the sense of "Sure, you can get places quickly in a car, but you don't really know how it works."
> if you're not going to put in the time and effort, why do you deserve to create such high-quality imagery?
This isn’t high-quality imagery. Don’t get me wrong, the tech is cool and I love the work that’s gone into making this picture. But this isn’t something I would ever hang on my wall. There’s probably a market for it, but I get the strong impression it’s the “live, laugh, love” market: the people who buy pictures for their wall in the supermarket. The kind of people who pay individual artists to paint bespoke images of their pet are not going to frame AI art. I don’t think the artists need to worry.
It’s completely what you make it, though. If what’s in the OP isn’t your style you could literally type in anything you want.
I’ve done pictures of my wife in the style of other photographers, Soviet-style propaganda posters, 50s pinups, Alphonse Mucha, and much more.
I’m a professional photographer and have tons of great pictures of our dog - the kind of stuff people pay for. My wife’s lock screen on her phone is something I generated instead.
I would expect it’s only a matter of time till those “traditional” artists also adopt these tools into their workflows. Similar to the initial pushback against the “digital darkroom” which is now the mainstay of photography.
Non-AI-aided art, like manually developed film, will trend towards a niche.
> This isn’t high-quality imagery. Don’t get me wrong, the tech is cool and I love the work that’s gone into making this picture. But this isn’t something I would ever hang on my wall.
Well, yeah, but that doesn't change the OP commenter's point that it still takes a lot of work to get high-quality art.
> I don’t think the artists need to worry.
I disagree here, but only on the basis of what type of art it is. Stock art/photography and a lot of media design work are likely at risk, because we can now create "good enough" art at the click of a button for almost no cost. I agree that the "hang on the wall level good" artists aren't at risk just yet, but between the more filler-type art and the, uh...
Well "anime/furry" commissioners are definitely at risk right now for anything except the highest quality artists, and there is a MASSIVE community behind this - in fact they have done a lot of the innovation for StableDiffusion including optimizations/A1111 webui, and have trained many custom models for their art, already had pretagged datasets of 10k's of images....
Most of the criticism I've seen is that it's all trained on uncompensated, stolen artwork, much like how Copilot is trained on GPL code, disregarding its license terms.
It's interesting to ask people who are concerned about the training data what they think of Adobe Firefly, which is strictly trained on correctly licensed data.
I'm under the impression that DALL-E itself used licensed data as well.
I find some people are comfortable with that, but others will switch to different concerns - which indicates to me that they're actually more offended by the idea of AI-generated art than the specific implementation details of how it was trained.
The general argument (IANAL) is that it's Fair Use, in the same vein as Google Images or the Internet Archive scraping and storing text/images, especially since generated images are not 1:1 with their source inputs, so it can be argued that each is a unique derivative work. The current lawsuits against Stability AI are testing that, although I am skeptical they'll succeed (one of the lawsuits argues that Stable Diffusion is just "lossy compression", which is factually and technically wrong).
There is an irony, however, that many of the AI art haters tend to draw fanart of IP they don't own. And if Fair Use protections are weakened, their livelihood would be hurt far more than those of AI artists.
The Copilot case/lawsuit IMO is stronger because the associated code output is a) provably verbatim and b) often has explicit licensing and therefore intent on its usage.
AI is just showing us a fact that many are unwilling to admit: everything is a derivative work. Much like humans will memorise and regurgitate what they've seen.
> a lot of which is motivated by a sense of unfairness
Say you generate a picture with midjourney - who is/are the closest artist(s) you can find for that picture?
Not the AI, not the prompter; the closest artists you can find for that picture are the ones who made the pictures in the training set. So generating a picture is outright copyright infringement. This has nothing to do with unfairness in the sense of artists being outcompeted. Artists don't get outcompeted; they get stolen from.
Typical Midjourney workflow involves constantly reprompting and fine tuning based on examples and input images. When you arrive at a given image in Midjourney, it’s often impossible to recreate it even with the same seed. You’ll need the input image as well, and the input image is often the result of a long creative process.
Why is it you discount the creative input of the user? Are they not doing work by guiding the agent? Don’t their choices of prompt, input image, and the refinement of subsequent generated images represent a creative process?
I've done so much with a fine-tuned model of my dog.
I previously made coloring pages for my daughter of our dog as an astronaut, wild west sheriff, etc. They're the first pages she ever "colored," which was pretty special for us. Currently I'm working on making her into every type of Pokemon, just for fun.
I uploaded a couple of the Pokemon generations really quick as examples. I still need to go through and do quick fixes for double tails (the tails on Pokemon are not where they are on regular animals, apparently), watermarks, etc. and do a quick Img2Img on them.
StableTuner to fine-tune the model. I can't recall the name of the model I trained on top of, but it was one of the top "broad" 1.5-based models on Civitai. Automatic1111 to do the actual generating. I used an anime line-art LoRA (at a low weight) along with an offset-noise LoRA for the coloring-book pages, as otherwise SD makes images perfectly exposed; for something like that you obviously want a lot more white than black.
EveryDream2 would be another good tuning solution. Unfortunately that end of things is far from easy. There are a lot of parameters to change and it's all a bit of a mess. I had an almost impossible time doing it with pictures of my niece, my wife is hit or miss, her sister worked really well for some reason, and our dog was also pretty easy.
Stable Diffusion can run on Intel CPUs through OpenVINO if you don't have a GPU or the funds to rent one online (Google Colab is often used). You still need a decent amount of RAM (running SD takes about 8 GB, training seems to run at 6-8 GB), so I'd consider 12 or 16 GiB of RAM a requirement.
There's a huge difference in performance (generating an image takes 10 minutes rather than 10 seconds and training a model would take forever) but with some Python knowledge and a lot of patience it can be done.
Apple's Intel Macbooks are infamous for their insufficient cooling design for the CPUs they chose, which won't help maintaining a high clock speed for extended durations of time; you may want to find a way to help cool the laptop down to give the chip a chance to boost more, and to prevent prolonged high temperatures from wearing down the hardware quicker.
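A rough back-of-envelope for the ~8 GB figure above. The parameter counts are approximate, from-memory assumptions for SD 1.x, not exact values:

```python
# Approximate SD 1.x parameter counts (assumption): ~860M UNet,
# ~123M text encoder, ~84M VAE.
params = 860e6 + 123e6 + 84e6
gib = 1024 ** 3

bytes_fp32 = params * 4   # float32: 4 bytes per weight
bytes_fp16 = params * 2   # float16: 2 bytes per weight

print(f"fp32 weights: {bytes_fp32 / gib:.1f} GiB")  # ~4.0 GiB
print(f"fp16 weights: {bytes_fp16 / gib:.1f} GiB")  # ~2.0 GiB
# Activations, the Python runtime, and working buffers roughly double
# the fp32 figure in practice, which is where ~8 GB comes from.
```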
I did something loosely related. As a present for my girlfriend's birthday, I made her a "90s website" with AI portraits of her dog: https://simoninman.github.io/
It wasn't actually particularly hard - I used a Colab notebook on the free tier to fine-tune the model, and even got chatGPT to write some of the prompts.
In my (limited) experience, dogs seem to be easier than people for fine-tuning - especially if your end result is going to be artsy. Faces of people you know well being off in slight ways really throws you off, but with dogs there's a bit more leeway.
He mentions the Colab for Dreambooth; that only takes ten minutes or so to train using an A100 (the premium GPU), and you can have it turn off after it finishes and save to Google Drive. Super easy.
I've trained a few smaller models using their Dreambooth notebook, but I think for 4000 training steps, an A100 will usually take 30-40min. I believe replicate also uses A100s for their dreambooth training jobs.
Ah I see, you're right, 40 minutes sounds about right for that amount of training. Curious why the decision to train on 40 images? I've used 15 for two separate subjects in Dreambooth with excellent results. I'm no expert, experimenting the same way as you, but I haven't trained on more than 15-20 images per subject.
I've found the most important part is spending a good amount of time on the prompts, although I'm not sure if having the person in an environment, and describing the objects around them, helps give the model a "sense of scale". For example, if I just train "wincy" in fast Dreambooth, "wincy" will be the only token it knows; with no other info in the prompts, it didn't know what in the image was "wincy" (me). I accidentally did this when training my wife (no prompts at all) and she got really mad at me at how ugly the results were ("you made me ugly!" haha).
Have you tried it with and without your dog in an environment, then describing the environment your dog is in for the training data?
People have been sending me the cute pics the AI generates of their pups. I think this is arguably the best thing so far in this latest wave of AI releases!
https://i.imgur.com/zOMarKc.jpg
Here's an example using various techniques I've gathered from those 4chan threads. (yes I know it's 4chan but just ignore the idiots and ask for catboxes, you'll learn much faster than anywhere else, at least that was the case for me after exhausting the resources on github/reddit/various discords)
> You are upsampling, then inpainting sections that need it.
https://github.com/zero01101/openOutpaint
https://github.com/BlinkDL/Hua
Both use automatic1111 API for the work.
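For reference, the webui's `--api` flag exposes a small REST API that front-ends like these call; `/sdapi/v1/txt2img` is the main generation endpoint. A minimal sketch (prompt and settings are illustrative, and the call is guarded in case no webui is running locally):

```python
import json
import urllib.request

API = "http://127.0.0.1:7860"  # the webui's default local address

def txt2img_payload(prompt, steps=20, width=512, height=512):
    """Minimal request body for the /sdapi/v1/txt2img endpoint."""
    return {"prompt": prompt, "steps": steps, "width": width, "height": height}

payload = txt2img_payload("a corgi astronaut, watercolor")

try:
    req = urllib.request.Request(
        f"{API}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        images = json.loads(resp.read())["images"]  # base64-encoded PNGs
except OSError:
    images = []  # no webui running; the payload above is still valid
```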
> Some modern AI art workflows often require more effort than actually illustrating using conventional media. And this blog post doesn't even get into ControlNet.
Even if they did have a more complex workflow most of them are still based on copyrighted training data, so there will be many lawsuits.
Then why don’t they illustrate it instead, and save themselves some time?
Indeed. Another criticism that I can definitely somewhat see the idea behind, is that the barrier to entry is very different from for example drawing. To draw, you need a pen and a paper, and you can basically start. To start with Stable Diffusion et al, you need either A) paid access to a service, B) money to purchase moderately powerful hardware or C) money to rent moderately powerful hardware. One way or another, if you want to practice AI generated art, you need more money than what a pen and paper cost.
I disagree - I think that AI generative art is an easy case of copyright infringement and an easy win for a bunch of good lawyers.
That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak. I really don't see what's difficult with that case. I think the internet assumes a bit too quickly that it's a difficult question and a grey area when maybe it just isn't.
It's noteworthy that Adobe did things differently than the others, and the way they did things goes in the direction I'm describing here. Maybe it's just confirmation bias.
> That’s because you can’t find an artist for a generated picture other than the ones in the training set.
First, that’s clearly not true when you are using ControlNet with the input being human generated, or even img2img with a human generated image, but second and more importantly…
> If you can’t find a new artist, then the picture belongs to the old ones, so to speak.
That’s not how copyright law works. The clearest example (not particularly germane to the computer generation case, but clearly illustrative of the fact that “can’t find another artist” is far from dispositive) is Fair Use noncommercial timeshifting of an existing work: it is extremely clear there is no artist but that of the original work, and yet it is not copyright infringement.
> I really don't see what's difficult with that case.
You’ve basically invented a rule of thumb out of thin air, and observed that it would not be a difficult case if your rule of thumb was how copyright law works.
Your observation seems correct to that extent, the problem is that it has nothing to do with copyright law.
> I think the internet assumes a bit too quickly that it's a difficult question and a grey area when maybe it just isn't.
IP law experts have said that the Fair Use argument is hard to resolve.
Assuming the lawsuits currently ongoing aren’t settled, we’ll know when they are resolved what the answer is.
“you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak”
I don’t think that’s valid on its own as a way to completely discount considering how directly it’s using the data. As an extreme example, what if I averaged all the colours in the training data together and used the resulting colour as the seed for some randomly generated fractal or something? You could apply the same arguments - there is no artist except the original ones in the training set - and yet I don’t think any reasonable person would say that the result obviously belongs to every single copyright owner from the training set
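The thought experiment above can be sketched concretely (a hypothetical illustration; the function names here are made up): reduce an entire "training set" to one average colour, then use that colour only as a PRNG seed for unrelated output.

```python
import random

def average_colour(images):
    """images: list of images, each a list of (r, g, b) pixel tuples."""
    total = [0, 0, 0]
    count = 0
    for pixels in images:
        for r, g, b in pixels:
            total[0] += r
            total[1] += g
            total[2] += b
            count += 1
    return tuple(t // count for t in total)

def fractal_params_from_seed(colour, n=5):
    # The colour is used only to seed a PRNG; none of the original
    # pixels survive into the output, which is the point of the argument.
    rng = random.Random(hash(colour))
    return [rng.random() for _ in range(n)]

avg = average_colour([[(255, 0, 0), (0, 0, 255)]])  # (127, 0, 127)
params = fractal_params_from_seed(avg)
```

Every training image influenced the result, yet nothing recognizable from any of them remains.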
Normally (outside the specific context of AI-generated art) there is not a relation "work¹ → past author", but "work → large amount of past experience". (¹"work" in the sense of product, output, etc.)
If the generative AI is badly programmed, it will copy the style of Smith. If properly programmed, it will "take into account" the style of Smith. There is a difference between learning and copying. Your tool can copy - if you do it properly, it can learn.
All artists work in a way "post consideration of a finite number of past artists in their training set".
It doesn't belong to the "old ones"; it is at best a derivative work. And even writing a prompt, as trivial as it might seem, makes you an artist. There are modern artists exhibiting random shit as art, and you may or may not like it, but they are legally artists, and it is their work.
The question is about fair use. That is, are you allowed to use pictures in the dataset without permission? It is a tricky question. On one extreme, you won't be able to do anything without infringing some kind of copyright. Used the same color as I did? I will sue you. On the other extreme, you essentially abolish intellectual property. Copying another artist's style in your own work is usually fair use, and that's essentially what generative AI does, so I guess that's how it will go, but it will most likely depend on how judges and legislators see the thing, and different countries will probably have different ideas.
We have some countries where it is explicitly legal to train AI models on copyrighted data without consent, and precedent in the US that makes this a plausible outcome there as well.
Could you explain what portion of copyright law you believe would cover this argument? I'm not a lawyer, but have a passing familiarity with US copyright law, and in it, at least, I do not know of anything that would support the idea you're proposing here. How would you even assign copyright to the "old" artists? How are you going to determine what percentage of any given generation was influenced by artists X, Y, Z?
Agreed. An AI model trained on an artist's work without permission is IP infringement and this should be widely understood. Unfortunately, because the technology is new, people do not understand this. When Photoshop was new, there was a similar misunderstanding. People could take an artist's work, run it through Photoshop, and then not compensate the artist. It took some time for that to sort out.
It's a static mapping, so you'd think it should be possible, but NN frameworks aren't designed that way. That is blocking it from happening (and also enabling the "AI is just learning, humans do the same" fallacy).
Nonetheless, it would be odd and a weak argument to point criticism at not spending adequate "time and effort" (as if it made sense to renounce tools and work through unnecessary fatigue and wasted time). More proper criticism could be in the direction of "you can produce pleasing graphics but you may not know what you are doing".
This said, I'd say that Stable Diffusion is a milestone of a tool, incredible to have (though difficult to control). I'd also say that the results of the latest Midjourney (though quite resistant to control) are at "speechless" level. (Noting in case some had not yet checked.)
I don't get this. If one "can produce pleasing graphics," how does that not equal knowing what they're doing? I only see this as being true in the sense of "Sure, you can get places quickly in a car, but you don't really know how it works."
This isn't high quality imagery. Don't get me wrong, the tech is cool and I love the work that's gone into making this picture. But this isn't something I would ever hang on my wall. There's probably a market for it, but I get the strong impression it's the "live, laugh, love" market: the people who buy pictures for their wall in the supermarket. The kind of people who pay individual artists money to paint bespoke images of their pet are not going to frame AI art. I don't think the artists need to worry.
I’ve done pictures of my wife in the style of other photographers, Soviet-style propaganda posters, 50s pinups, Alphonse Mucha, and much more.
I’m a professional photographer and have tons of great pictures of our dog - the kind of stuff people pay for. My wife’s lock screen on her phone is something I generated instead.
Non-AI-aided art, like manually developed film, will trend towards a niche.
Well yeah, but that doesn't change the OP commenter's point that it still takes a lot of work to get high quality art.
> I don’t think the artists need to worry.
I disagree here, but only on the basis of what type of art it is. Stock art/photography and a lot of media design work is likely at risk because we can now create "good enough" art at the click of a button for almost no cost. I agree that the "hang on the wall level good" artists aren't at risk just yet, but between the more filler-level art and the, uh...
Well "anime/furry" commissioners are definitely at risk right now for anything except the highest quality artists, and there is a MASSIVE community behind this - in fact they have done a lot of the innovation for StableDiffusion including optimizations/A1111 webui, and have trained many custom models for their art, already had pretagged datasets of 10k's of images....
It's interesting to ask people who are concerned about the training data what they think of Adobe Firefly, which is strictly trained on correctly licensed data.
I'm under the impression that DALL-E itself used licensed data as well.
I find some people are comfortable with that, but others will switch to different concerns - which indicates to me that they're actually more offended by the idea of AI-generated art than the specific implementation details of how it was trained.
There is an irony, however, that many of the AI art haters tend to draw fanart of IP they don't own. And if Fair Use protections are weakened, their livelihood would be hurt far more than those of AI artists.
The Copilot case/lawsuit IMO is stronger because the associated code output is a) provably verbatim and b) often has explicit licensing and therefore intent on its usage.
It's kinda like using ffmpeg or VapourSynth for video editing instead of a video editing GUI.
That being said the training parameter/data tuning is definitely an art, as is the prompting.
I turned my dog into a robot a while back using the img2img feature of Stable Diffusion and the results were pretty amazing![1]
[1] https://twitter.com/davely/status/1583233180177297408
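For anyone trying this: img2img's "denoising strength" is what controls how much of the original photo survives. The input image is noised partway into the diffusion schedule, and only the remaining steps are diffused. A rough sketch of that scheduling idea (not A1111's exact implementation):

```python
def img2img_start_step(total_steps, strength):
    """With denoising strength s in [0, 1], img2img runs only the last
    round(s * total_steps) steps of the schedule: low strength keeps the
    input photo mostly intact, high strength repaints it."""
    strength = max(0.0, min(1.0, strength))
    return total_steps - round(total_steps * strength)

# 50 steps at strength 0.6: diffusion starts at step 20 and runs 30 steps.
```

Dog-to-robot conversions tend to sit in the middle of that range: high enough to change the materials, low enough to keep the pose.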
Say you generate a picture with midjourney - who is/are the closest artist(s) you can find for that picture?
Not the AI, not the prompter, so the closest artists you can find for that picture are the ones who made the pictures in the training set. So generating a picture is outright copyright infringement. Nothing to do with unfairness in the sense of "artists get outcompeted". Artists don't get outcompeted - they are stolen from.
Why is it you discount the creative input of the user? Are they not doing work by guiding the agent? Don’t their choices of prompt, input image, and the refinement of subsequent generated images represent a creative process?
I previously made coloring pages for my daughter of our dog as an astronaut, wild west sheriff, etc. They're the first pages she ever "colored," which was pretty special for us. Currently I'm working on making her into every type of Pokemon, just for fun.
https://imgur.com/a/11OxoSA
StableTuner to fine tune the model - I can't recall the name of the model I trained on top of, but it was one of the top "broad" 1.5 based models on Civitai. Automatic1111 to do the actual generating. I used an anime line art LoRA (at a low weight) along with an offset noise LoRA for the coloring book pages as otherwise SD makes images be perfectly exposed. For something like that you obviously want a lot more white than black.
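For anyone trying to reproduce this: in Automatic1111 a LoRA is applied inline in the prompt as `<lora:name:weight>`. A hypothetical prompt for a coloring-book page might look like this (the LoRA file names here are made up; use whatever you downloaded):

```
line art, coloring book page of a dog as an astronaut, clean outlines,
white background <lora:animeLineart:0.4> <lora:offsetNoise:0.7>
Negative prompt: color, shading, grayscale fill
```

Keeping the line art LoRA at a low weight (0.3-0.5) is what avoids the over-stylized look while still biasing toward clean outlines.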
EveryDream2 would be another good tuning solution. Unfortunately that end of things is far from easy. There are a lot of parameters to change and it's all a bit of a mess. I had an almost impossible time doing it with pictures of my niece, my wife is hit or miss, her sister worked really well for some reason, and our dog was also pretty easy.
There's a huge difference in performance (generating an image takes 10 minutes rather than 10 seconds and training a model would take forever) but with some Python knowledge and a lot of patience it can be done.
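The usual Python-side step is just picking the best available compute device before loading the pipeline. A minimal sketch of that fallback (function name hypothetical; with diffusers you'd pass the result to `pipe.to(...)`):

```python
def pick_device(cuda_available, mps_available):
    """Prefer an NVIDIA GPU, then Apple's Metal backend (MPS), then CPU,
    where a single 512x512 image can take minutes instead of seconds."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# With torch installed (assumption, not shown here), you'd call it as:
# device = pick_device(torch.cuda.is_available(),
#                      torch.backends.mps.is_available())
# pipe = pipe.to(device)
```

On an Intel MacBook both checks fail, which is why you land on the 10-minutes-per-image CPU path.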
Apple's Intel Macbooks are infamous for their insufficient cooling design for the CPUs they chose, which won't help maintaining a high clock speed for extended durations of time; you may want to find a way to help cool the laptop down to give the chip a chance to boost more, and to prevent prolonged high temperatures from wearing down the hardware quicker.
Seems like lots of work went into that and I hope the author enjoyed the process and enjoys the final result.
It wasn't actually particularly hard - I used a Colab notebook on the free tier to fine-tune the model, and even got chatGPT to write some of the prompts.
Here's the colab notebook, in case anyone is interested: https://github.com/TheLastBen/fast-stable-diffusion
I've trained a few smaller models using their Dreambooth notebook, but I think for 4000 training steps, an A100 will usually take 30-40min. I believe replicate also uses A100s for their dreambooth training jobs.
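That timing works out to roughly 1.7-2.2 training steps per second on the A100:

```python
def steps_per_second(steps, minutes):
    return steps / (minutes * 60)

fast = steps_per_second(4000, 30)  # ~2.22 steps/s
slow = steps_per_second(4000, 40)  # ~1.67 steps/s
```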
I've found the most important part is spending a good amount of time on the prompts, although I'm not sure whether having the person embodied in an environment, and describing the objects around them, helps give the model a "sense of scale". For example, if I just train "wincy" in fast Dreambooth, "wincy" will be the only token it knows; with no other info in the prompts, it didn't know what in the image was "wincy" (me). I accidentally did this when training my wife (no prompts at all) and she got really mad at me about how ugly the results were ("you made me ugly!", haha)
Have you tried it with and without your dog in an environment, then describing the environment your dog is in for the training data?
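Several trainers (EveryDream2, kohya-style scripts) read a per-image caption from a `.txt` file next to each image, which is one way to feed in those environment descriptions. A hypothetical sketch (file names and captions made up):

```python
from pathlib import Path

# Captions pair the rare token ("wincy") with scene context, so the
# trainer can tell which part of each image is the subject.
captions = {
    "wincy_01.jpg": "photo of wincy sitting on a park bench, trees behind",
    "wincy_02.jpg": "photo of wincy standing in a kitchen next to a table",
}

def write_caption_sidecars(image_dir, captions):
    """Write each caption to a .txt sidecar named after its image."""
    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)
    for image_name, caption in captions.items():
        (image_dir / image_name).with_suffix(".txt").write_text(caption)

write_caption_sidecars("dataset", captions)
```

The exact caption convention depends on the trainer, so check its docs before assuming the sidecar layout above.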
dreamlook.ai
Upload your pictures, we train the model in a few minutes, then you can download your trained checkpoint. $1/model, first one for free.
For app builders, we provide a solid API that scales to 1000s of runs per day without breaking a sweat.
People have been sending me the cute pics the AI generates of their pups. I think this is arguably the best thing so far in this latest wave of AI releases!
This would have been much better standalone.