- Automatic1111 outpainting works well, but you need to enable the outpainting script; I'd recommend Outpainting mk2. What the author did was just "resize with fill", which doesn't run any diffusion on the outpainted sections.
- There are much better resizing workflows; at a minimum I would recommend the "SD upscale" script. You can also get great results by resizing the image to high resolution (4-8k) using Lanczos, then using inpainting to manually diffuse sections of the image at a much higher resolution with prompt control. In this case "SD upscale" is fine, but the inpaint-based upscale works well for complex compositions.
- When training, I would typically recommend keeping the background. This makes for a more versatile fine-tuned model.
- You can get a lot more control over the final output by using ControlNet. This is especially great if you have illustration skills, but it's also great for generating variations in a different style while keeping the composition and details. In this case you could have taken a portrait photo of the subject and used ControlNet to adjust the style (no fine-tuning required).
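The Lanczos step in the resize workflow above is just a conventional resample that never touches the diffusion model, which is why it costs no VRAM. A minimal Pillow sketch (sizes are illustrative):

```python
from PIL import Image

def lanczos_upscale(img, factor=4):
    """Plain Lanczos resample: no diffusion model involved, so no VRAM cost."""
    w, h = img.size
    return img.resize((w * factor, h * factor), Image.LANCZOS)

# Demo on a synthetic 1024x1024 "render"; in practice you'd Image.open()
# an SD output, upscale it, then inpaint regions of the big image.
render = Image.new("RGB", (1024, 1024), "gray")
big = lanczos_upscale(render)
print(big.size)  # (4096, 4096)
```

The diffusion-based detail then comes from inpainting passes over the enlarged image, not from the resample itself.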
> However you can get great results by resizing the image to high-res (4-8k) using lanczos then using inpainting to manually diffuse the image at a much higher resolution with prompt control.
Diffuse an 8k image? Isn't it going to take much, much more VRAM tho?
For what it's worth if you actually want to get help on the state of the art on this stuff the best place to ask is the 4chan /g/ /sdg/ threads, and you can absolutely diffuse images that large using TiledVAE and Mixture of Diffusers or Multidiffusion, both of which are part of the Tiled Diffusion plugin for auto1111.
That confused me at first too. You aren't diffusing the 8k image.
You are upsampling, then inpainting the sections that need it. So if you take your 8K image and inpaint a 1024x1024 section, that works well with normal VRAM usage. In Auto1111, you need to select the "Only masked" inpaint area to do that.
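To make the VRAM point concrete, here is a sketch (not A1111's actual code) of the masked-area mechanic: only a crop around the mask is processed at model resolution and then pasted back, so canvas size barely matters. `inpaint_region` and `edit_fn` are hypothetical names, with `edit_fn` standing in for the diffusion call:

```python
from PIL import Image

def inpaint_region(big, box, edit_fn, work=1024):
    """Crop `box` out of a large canvas, hand just that crop to `edit_fn`
    at `work` x `work` resolution, then paste the result back. Only the
    crop ever reaches the model, so an 8K canvas doesn't blow up VRAM."""
    x0, y0, x1, y1 = box
    crop = big.crop(box).resize((work, work), Image.LANCZOS)
    edited = edit_fn(crop)  # stand-in for the actual diffusion/inpaint call
    big.paste(edited.resize((x1 - x0, y1 - y0), Image.LANCZOS), (x0, y0))
    return big

canvas = Image.new("RGB", (7680, 4320), "white")  # an "8K" canvas
result = inpaint_region(canvas, (1000, 1000, 1512, 1512), lambda im: im)
print(result.size)
```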
To clarify: when things are upscaled like that, it typically means running img2img on sections in a grid pattern that together make up the full picture, so it doesn't overuse VRAM.
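The grid pattern can be sketched as a list of overlapping tile boxes, roughly what SD-Upscale-style scripts iterate over. Tile and overlap sizes are illustrative, and the canvas is assumed to be at least one tile in each dimension:

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Overlapping tile coordinates covering a canvas. Scripts like SD
    Upscale run img2img on one such tile at a time and blend the overlaps,
    so VRAM use depends on the tile size, not the full image size."""
    step = tile - overlap
    xs = range(0, max(width - overlap, 1), step)
    ys = range(0, max(height - overlap, 1), step)
    boxes = []
    for y in ys:
        for x in xs:
            x0, y0 = min(x, width - tile), min(y, height - tile)  # clamp to edges
            boxes.append((x0, y0, x0 + tile, y0 + tile))
    return boxes

boxes = tile_boxes(2048, 2048)
print(len(boxes))  # 25 tiles for a 2048x2048 canvas
```

The overlap exists so seams between tiles can be blended away.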
Good luck! I have some workflow videos on YouTube: https://youtube.com/pjgalbraith. But I haven't had a chance to show off all the latest techniques yet.
There's a great deal of pushback against AI art from the wider online art community at the moment, a lot of which is motivated by a sense of unfairness: if you're not going to put in the time and effort, why do you deserve to create such high-quality imagery?
(I do not share this opinion myself, but it's something I've seen a lot)
This is another great counter-example showing how much work it takes to get the best, deliberate results out of these tools.
> a lot of which is motivated by a sense of unfairness
This is not something I've seen even once in any sort of criticism of "AI art", and elsewhere on the internet I'm largely in an anti-AI-art bubble.
Most legitimate pushback I've seen has been more about the non-consensual training of models. Many artists don't want their work sucked up into the "AI Borg Model" and then regurgitated by someone else, stripping the artist of consent, credit, and compensation.
I've found it rare that those dead-set against AI art actually concede it has value once you take copyright out of the equation; bringing up Adobe Firefly instead just pivots the conversation to other, considerably weaker arguments:
"Using stock art is just further appropriation": silly, considering the intent and licensing of stock artwork are clearly understood by all parties to turn works into commodities for commercial exploitation.
"The old ways are best; the new ways are bad and take the soul out of the creation process and the resulting works": also unconvincing, considering that most of the people saying this use radically different, digitized, heavily time-optimized art workflows compared to the industry norm of even 30 years ago.
Not that I don't see the problems. The potential for job losses, as optimized workflows require less work and therefore fewer workers, is a real risk, but one that exists regardless of copyright enforcement against AI models. The problems commercialized AI art workflows cause may even be exacerbated by enforcing copyright on training data, since that would hand a monopoly on all higher-quality generative models to already-entrenched multinational intellectual-property rightsholders. I think a lot of artists forget that copyright isn't so much for them as it is for the Disneys of the world.
I absolutely have seen it. A lot. It's dressed up as Luddism, more often expressed as "you shouldn't be able to have those results because I spent years honing my craft" which may or may not be followed by "...and if we allow this, those years were wasted and I'm out of a job, along with millions of others".
SD base models can't really be used to imitate other artists' styles reliably, because the datasets they were trained on are a huge mess and caption accuracy is all over the place. For example, Joan Cornella's work and Cyanide & Happiness comics are in LAION-5B, but if you prompt SD to make art in their style you'll get something completely different. Try prompting for a "minigun": you will also get something weird.
To copy another artist's style reliably, you have to make a LoRA yourself. That involves a lot of manual work, and it can't really be automated if you want good results.
Artists can opt out of future SD base models (which doesn't matter), but they can't opt out of someone making a LoRA of their work (which actually works).
>> a lot of which is motivated by a sense of unfairness
> This is not something I've seen once in any sort of criticism of "AI art"
I've actually seen this a lot.
In my view, it's not coming from professional artists working in the field. Their concern is more that people are ripping off their style, or that AI is making their efforts unnecessary (e.g. lots of people who made a living copying the style of particular anime & cartoons for fans no longer have a purpose, since AI can do that given enough source material).
Non-professional artists, on the other hand, are still learning and have put a lot of time into their craft and it hasn't paid off yet. They seem to be annoyed that other people are getting results (via AI), without actually having to learn the mechanics of art.
AI basically lets your generic art history major produce lots and lots of pieces, because they can describe artwork well enough and know where to find good samples for the AI. The only thing stopping them was mere mechanical inability, not knowledge of the art space.
Is this part actually coming from artists? What’s the suggested amount (be it upper-quadrillion dollars per second or $0.25/use)?
I think compensation as a condition just assumes it’s implied that financial gain is artists’ motive and that they actually live off that income. Rather, I see a lot of vocal opposition to AI image generators from people who aren’t drawing for profit at all.
So, is the money going to solve it, or is it a wrong assumption, or is it that it will have to be settled by lump sums?
>Most legitimate pushback I've seen has been more on the non-consensual training of models
Look at the pushback to Adobe’s model.
“Non consent of model input” is just a tool they’re using in the hopes of destroying the tech. Plenty of companies have datasets of these same people’s work where the T&C permits training.
The narrative will switch once you can no longer use the “stealing/consent” argument. They won’t suddenly become fine with this tech just because the dataset consented.
Unfortunately it's become a meme among AI art haters that AI art is "just typing text into a text box", despite that being far from the truth, particularly if you want to get specific results, as this blog post demonstrates.
Some modern AI art workflows often require more effort than actually illustrating using conventional media. And this blog post doesn't even get into ControlNet.
> Some modern AI art workflows often require more effort than actually illustrating using conventional media. And this blog post doesn't even get into ControlNet.
Indeed. Another criticism I can somewhat see the idea behind is that the barrier to entry is very different from, for example, drawing. To draw, you need pen and paper, and you can basically start. To start with Stable Diffusion et al., you need either a) paid access to a service, b) money to purchase moderately powerful hardware, or c) money to rent moderately powerful hardware. One way or another, if you want to practice AI-generated art, you need more money than a pen and paper cost.
From what I read on the internet, people assume AI-generated art is a difficult question legally speaking. Some literally assume artists complain only because they are outcompeted.
I disagree - I think that AI generative art is an easy case of copyright infringement and an easy win for a bunch of good lawyers.
That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak. I really don't see what's difficult about that case. I think the internet assumes a bit too quickly that it's a difficult question and a grey area, when maybe it just isn't.
It's noteworthy that Adobe did things differently than the others, and the way they did things goes in the direction I'm describing here. Maybe it's just confirmation bias.
> I disagree - I think that AI generative art is an easy case of copyright infringement and an easy win for a bunch of good lawyers.
> That’s because you can’t find an artist for a generated picture other than the ones in the training set.
First, that’s clearly not true when you are using ControlNet with the input being human generated, or even img2img with a human generated image, but second and more importantly…
> If you can’t find a new artist, then the picture belongs to the old ones, so to speak.
That’s not how copyright law works. The clearest example (not particularly germane to the computer generation case, but clearly illustrative of the fact that “can’t find another artist” is far from dispositive) is Fair Use noncommercial timeshifting of an existing work: it is extremely clear there is no artist but that of the original work, and yet it is not copyright infringement.
> I really dont see what’s difficult with that case.
You’ve basically invented a rule of thumb out of thin air, and observed that it would not be a difficult case if your rule of thumb was how copyright law works.
Your observation seems correct to that extent; the problem is that it has nothing to do with copyright law.
> I think the internet assume a bit to quickly it’s a difficult question and a grey area when maybe it just isn’t.
IP law experts have said that the Fair Use argument is hard to resolve.
Assuming the lawsuits currently ongoing aren’t settled, we’ll know when they are resolved what the answer is.
It’s not as simple as that, though, because the algorithm does learn by itself and mostly just uses the training data to score itself against; it doesn’t directly copy it, as some people seem to think. It can end up learning to copy things if it sees them enough times, though.
“you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak”
I don’t think that’s valid on its own as a way to completely discount considering how directly it’s using the data. As an extreme example, what if I averaged all the colours in the training data together and used the resulting colour as the seed for some randomly generated fractal or something? You could apply the same arguments - there is no artist except the original ones in the training set - and yet I don’t think any reasonable person would say that the result obviously belongs to every single copyright owner from the training set
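That thought experiment is easy to make literal. A toy sketch (everything here is hypothetical; the "training set" is just a handful of pixels):

```python
import random

def average_colour(pixels):
    """Mean RGB over every pixel taken from the 'training set'."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

# Stand-ins for a training set, each image reduced to a single pixel.
training_pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
avg = average_colour(training_pixels)  # (127.5, 127.5, 63.75)

# Seed an otherwise unrelated generator with that single averaged value;
# every training image "contributed", yet none is recognisable in the output.
rng = random.Random(sum(avg))
points = [(rng.random(), rng.random()) for _ in range(1000)]
```

Every input influenced the output, but by the time it reaches `points` the contribution of any individual work is unrecoverable, which is the intuition the comment above is pointing at.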
Normally (outside the specific context of AI-generated art) the relation is not "work¹ → past author" but "work → large amount of past experience". (¹"work" in the sense of product, output, etc.)
If the generative AI is badly programmed, it will copy the style of Smith. If properly programmed, it will "take into account" the style of Smith. There is a difference between learning and copying. Your tool can copy - if you do it properly, it can learn.
All artists work in a way "post consideration of a finite number of past artists in their training set".
But this person’s dog isn’t in the training set, so why should some artist be credited for a picture they never drew? Not a single person has drawn his dog before, now there is a drawing of his dog, and you want to credit someone who had no input to the creative process here?
> That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak.
It doesn't belong to the "old ones"; it is at best a derivative work. And even writing a prompt, as trivial as it might seem, makes you an artist. There are modern artists exhibiting random shit as art, and you may or may not like it, but they are legally artists, and it is their work.
The question is about fair use, that is, whether you are allowed to use pictures in the dataset without permission. It is a tricky question. On one extreme, you can't do anything without infringing some kind of copyright: used the same color as I did? I'll sue you. On the other extreme, you essentially abolish intellectual property. Copying another artist's style in your own work is usually fair use, and that's essentially what generative AI does, so I guess that's how it will go, but it will most likely depend on how judges and legislators see the thing, and different countries will probably have different ideas.
>That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak
We have some countries where it is explicitly legal to train AI models on copyrighted data without consent, and precedent in the US that makes this a plausible outcome there as well.
Could you explain what portion of copyright law you believe would cover this argument? I'm not a lawyer, but have a passing familiarity with US copyright law, and in it, at least, I do not know of anything that would support the idea you're proposing here. How would you even assign copyright to the "old" artists? How are you going to determine what percentage of any given generation was influenced by artists X, Y, Z?
> AI generative art is an easy case of copyright infringement...
Agreed. An AI model trained on an artist's work without permission is IP infringement and this should be widely understood. Unfortunately, because the technology is new people do not understand this. When Photoshop was new, there was a similar misunderstanding. People could take an artist's work, run it through Photoshop, and then not compensate the artist. It took some time for that to sort out.
I agree. This is a clear-cut case of copyright infringement, as is all art. After all, people painting images have only seen paintings other people painted.
The only problem to that, and a big one, is that there’s no way to trace back to the image in the dataset from a final output of AI.
It’s a static mapping, so you’d think it should surely be possible, but NN frameworks aren’t designed that way. That is blocking it from happening (and also enabling the “AI is just learning, humans are the same” fallacy).
The shruggingface submission is very interesting and very instructive.
Nonetheless, it would be odd, and a weak argument, to direct criticism at not spending adequate «time and effort» (as if it made sense to renounce tools and work through unnecessary fatigue and wasted time). More proper criticism could run in the direction of "you can produce pleasing graphics but you may not know what you are doing".
This said, I'd say that Stable Diffusion is a milestone of a tool, incredible to have (though difficult to control). I'd also say that the results of the latest Midjourney (though quite resistant to control) are at "speechless" level. (Noting in case some had not yet checked.)
> More proper criticism could be in the direction of "you can produce pleasing graphics but you may not know what you are doing".
I don't get this. If one "can produce pleasing graphics," how does that not equal knowing what they're doing? I only see this as being true in the sense of "Sure, you can get places quickly in a car, but you don't really know how it works."
> if you're not going to put in the time and effort, why do you deserve to create such high-quality imagery?
This isn’t high-quality imagery. Don’t get me wrong, the tech is cool and I love the work that’s gone into making this picture. But this isn’t something I would ever hang on my wall. There’s probably a market for it, but I get the strong impression it’s the “live, laugh, love” market: the people who buy pictures for their wall in the supermarket. The kind of people who pay individual artists to paint bespoke images of their pet are not going to frame AI art. I don’t think the artists need to worry.
It’s completely what you make it, though. If what’s in the OP isn’t your style you could literally type in anything you want.
I’ve done pictures of my wife in the style of other photographers, Soviet-style propaganda posters, 50s pinups, Alphonse Mucha, and much more.
I’m a professional photographer and have tons of great pictures of our dog - the kind of stuff people pay for. My wife’s lock screen on her phone is something I generated instead.
I would expect it’s only a matter of time till those “traditional” artists also adopt these tools into their workflows. Similar to the initial pushback against the “digital darkroom” which is now the mainstay of photography.
Non-AI-aided art, like manually developed film, will trend towards a niche.
> This isn’t high-quality imagery. Don’t get me wrong, the tech is cool and I love the work that’s gone into making this picture. But this isn’t something I would ever hang on my wall.
Well, yeah, but that doesn't change the OP commenter's point that it still takes a lot of work to get high-quality art.
> I don’t think the artists need to worry.
I disagree here, but only on the basis of what type of art it is. Stock art/photography and a lot of media design work are likely at risk, because we can now create "good enough" art at the click of a button for almost no cost. I agree that the "hang on the wall level good" artists aren't at risk just yet, but between the more filler-type art and the, uh...
Well "anime/furry" commissioners are definitely at risk right now for anything except the highest quality artists, and there is a MASSIVE community behind this - in fact they have done a lot of the innovation for StableDiffusion including optimizations/A1111 webui, and have trained many custom models for their art, already had pretagged datasets of 10k's of images....
Most of the criticism I've seen is that it's all trained on uncompensated, stolen artwork, much like how Copilot is trained on GPL code, disregarding its license terms.
It's interesting to ask people who are concerned about the training data what they think of Adobe Firefly, which is strictly trained on correctly licensed data.
I'm under the impression that DALL-E itself used licensed data as well.
I find some people are comfortable with that, but others will switch to different concerns - which indicates to me that they're actually more offended by the idea of AI-generated art than the specific implementation details of how it was trained.
The general argument (IANAL) is that it's Fair Use, in the same vein as Google Images or the Internet Archive scraping and storing text/images, especially since generated images are not 1:1 with their source inputs, so it can be argued that each is a unique derivative work. The current lawsuits against Stability AI are testing that, although I am skeptical they'll succeed (one of the lawsuits argues that Stable Diffusion is just "lossy compression", which is factually and technically wrong).
There is an irony, however, that many of the AI art haters tend to draw fanart of IP they don't own. And if Fair Use protections are weakened, their livelihood would be hurt far more than those of AI artists.
The Copilot case/lawsuit IMO is stronger because the associated code output is a) provably verbatim and b) often has explicit licensing and therefore intent on its usage.
AI is just showing us a fact that many are unwilling to admit: everything is a derivative work. Much like humans will memorise and regurgitate what they've seen.
> a lot of which is motivated by a sense of unfairness
Say you generate a picture with midjourney - who is/are the closest artist(s) you can find for that picture?
Not the AI, not the prompter; the closest artists you can find for that picture are the ones who made the pictures in the training set. So generating a picture is outright copyright infringement. This has nothing to do with unfairness in the sense of artists being outcompeted. Artists don't get outcompeted; they get stolen from.
Typical Midjourney workflow involves constantly reprompting and fine tuning based on examples and input images. When you arrive at a given image in Midjourney, it’s often impossible to recreate it even with the same seed. You’ll need the input image as well, and the input image is often the result of a long creative process.
Why is it you discount the creative input of the user? Are they not doing work by guiding the agent? Don’t their choices of prompt, input image, and the refinement of subsequent generated images represent a creative process?
I've done so much with a fine-tuned model of my dog.
I previously made coloring pages for my daughter of our dog as an astronaut, wild west sheriff, etc. They're the first pages she ever "colored," which was pretty special for us. Currently I'm working on making her into every type of Pokemon, just for fun.
I uploaded a couple of the Pokemon generations really quick as examples. I still need to go through and do quick fixes for double tails (the tails on Pokemon are not where they are on regular animals, apparently), watermarks, etc. and do a quick Img2Img on them.
StableTuner to fine-tune the model. I can't recall the name of the model I trained on top of, but it was one of the top "broad" 1.5-based models on Civitai. Automatic1111 to do the actual generating. I used an anime line-art LoRA (at a low weight) along with an offset-noise LoRA for the coloring-book pages, as otherwise SD makes images perfectly exposed; for something like that you obviously want a lot more white than black.
EveryDream2 would be another good tuning solution. Unfortunately that end of things is far from easy. There are a lot of parameters to change and it's all a bit of a mess. I had an almost impossible time doing it with pictures of my niece, my wife is hit or miss, her sister worked really well for some reason, and our dog was also pretty easy.
Stable Diffusion can run on Intel CPUs through OpenVINO if you don't have a GPU or the funds to rent one online (Google Colab is often used). You still need a decent amount of RAM (running SD takes about 8 GB, training seems to run at 6-8 GB), so I'd consider 12 or 16 GiB of RAM a requirement.
There's a huge difference in performance (generating an image takes 10 minutes rather than 10 seconds and training a model would take forever) but with some Python knowledge and a lot of patience it can be done.
Apple's Intel Macbooks are infamous for their insufficient cooling design for the CPUs they chose, which won't help maintaining a high clock speed for extended durations of time; you may want to find a way to help cool the laptop down to give the chip a chance to boost more, and to prevent prolonged high temperatures from wearing down the hardware quicker.
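A rough back-of-envelope for the ~8 GB figure above. The parameter counts are approximate, from-memory assumptions for SD 1.x, not exact values:

```python
# Approximate SD 1.x parameter counts (assumption): ~860M UNet,
# ~123M text encoder, ~84M VAE.
params = 860e6 + 123e6 + 84e6
gib = 1024 ** 3

bytes_fp32 = params * 4   # float32: 4 bytes per weight
bytes_fp16 = params * 2   # float16: 2 bytes per weight

print(f"fp32 weights: {bytes_fp32 / gib:.1f} GiB")  # ~4.0 GiB
print(f"fp16 weights: {bytes_fp16 / gib:.1f} GiB")  # ~2.0 GiB
# Activations, the Python runtime, and working buffers roughly double
# the fp32 figure in practice, which is where ~8 GB comes from.
```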
I did something loosely related. As a present for my girlfriend's birthday, I made her a "90s website" with AI portraits of her dog: https://simoninman.github.io/
It wasn't actually particularly hard - I used a Colab notebook on the free tier to fine-tune the model, and even got chatGPT to write some of the prompts.
In my (limited) experience, dogs seem to be easier than people for fine-tuning - especially if your end result is going to be artsy. Faces of people you know well being off in slight ways really throws you off, but with dogs there's a bit more leeway.
He mentions the Colab for Dreambooth; that only takes ten minutes or so to train using an A100 (the premium GPU), and you can have it turn off after it finishes and save to Google Drive. Super easy.
I've trained a few smaller models using their Dreambooth notebook, but I think for 4000 training steps, an A100 will usually take 30-40min. I believe replicate also uses A100s for their dreambooth training jobs.
Ah I see, you're right, 40 minutes sounds about right for that amount of training. Curious why the decision to train on 40 images? I've used 15 for two separate subjects in Dreambooth with excellent results. I'm no expert, experimenting the same way as you, but I haven't trained on more than 15-20 images per subject.
I've found the most important part is spending a good amount of time on the prompts, although I'm not sure if having the person in an environment, and describing the objects around them, helps give the model a "sense of scale". For example, if I just train "wincy" in fast Dreambooth, "wincy" will be the only token it knows; with no other info in the prompts, it didn't know what in the image was "wincy" (me). I accidentally did this when training my wife (no prompts at all) and she got really mad at me at how ugly the results were ("you made me ugly!" haha).
Have you tried it with and without your dog in an environment, then describing the environment your dog is in for the training data?
People have been sending me the cute pics the AI generates of their pups. I think this is arguably the best thing so far in this latest wave of AI releases!
https://i.imgur.com/zOMarKc.jpg
Here's an example using various techniques I've gathered from those 4chan threads. (yes I know it's 4chan but just ignore the idiots and ask for catboxes, you'll learn much faster than anywhere else, at least that was the case for me after exhausting the resources on github/reddit/various discords)
> You are upsampling, then inpainting sections that need it.
https://github.com/zero01101/openOutpaint
https://github.com/BlinkDL/Hua
Both use automatic1111 API for the work.
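For reference, the webui's `--api` flag exposes a small REST API that front-ends like these call; `/sdapi/v1/txt2img` is the main generation endpoint. A minimal sketch (prompt and settings are illustrative, and the call is guarded in case no webui is running locally):

```python
import json
import urllib.request

API = "http://127.0.0.1:7860"  # the webui's default local address

def txt2img_payload(prompt, steps=20, width=512, height=512):
    """Minimal request body for the /sdapi/v1/txt2img endpoint."""
    return {"prompt": prompt, "steps": steps, "width": width, "height": height}

payload = txt2img_payload("a corgi astronaut, watercolor")

try:
    req = urllib.request.Request(
        f"{API}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        images = json.loads(resp.read())["images"]  # base64-encoded PNGs
except OSError:
    images = []  # no webui running; the payload above is still valid
```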
> Some modern AI art workflows often require more effort than actually illustrating using conventional media. And this blog post doesn't even get into ControlNet.
Even if they did have a more complex workflow most of them are still based on copyrighted training data, so there will be many lawsuits.
Then why don’t they illustrate it instead, and save themselves some time?
Indeed. Another criticism that I can definitely somewhat see the idea behind, is that the barrier to entry is very different from for example drawing. To draw, you need a pen and a paper, and you can basically start. To start with Stable Diffusion et al, you need either A) paid access to a service, B) money to purchase moderately powerful hardware or C) money to rent moderately powerful hardware. One way or another, if you want to practice AI generated art, you need more money than what a pen and paper cost.
I disagree - I think that AI generative art is an easy case of copyright infringement and an easy win for a bunch of good lawyers.
That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak. I really don't see what's difficult with that case. I think the internet assumes a bit too quickly that it's a difficult question and a grey area when maybe it just isn't.
It's noteworthy that Adobe did things differently than the others, and the way they did things goes in the direction I'm describing here. Maybe it's just confirmation bias.
> That’s because you can’t find an artist for a generated picture other than the ones in the training set.
First, that’s clearly not true when you are using ControlNet with the input being human generated, or even img2img with a human generated image, but second and more importantly…
> If you can’t find a new artist, then the picture belongs to the old ones, so to speak.
That’s not how copyright law works. The clearest example (not particularly germane to the computer generation case, but clearly illustrative of the fact that “can’t find another artist” is far from dispositive) is Fair Use noncommercial timeshifting of an existing work: it is extremely clear there is no artist but that of the original work, and yet it is not copyright infringement.
> I really don't see what's difficult with that case.
You’ve basically invented a rule of thumb out of thin air, and observed that it would not be a difficult case if your rule of thumb was how copyright law works.
Your observation seems correct to that extent, the problem is that it has nothing to do with copyright law.
> I think the internet assumes a bit too quickly that it's a difficult question and a grey area when maybe it just isn't.
IP law experts have said that the Fair Use argument is hard to resolve.
Assuming the lawsuits currently ongoing aren’t settled, we’ll know when they are resolved what the answer is.
“you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak”
I don’t think that’s valid on its own as a way to completely discount considering how directly it’s using the data. As an extreme example, what if I averaged all the colours in the training data together and used the resulting colour as the seed for some randomly generated fractal or something? You could apply the same arguments - there is no artist except the original ones in the training set - and yet I don’t think any reasonable person would say that the result obviously belongs to every single copyright owner from the training set
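The thought experiment above can be sketched concretely (a hypothetical illustration; the function names here are made up): reduce an entire "training set" to one average colour, then use that colour only as a PRNG seed for unrelated output.

```python
import random

def average_colour(images):
    """images: list of images, each a list of (r, g, b) pixel tuples."""
    total = [0, 0, 0]
    count = 0
    for pixels in images:
        for r, g, b in pixels:
            total[0] += r
            total[1] += g
            total[2] += b
            count += 1
    return tuple(t // count for t in total)

def fractal_params_from_seed(colour, n=5):
    # The colour is used only to seed a PRNG; none of the original
    # pixels survive into the output, which is the point of the argument.
    rng = random.Random(hash(colour))
    return [rng.random() for _ in range(n)]

avg = average_colour([[(255, 0, 0), (0, 0, 255)]])  # (127, 0, 127)
params = fractal_params_from_seed(avg)
```

Every training image influenced the result, yet nothing recognizable from any of them remains.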
Normally (outside the specific context of AI-generated art) there is not a relation "work¹ → past author", but "work → large amount of past experience". (¹"work" in the sense of product, output, etc.)
If the generative AI is badly programmed, it will copy the style of Smith. If properly programmed, it will "take into account" the style of Smith. There is a difference between learning and copying. Your tool can copy - if you do it properly, it can learn.
All artists work in a way "post consideration of a finite number of past artists in their training set".
It doesn't belong to the "old ones"; it is at best a derivative work. And even writing a prompt, as trivial as it might seem, makes you an artist. There are modern artists exhibiting random shit as art, and you may or may not like it, but they are legally artists, and it is their work.
The question is about fair use. That is, are you allowed to use pictures in the dataset without permission? It is a tricky question. On one extreme, you won't be able to do anything without infringing some kind of copyright. Used the same color as I did? I will sue you. On the other extreme, you essentially abolish intellectual property. Copying another artist's style in your own work is usually fair use, and that's essentially what generative AI does, so I guess that's how it will go, but it will most likely depend on how judges and legislators see the thing, and different countries will probably have different ideas.
We have some countries where it is explicitly legal to train AI models on copyrighted data without consent, and precedent in the US that makes this a plausible outcome there as well.
Could you explain what portion of copyright law you believe would cover this argument? I'm not a lawyer, but have a passing familiarity with US copyright law, and in it, at least, I do not know of anything that would support the idea you're proposing here. How would you even assign copyright to the "old" artists? How are you going to determine what percentage of any given generation was influenced by artists X, Y, Z?
Agreed. An AI model trained on an artist's work without permission is IP infringement and this should be widely understood. Unfortunately, because the technology is new, people do not understand this. When Photoshop was new, there was a similar misunderstanding. People could take an artist's work, run it through Photoshop, and then not compensate the artist. It took some time for that to sort out.
It's a static mapping, so you'd think it should be possible, but NN frameworks aren't designed that way. That is blocking it from happening (and also enabling the "AI is just learning, humans do the same" fallacy).
Nonetheless, it would be odd and a weak argument to point criticism at not spending adequate "time and effort" (as if it made sense to renounce tools and work through unnecessary fatigue and wasted time). More proper criticism could be in the direction of "you can produce pleasing graphics but you may not know what you are doing".
This said, I'd say that Stable Diffusion is a milestone of a tool, incredible to have (though difficult to control). I'd also say that the results of the latest Midjourney (though quite resistant to control) are at "speechless" level. (Noting in case some had not yet checked.)
I don't get this. If one "can produce pleasing graphics," how does that not equal knowing what they're doing? I only see this as being true in the sense of "Sure, you can get places quickly in a car, but you don't really know how it works."
This isn't high quality imagery. Don't get me wrong, the tech is cool and I love the work that's gone into making this picture. But this isn't something I would ever hang on my wall. There's probably a market for it, but I get the strong impression it's the "live, laugh, love" market: the people who buy pictures for their wall in the supermarket. The kind of people who pay individual artists money to paint bespoke images of their pet are not going to frame AI art. I don't think the artists need to worry.
I’ve done pictures of my wife in the style of other photographers, Soviet-style propaganda posters, 50s pinups, Alphonse Mucha, and much more.
I’m a professional photographer and have tons of great pictures of our dog - the kind of stuff people pay for. My wife’s lock screen on her phone is something I generated instead.
Non-AI-aided art, like manually developed film, will trend towards a niche.
Well yeah, but that doesn't change the OP commenter's point that it still takes a lot of work to get high quality art.
> I don’t think the artists need to worry.
I disagree here, but only on the basis of what type of art it is. Stock art/photography and a lot of media design work is likely at risk because we can now create "good enough" art at the click of a button for almost no cost. I agree that the "hang on the wall level good" artists aren't at risk just yet, but between the more filler-level art and the, uh...
Well "anime/furry" commissioners are definitely at risk right now for anything except the highest quality artists, and there is a MASSIVE community behind this - in fact they have done a lot of the innovation for StableDiffusion including optimizations/A1111 webui, and have trained many custom models for their art, already had pretagged datasets of 10k's of images....
It's interesting to ask people who are concerned about the training data what they think of Adobe Firefly, which is strictly trained on correctly licensed data.
I'm under the impression that DALL-E itself used licensed data as well.
I find some people are comfortable with that, but others will switch to different concerns - which indicates to me that they're actually more offended by the idea of AI-generated art than the specific implementation details of how it was trained.
There is an irony, however, that many of the AI art haters tend to draw fanart of IP they don't own. And if Fair Use protections are weakened, their livelihood would be hurt far more than those of AI artists.
The Copilot case/lawsuit IMO is stronger because the associated code output is a) provably verbatim and b) often has explicit licensing and therefore intent on its usage.
It's kinda like using ffmpeg or VapourSynth for video editing instead of a video editing GUI.
That being said the training parameter/data tuning is definitely an art, as is the prompting.
I turned my dog into a robot a while back using the img2img feature of Stable Diffusion and the results were pretty amazing![1]
[1] https://twitter.com/davely/status/1583233180177297408
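For anyone trying this: img2img's "denoising strength" is what controls how much of the original photo survives. The input image is noised partway into the diffusion schedule, and only the remaining steps are diffused. A rough sketch of that scheduling idea (not A1111's exact implementation):

```python
def img2img_start_step(total_steps, strength):
    """With denoising strength s in [0, 1], img2img runs only the last
    round(s * total_steps) steps of the schedule: low strength keeps the
    input photo mostly intact, high strength repaints it."""
    strength = max(0.0, min(1.0, strength))
    return total_steps - round(total_steps * strength)

# 50 steps at strength 0.6: diffusion starts at step 20 and runs 30 steps.
```

Dog-to-robot conversions tend to sit in the middle of that range: high enough to change the materials, low enough to keep the pose.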
Say you generate a picture with midjourney - who is/are the closest artist(s) you can find for that picture?
Not the AI, not the prompter, so the closest artists you can find for that picture are the ones who made the pictures in the training set. So generating a picture is outright copyright infringement. Nothing to do with unfairness in the sense of "artists get outcompeted". Artists don't get outcompeted - they are stolen from.
Why is it you discount the creative input of the user? Are they not doing work by guiding the agent? Don’t their choices of prompt, input image, and the refinement of subsequent generated images represent a creative process?
I previously made coloring pages for my daughter of our dog as an astronaut, wild west sheriff, etc. They're the first pages she ever "colored," which was pretty special for us. Currently I'm working on making her into every type of Pokemon, just for fun.
https://imgur.com/a/11OxoSA
StableTuner to fine tune the model - I can't recall the name of the model I trained on top of, but it was one of the top "broad" 1.5 based models on Civitai. Automatic1111 to do the actual generating. I used an anime line art LoRA (at a low weight) along with an offset noise LoRA for the coloring book pages as otherwise SD makes images be perfectly exposed. For something like that you obviously want a lot more white than black.
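For anyone trying to reproduce this: in Automatic1111 a LoRA is applied inline in the prompt as `<lora:name:weight>`. A hypothetical prompt for a coloring-book page might look like this (the LoRA file names here are made up; use whatever you downloaded):

```
line art, coloring book page of a dog as an astronaut, clean outlines,
white background <lora:animeLineart:0.4> <lora:offsetNoise:0.7>
Negative prompt: color, shading, grayscale fill
```

Keeping the line art LoRA at a low weight (0.3-0.5) is what avoids the over-stylized look while still biasing toward clean outlines.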
EveryDream2 would be another good tuning solution. Unfortunately that end of things is far from easy. There are a lot of parameters to change and it's all a bit of a mess. I had an almost impossible time doing it with pictures of my niece, my wife is hit or miss, her sister worked really well for some reason, and our dog was also pretty easy.
There's a huge difference in performance (generating an image takes 10 minutes rather than 10 seconds and training a model would take forever) but with some Python knowledge and a lot of patience it can be done.
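The usual Python-side step is just picking the best available compute device before loading the pipeline. A minimal sketch of that fallback (function name hypothetical; with diffusers you'd pass the result to `pipe.to(...)`):

```python
def pick_device(cuda_available, mps_available):
    """Prefer an NVIDIA GPU, then Apple's Metal backend (MPS), then CPU,
    where a single 512x512 image can take minutes instead of seconds."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# With torch installed (assumption, not shown here), you'd call it as:
# device = pick_device(torch.cuda.is_available(),
#                      torch.backends.mps.is_available())
# pipe = pipe.to(device)
```

On an Intel MacBook both checks fail, which is why you land on the 10-minutes-per-image CPU path.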
Apple's Intel Macbooks are infamous for their insufficient cooling design for the CPUs they chose, which won't help maintaining a high clock speed for extended durations of time; you may want to find a way to help cool the laptop down to give the chip a chance to boost more, and to prevent prolonged high temperatures from wearing down the hardware quicker.
Seems like lots of work went into that and I hope the author enjoyed the process and enjoys the final result.
It wasn't actually particularly hard - I used a Colab notebook on the free tier to fine-tune the model, and even got chatGPT to write some of the prompts.
Here's the colab notebook, in case anyone is interested: https://github.com/TheLastBen/fast-stable-diffusion
I've trained a few smaller models using their Dreambooth notebook, but I think for 4000 training steps, an A100 will usually take 30-40min. I believe replicate also uses A100s for their dreambooth training jobs.
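That timing works out to roughly 1.7-2.2 training steps per second on the A100:

```python
def steps_per_second(steps, minutes):
    return steps / (minutes * 60)

fast = steps_per_second(4000, 30)  # ~2.22 steps/s
slow = steps_per_second(4000, 40)  # ~1.67 steps/s
```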
I've found the most important part is spending a good amount of time on the prompts, although I'm not sure whether having the person embodied in an environment, and describing the objects around them, helps give the model a "sense of scale". For example, if I just train "wincy" in fast Dreambooth, "wincy" will be the only token it knows; with no other info in the prompts, it didn't know what in the image was "wincy" (me). I accidentally did this when training my wife (no prompts at all) and she got really mad at me about how ugly the results were ("you made me ugly!", haha)
Have you tried it with and without your dog in an environment, then describing the environment your dog is in for the training data?
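Several trainers (EveryDream2, kohya-style scripts) read a per-image caption from a `.txt` file next to each image, which is one way to feed in those environment descriptions. A hypothetical sketch (file names and captions made up):

```python
from pathlib import Path

# Captions pair the rare token ("wincy") with scene context, so the
# trainer can tell which part of each image is the subject.
captions = {
    "wincy_01.jpg": "photo of wincy sitting on a park bench, trees behind",
    "wincy_02.jpg": "photo of wincy standing in a kitchen next to a table",
}

def write_caption_sidecars(image_dir, captions):
    """Write each caption to a .txt sidecar named after its image."""
    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)
    for image_name, caption in captions.items():
        (image_dir / image_name).with_suffix(".txt").write_text(caption)

write_caption_sidecars("dataset", captions)
```

The exact caption convention depends on the trainer, so check its docs before assuming the sidecar layout above.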
dreamlook.ai
Upload your pictures, we train the model in a few minutes, then you can download your trained checkpoint. $1/model, first one for free.
For app builders, we provide a solid API that scales to 1000s of runs per day without breaking a sweat.
People have been sending me the cute pics the AI generates of their pups. I think this is arguably the best thing so far in this latest wave of AI releases!
This would have been much better standalone.