> But I can't even get cartoons to most people for free now, without doing unpaid work for the profit-making companies who own the most-used channels of communication
This is the sticking point for me. OpenAI isn't a profit-making company, but it's certainly a valuable company. A valuable company that is built from the work others created without transferring any value back to them. Regardless of legalities, that's wrong to me.
Put it this way - you remove all the copyrighted, permission-less content from OpenAI's training, what value do OpenAI's products have? If you think OpenAI is less valuable because it can't use copyrighted content, then it should give some of that value back to the content creators.
>> If you think OpenAI is less valuable because it can't use copyrighted content, then it should give some of that value back to the content creators.
But we are allowed to use copyrighted content. We are not allowed to copy copyrighted content. We are allowed to view and consume it, to be influenced by it, and under many circumstances even outright copy it. If one doesn't want anyone to see/consume or be influenced by one's copyrighted work, then lock it in a box and don't show it to anyone.
I have some, but diminishing, sympathy for artists screaming about how AI generates images too similar to their work. Yes, the output does look very similar to your work. But if I take your work and compare it to the millions of other people's work, I'd bet I can find some preexisting human-made art that looks similar to your stuff too.
This is why clothing doesn't qualify for copyright. No matter how original you think your clothing seems, someone in the last many thousands of years of fashion has done it before. Visual art may be approaching a similar point. No matter how original you think your drawings are, someone out there has already done something similar. They may not have created exactly the same image, but neither does AI literally copy images. That reality won't kill the visual arts, just as it didn't kill off the fashion industry.
I firmly believe that training models qualifies as fair use. I think it falls under research, and is used to push the scientific community forward.
I also firmly believe that commercializing models built on top of copyrighted works (which all works start off as) does not qualify as fair use (or at least shouldn't) and that commercializing models built on copyrighted material is nothing more than license laundering. Companies that commercialize copyrighted work in this manner should be paying for a license to train with the data, or should stick to using the licenses that the content was released under.
I don't think your example is valid either. The reason that AI models are generating content similar to other people's work is because those models were explicitly trained to do that. That is literally what they are and how they work. That is very different than people having similar styles.
Sorry, but these arguments by analogy are patently ridiculous.
We are not talking about the eons old human practice of creative artistic endeavor, which yes, is clearly derivative in some fashion, but which we have well established practices around.
We are discussing a new phenomenon of mass replication or derivation by machine at a scale impossible for a single individual to achieve by manual effort.
Further, artists tend to either explicitly or implicitly acknowledge their priors in secondary or even primary material, much like one cites work in an academic context.
Also, the claim:
>But if I take your work and compare it to millions of other people's work...
Is ridiculous. A. You haven't, nor will you ever, actually do this. B. This is never how the system of artistic practice has worked up to this point, precisely because this sort of activity is beyond the scale of human effort.
In addition, plagiarism exists and is bad. There's no reason that concept can't be extended and expanded to include stochastic reproduction at scale.
If you feel artists shouldn't have a say, and you welcome a future in which capital concentrates even further into the hands of a few technological elites who make their money off of flouting existing laws and the labor of thousands, by all means. But this argument that, somehow, by analogy to human behavior, companies should not be responsible for the vast use of material without permission is absolutely preposterous. These are machines owned by companies. They are not human beings and they do not participate in the social systems of human beings the way human beings do. You may want to consider a distinction in the rules that adequately reflects this distinction in participatory status in a social system.
Because we are humans and our capability of abusing those rights is limited. The scale and speed at which LLMs can abuse copyrighted work to threaten the livelihoods of the authors of those works is reason enough to consider it unethical.
I agree with the other commenters that the scale of this “deriving inspiration from others” is where this feels wrong.
It feels similar to the ye olden debates on police surveillance. Acquiring a warrant to tail a suspect, tapping a single individual’s phone line, etc all feels like very normal run-of-the-mill police work that no one has a problem with. Collating your behavior across every website and device you own from a data broker is fundamentally the same thing as a single phone’s wiretap, but it obviously feels way grosser and more unethical because it scales way past the point of what you’d imagine as being acceptable.
I'm not as interested in making a technical/legal argument, as I'm just sharing my feelings on the topic (and eventually, what I think the law should be), but during training copies are made of copyrighted material, even if the model doesn't contain exact copies of work. Crawling, downloading, storing (temporarily) for training all involve making copies, and thus are subject to copyright law. Maybe those copies are fair use, maybe they're not (I think they shouldn't be).
My main point is that OpenAI is generating an incredible amount of value all hinging on other people's work at a massive scale, without paying for their materials. Take all the non-public domain work off Netflix and Netflix doesn't have the same value they have today, so Netflix must pay for content it uses. Same goes for OpenAI imho.
I do think it's worth remembering there's a difference between "legal" and "good".
It's entirely legal for me to leave the pub every time it comes up to my round. It's legal for me to get into a lift and press all the buttons.
It's not unreasonable I think for people to be surprised at what is now possible. I'm personally shocked at the progress in the last few years - I'd not have guessed five years ago that putting a picture online might result in my style being easily recreated by anyone for the benefit mostly of a profitable company.
People keep saying this but it's actually much more complicated, and in many cases you can't view copyrighted content.
One example: Microsoft employees are not permitted to view or learn from an open source (GPL-2) terminal emulator:
https://github.com/microsoft/terminal/issues/10462#issuecomm...
Another example is proprietary software that may have its source available, either intentionally or not. If you view this and then work on something related to it, like WINE for example, you are definitely at risk of being successfully sued.
If you worked at Microsoft on Windows, you would not be able to participate in WINE development at all without violating copyright.
If you viewed leaked Windows source code you also would not be able to participate in WINE development.
An interesting question that I have is whether training on proprietary, non-trade-secret sources would be allowed. Something like Unreal Engine, where you can view the source but it's still proprietary.
Another question is whether training on leaked sources of proprietary and private but non-trade-secret code, like the Windows source dumps, is legal.
Let's say I'm an artist. I have, thus far, distributed my art for consumption without cost, because I want people to engage with and enjoy it. But, for whatever reason, I have a deep, irrational philosophical objection to corporate profit. I want to preclude any corporation from ever using my art to turn a profit, when at all possible. I have accepted that in some sense, electrical and internet corporations will be turning a profit using my work, but cannot stomach AI corporations doing so. If I cannot preclude AI corporations from turning a profit using my work, I will stop producing and distributing my work.
Do you think it's reasonable for me to want some legal framework that allows me to explicitly deny that use of my work? Because I do.
Copyright is a bad idea in the first place, and should just be thrown out entirely; but that isn't the whole picture here.
If OpenAI is allowed to be ignorant of copyright, then the rest of us should be allowed, too.
The problem is that OpenAI (alongside a handful of other very large corporations) gets exclusive rights to that ignorance. They get to monopolize the un-monopoly. That's even worse than the problem we started with.
Who is "we" here? Are you making a distinction between people and machines? If I built a machine that randomly copied from a big sample of arts that I wanted, would that machine be ok?
OpenAI built a machine that does exactly that. They just sampled _everyone_.
Does copyright law say you can ingest copyrighted work at very large scale and sell derivatives of those works to gain massive profit / massive market capitalizations? Honestly wondering. This seems to be the crux issue here.
The artist in the article clearly states that his work was free to use only if it was not used to make a profit, those were the terms of their license. In the artist's opinion, OpenAI violated that license by training their tool on their work and then selling that tool.
This artist doesn't complain about work similar to their own being generated, and their artwork is very clearly not clothing.
Well, not exactly. Certain uses are fair. The question is whether OpenAI's use counts as fair. I don't think your immediate response comes close to addressing that question, despite your conviction otherwise.
Also, clothing designs are copyrightable. The conviction expressed by some participants in this debate is exhausting in light of their unfamiliarity with actual copyright law.
Copyright is just made up for pragmatic purposes: to incentivize creation. It does not matter that training models is not the same as reproducing something exactly; if we decide that it's unfair, or even just desirable for economic incentives, to disallow it, then we are free to make that decision. The trade-offs are fairly profound in both directions, I think, and likely some compromise will need to be made that is fair to all parties and does not cripple economic and social progress.
>But we are allowed to use copyrighted content. We are not allowed to copy copyrighted content. We are allowed to view and consume it, to be influenced by it, and under many circumstances even outright copy it.
It's important to consider in any legalistic argument over copyright that, unlike conventional property rights which are to some degree prehistoric, copyright is a recent legal construct that was developed for a particular economic purpose.
The existing standards of fair use are what they are because copyright was developed with supporting the art industry as an intentional goal, not because it was handed down from the heavens or follows a basic human instinct. Ancient playwrights clipped each others' ideas liberally; late medieval economists observed that restricting this behavior seemed to encourage more creativity. Copyright law is a creation of humans, for humans, and is subordinate to moral and economic reasoning, not prior to it.
Clothes are inherently consumable goods. If you use them, they will wear out. If you do not use them, they still age over time. You cannot "copy" a piece of clothing without a truly astonishing amount of effort. Both the processes, and the materials, may be difficult or impossible to imitate without a very large investment of effort.
Compare this to digital art: You can copy it literally for free. Before AI, at least you had to copy it mostly verbatim (modulo some relatively boring transforms, like up/down-scaling, etc.). That limited artists' incomes, but not their future works. But in a post-AI world, you can suck in an artist's life's work and generate an unlimited number of copycats. Right now, the quality of those might be insufficient to be true replacements, but it's not hard to imagine a world, not so far off, where it will be sufficient, and then artists will be truly screwed.
> We are allowed to view and consume it, to be influenced by it, and under many circumstances even outright copy it.
In theory: sure
In practice: not really, especially when you're small and the other side is big and has lots of lawyers and/or lawmakers in their pockets.
Disney ("In 1989, for instance, the company even threatened to sue three Florida daycare centers unless they removed murals featuring some of its characters") and Deutsche Telekom[1][2] ("the company's actions just smack of corporate bully tactics, where legions of lawyers attempt to hog natural resources — in this case a primary color — that rightfully belong to everyone") are just two examples that spring to mind.
people and companies are copying copyrighted content when they're using datasets that contain copyrighted content (which also repackage and distribute copyrighted content - not just as links but as actual works/images too), download linked copyrighted content, and store that copyrighted content. plenty of copies created and stored, it seems to me.
and like, what, do you think they're trying their damnedest to keep datasets clean and to not store any images in the process? how do you think they retrain on datasets over and over? it's really simple - by storing terabytes of copyrighted content. for ease of use, of course - why download something over and over, if you can just download it and keep it. and if they really wanted to steer clear of copyright infringement, if there's truly "no good solution" (which is bullshit for compute, oh, they can compute everything but not that part) - why can't they just refrain from recklessly scraping everything, if something were to just 'slip in'? like, if you know it's kinda bad, just don't do the thing, right? well, maybe copyright infringement is just acceptable to them. if not the actual goal.
what they generate is kinda irrelevant - there's plenty of copyright infringement happening even before any training were to be done. assembling of datasets and bad datasets containing copyrighted content are the start and the core of the copyright problems.
there's a really banal thing at the core of this, and it's just a multi-TB storage filled with pirated works.
If training a model is fair use, then model output should also follow fair use criteria. The very first thing you can find on the internet about fair use is the Wikipedia article on the topic. It lists a bunch of factors for deciding whether something is fair use. The very first one has a quote from an old copyright case:
> [A] reviewer may fairly cite largely from the original work, if his design be really and truly to use the passages for the purposes of fair and reasonable criticism. On the other hand, it is as clear, that if he thus cites the most important parts of the work, with a view, not to criticise, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy.
Most uses of LLMs and image generation models do not produce criticism of their training data. The most common use is to produce similar works. A very common “trick” to get a specific style of output is to add “in style of <artist>”. Is this a direct way “to supersede the use of the original work”?
You can certainly see how other factors more or less put gen ai output into the grey zone.
The fact that clothing doesn't qualify for copyright doesn't mean text and images don't. Or if you advocate that they don't, then you pretty much advocate for the abolition of copyright, because those are the major areas of copyright applicability at the moment. Which is a stance to have, but you'd probably do better to actually say that, because saying that copyright applies to some images and text but not others is a much harder position to defend.
>I have some, but diminishing, sympathy for artists screaming about how AI generates images too similar to their work. Yes, the output does look very similar to your work. But if I take your work and compare it to the millions of other people's work, I'd bet I can find some preexisting human-made art that looks similar to your stuff too.
Just like the rest of AI, if your argument is "humans can already do this by hand, why is it a problem to let machines do it?", it's because you are incorrectly valuing the labor that goes into doing it by hand. If doing X has a potentially negative side effect Y, then the human labor required to accomplish X is the principal barrier to Y, which can be mitigated via existing structures. Remove the labor barrier, and the existing mitigation structures cease to be effective. The fact that we never deliberately established those barriers is irrelevant to the fact that our society expects them to be there.
I feel the emotionally charged nature of the topic prevents a lot of rational discussion from taking place. That's totally understandable too; it's the livelihood of some of those involved. Unless we start making specific regulations for Generative AI, current copyright law is pretty clear: you can't call your art a Picasso, but you can certainly say it was inspired by Picasso. The difference is that GAI can do it much faster and cheaper. The best middle ground in my opinion is to allow GAI to train on copyrighted data, but the output cannot be copyrighted, and the model weights creating it can't be copyrighted either. Any works modified by a human attempting to gain copyright protection should have to fulfill the requirements to be substantive and transformative, just as fair use requires now.
I think there is a case to be made when AI models do produce copies. For instance, I think the NYT has a right to take issue with the near-verbatim recall of NYT articles. It's not clear cut though; when these models produce copies, they are not functioning as intended. Legally that might produce a quagmire: is it fair use when you intend to be transformative but by accident it isn't? Does it matter if you have no control over which bits are not transformative? Does it matter if you know in advance that some bits will be non-transformative but you don't know which ones?
I presume there are people working on research relating to how to prevent output of raw training data; what is the state of the art in this area? Would it be sufficient to prevent output of the training data, or should the models be required to have no significant internal copies of training examples?
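From what I've read, the crudest mitigation is a post-hoc filter: index long n-grams of the training corpus and reject or regenerate any output that overlaps too heavily (GitHub's optional Copilot filter for suggestions matching public code is described in roughly these terms). A toy sketch in Python; the function names here are mine, and a real system would need a scalable index such as a Bloom filter or a suffix array over terabytes of text, not an in-memory set:

    # Toy memorization filter: flag any generation that shares long
    # word n-grams with the training corpus.
    def ngrams(text, n=8):
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def build_index(training_docs, n=8):
        index = set()
        for doc in training_docs:
            index |= ngrams(doc, n)
        return index

    def looks_memorized(generation, index, n=8, threshold=0.2):
        grams = ngrams(generation, n)
        if not grams:
            return False  # too short to judge
        overlap = sum(1 for g in grams if g in index)
        return overlap / len(grams) >= threshold

    corpus = ["the quick brown fox jumps over the lazy dog again and again"]
    index = build_index(corpus)
    # Verbatim recall is caught; novel text passes.
    print(looks_memorized(corpus[0], index))                                        # True
    print(looks_memorized("an entirely novel sentence about a fox and a dog", index))  # False

Note this only addresses verbatim output; it says nothing about the harder question above of whether the weights themselves count as containing copies.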
> This is why clothing doesn't qualify for copyright. No matter how original you think your clothing seems, someone in the last many thousands of years of fashion has done it before.
Most every fashion company has a legal team that reviews print and pattern, as well as certain other aspects of design, relative to any source of inspiration. My husband works in the industry and has to send everything he does for review in this way. I’m not sure where you got the idea that there are no IP protections for fashion, but this is untrue.
AI doing things that humans laboriously learned and drew inspiration from is just different. After all, sheer quantity can be its own quality, especially with AI learning.
Now, I am worried about companies like OpenAI monopolizing technology by making their technology proprietary. I think their output should be public domain, and copyright should only apply to human authors, if it should apply at all.
OpenAI is very much a for-profit company with the same incentives to make money as every other US for-profit company. I understand that there's another company that is a non-profit and that company bosses the for-profit company. In my opinion, that's more of a footnote in their governance story, it doesn't make OpenAI a non-profit. Their website says...
"A new for-profit subsidiary would be formed, capable of issuing equity to raise capital and hire world class talent, but still at the direction of the Nonprofit. Employees working on for-profit initiatives were transitioned over to the new subsidiary."
Particularly after the whole Sam Altman debacle, regardless of whether one thinks that the board was being logical or not, and regardless of whether anyone thinks that Sam Altman should have been fired, it's still very clear that the non-profit side of the company is not in control of the for-profit side.
We've seen zero evidence that the non-profit side of OpenAI meaningfully constrains the for-profit side in any way, and have seen direct evidence that when the non-profit and for-profit groups disagree with each other, the for-profit side wins.
This seems like a strange criticism to me - if you're posting your illustrations on social media, it's presumably because you feel that you're getting value out of doing so. Who cares if they're also getting value out of you doing it, particularly when that value comes at no cost to you?
If you sell your art, then art marketplaces and printers and shipping services all profit from your work, but I don't imagine she's complaining about that. What's the difference? In all of those cases, as with social media, companies are making money from your work in return for providing a useful service to you (and one you don't have to use if you don't think it's useful).
I see it differently. To me, if you post your work online as an artist, it's really there for everyone to view and be inspired by. As long as nobody copies it verbatim, I don't think you've been hurt by any other usage. If another artist views it and is inspired by it... so be it. If an AI views it and is inspired by it, again, no harm done.
AI doesn't get inspired. It's not human. It adds everything about it to its endless stream of levers to pull, and if you pull the right ones, it will just give you the source verbatim, as proven by the NYT lawsuit filing, where it was just outputting unaltered, copyrighted NYT article text.
> If an AI views it, and is inspired by it, again, no harm done.
You had me till that^ line. In your example, if the "inspired" human starts competing with you, then there is harm. If the inspired human is replaced by an AI, then it also causes harm. By harm I am referring to competition.
So instead of saying "no harm done", maybe it's more accurate to say "same harm as other humans being inspired by your work".
> Put it this way - you remove all the copyrighted, permission-less content from OpenAI's training, what value do OpenAI's products have? If you think OpenAI is less valuable because it can't use copyrighted content, then it should give some of that value back to the content creators
Couldn't we say the same thing about search engines?
What value would Google have without content to search for?
Is the conclusion that we should make search engines pay royalties? That seems unfeasible at Google scale. Should Google just be straight-up illegal? That also seems like a bad outcome; I like search engines, and I am glad they exist.
I guess I'm left with: I don't like this argument because of what it would imply for other projects if you follow the logic to its natural conclusion.
Search engines don't replicate the content, they index and point to it. When search engines have been caught replicating content they have been sued or had to work out licenses.
The deal with search engines was always that you would get traffic out of them.
Use my content to get people to me. Google's snippets kinda broke that deal, and people have indeed complained about that, but otoh you can still technically opt out of being indexed.
Google does transfer value back to the websites, by sending them traffic.
However, Google does get a lot of criticism when they do slurp up content and serve it back without sending traffic back to the websites! Yelp and others have testified to Congress complaining about this!
> Put it this way - you remove all the copyrighted, permission-less content from OpenAI's training, what value does OpenAI have?
What you probably get is an LLM that can perfectly understand well-written text as you might find on Wikipedia, but which would struggle severely with colloquial language of the kind found on Reddit and Twitter.
> then it should give some of that value back to the content creators.
That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter.
> That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter.
How is OpenAI compensating the owners of IP they trained their models on? Or is that not what you mean? It's certainly how I read the part of the GP comment you quoted.
"That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter. "
that sounds like insane bullshit to me. they're trained on the whole internet. there's no way they give back to the whole of the internet, more likely a lot of jobs will be taken away by their work.
> What you probably get is an LLM that can perfectly understand well-written text as you might find on Wikipedia, but which would struggle severely with colloquial language of the kind found on Reddit and Twitter.
Great. Let's do that then. No good reason to volunteer it for a lobotomy.
I keep thinking: this is what e.g. Google has done all along. The content it uses to train models and present answers to us absolutely belongs to others, but just try getting any content (e.g. maps data) out of it for free at scale.
But the business model emerged and delivered enough value to us that we didn't consider asking for money for our content. We like being searched and linked to. Less so Google snippets presented to users without the users landing on our site. Even less so answers generated without any interaction. But it's still all our content.
It would be nice if, when you used a label, you had to pay a fee to whoever owns the label; if you don't want to pay a fee to the owners of an art style, then you can always use public domain works. Hell, this might even be better for public domain works and restoration if a small fee went to them as well.
I wonder if AI shrinks the economy. Not that such a metric is the most important ruler by which to measure goodness, but it would be ironic to have a massive tech company that produces less value than it removes from the world.
In a way it will remove value but in a way it will add value back in terms of an avalanche of derivative junk, lacking authenticity and context. If you find value in memes for example there will be a lot of that type of content in the future.
In my opinion, the play is thus: steal everything, build the models, then:
* People won't notice, or the majority will forget (doesn't seem to be happening).
* Raise enough money that you can smash anyone who complains in court.
* Make a model good enough that it can generate synthetic data, and then claim new models aren't trained on anyone's data.
* All of the above.
Anyway, I 100% agree with you: the value is in the content that everyone else has produced, and they're repackaging and reselling it in a different format.
> A valuable company that is built from the work others created without transferring any value back to them.
But they are free to use the fruits of the model, same as anyone else. I suppose the difference is they don't care; they already have the talent to transform their labor into visual art, so what use do they have for a visual-art-generation machine?
I find strong parallels in the building of web crawlers and search indexers, except... perhaps the indexers provided more universal, symmetrical value. It's hard to make the case that someone who is crawled and added to a search index doesn't derive value from that strong, healthy index being searchable (even librarians and news reporters thrive on having data indexed and reachable; the index is a force-multiplier; it's not "stealing their labor" to crawl their sub-indexing work and agglomerate it into a bigger index, nor is it cheapening the value of their information-sorting-and-sifting skills when the machine sorts and sifts).
So perhaps there is a dimension of symmetry here where the give-and-take aspect of what is created breaks. Much like the rich don't have to care whether it's legal to sleep under a bridge, artists don't have to care whether a machine can do 60% of the work of getting to a reasonable visual representation of an idea. No, more than that: it's harmful to them if it's legal to sleep under the bridge.
They'd be landlords in this analogy, crying to the city that because people can sleep under bridges the value of the houses they maintain has dropped.
Or they lost a means of income? Like, is this a difficult concept? Livelihoods will most likely be lost and probably never really come back. Sure, we can say industries change; however, we had protections in place to prevent artists from losing money due to people copying… A company has said “screw the rules, here's a supercharged printer.”
If you remove all the copyrighted, permission-less content from a human's training, what value does the human have, in connection with work?
When is AI good enough that the contents it contains can be comparable to human brain content, copyright wise?
And conversely, now that we can read signals from neurons in a human brain, and create images from dreams and audio from thoughts, would not that also break the copyright of the content?
There is absolutely zero comparison between living in the world and experiencing it, and building a model, loading in copyrighted, carefully curated material and then charging for the outputs of said model without paying royalties. It's hard to even believe people can't understand the difference.
The fact is, the majority of people do not want to steal others' work for profit, and for those bottom feeders that do, there are laws to discourage such behavior and to protect the original producer.
If these models were trained on creative commons licensed material only, then you'd have a leg to stand on.
I even had to pay for my tuition, and textbook material. Even if some portion of my knowledge comes from osmosis, I have still contributed at some stage to access training material.
When I was 16, I wanted to learn to code, do you know what I did? I went and purchased coding books because even at 16, I understood that it was the right thing to do. To pay the author for the privilege of accessing their work.
How basic can one get?
Would you like it if I broke into your house and used your things without asking you? Because that's about what's happening here for professionals.
I think a model that would make more sense is to punish bad behavior in the form of infringement, so if someone monetizes an AI output that infringes on someone's copyright/trademark, then go after that person. Otherwise we are going to be completely stuck for the sake of some kind of outdated mentality around intellectual property.
Agreed with the overall sentiment, but let's be clear. OpenAI is currently a (capped) for-profit company. They are partnered with Microsoft, a for-profit company. They commercially license their products. They provide services to for-profit companies. The existential crisis of the last six months of the company seems to have been over moving in the direction of being more profit-oriented, and one side clearly won that battle. OpenAI may soon be getting contracts with the defense department, after silently revoking their promise not to. Describing them as a non-profit company is de facto untrue, regardless of their nominal governance structure, and describing them (or them describing themselves) as a non-profit feels like falling for a sleight of hand trick at this point.
The only way I see this working with our current economics and IP law is if the people training models license the work they are using. And a traditional license wouldn't do; it would have to be one specific to training AI models.
As to the question of worth, obviously OpenAI's models have value without the training data. Just having a collection of images does not make a trained AI. But the total value of the system is a combination of that model and the training data.
This goes for your knowledge as well, as AI fundamentally doesn't learn any differently than humans do.
If you remove all knowledge gained from learning from or copying others works, what value do you provide?
Nothing on this planet can learn without copying something else. So if we open the can of worms for AI, we should do the same for humans and require paying royalties to those who taught you.
If that includes a drastic revamp of copyright laws to increase the public domain, why not.
I don't see why Disney or Universal would be more legitimate than OpenAI to profit from stuff made by now-dead authors 60 years ago. Both seem equally legitimate.
That's not enough to say that; all companies benefit from what has been made before, and nothing exists in a vacuum. AI adds to the current landscape.
It basically just looked at them. It’s absolutely preposterous that you can own a painting, thereby claiming nobody is allowed to draw that anymore, and now people can’t even look at your shitty drawing without paying? Then don’t put it online in the first place..
Seriously the audacity of these so called artists.. just because I sang a song one day does not mean I am entitled to own it and force people to pay me to be allowed to sing it. That’s absolutely insane.
Yes, OP knows that the quote is referring to platforms/media channels like youtube/facebook/google, etc. But it's also referring to profit-making companies on the internet, like OpenAI.
> I'm always reticent to fully engage in "The Dialogue," regardless of its momentary configuration. It's a smoothie made from shibboleths; you have to be able to detect them at only a few parts per million because once these things metastasize, they stop being about whatever they were about and instead become increasingly loud recitations of various catechisms and loyalty oaths.
Side comment: I respect the author's right to choose words that ring well to them, but jeez, as a non-native English speaker, reading this, I am happy my device has a dictionary function.
but if you call it virtue signaling, you get categorized into the group that uses that loyalty oath most often, instead of as a non-group member who is simply frustrated by the accuracy of it occurring
>> If my interlocutor isn't prepared to make any distinctions between a human being and a machine designed to mimic one, I think that I can't meaningfully discuss this with them. I hate that this is the case; it must be seen as a personal failing. I can't do it, though. These models aren't people. They don't know anything. They don't want anything. They don't need anything. I won't privilege it over a person. And I certainly won't privilege its master.
Perhaps if you read Gabe's post you could have saved yourself the trouble of making this comment.
I think a better explanation is that the vast majority of users don't try any prompt parameters or different styles and just rely on its default settings
I'd find it hard to argue against this, or Penny Arcade's statements, since I'm having trouble understanding their concrete arguments in between the rhetoric. I'd be hesitant to even discuss this in their comment sections or social media channels.
One might ask: Under what circumstances would AI art be acceptable then?
For example, does it really matter if these models are created by large corporations? I don't see what the legal or ethical difference would be if it was an individual who created such a model.
Is it relevant whether their artworks were used in the training data? Well, what if a new model that is trained only on public domain photos, videos and artworks turns out to be just as capable? What if a future model is able to imitate an art style after seeing merely one or two examples of it?
It might just be a matter of time until such a model is developed. Would it be alright then? If not, why?
(Personally, I think it's the responsibility of the AI model user to use the AI art legally and ethically, as if the user made the image themselves.)
Under what circumstances would AI art be acceptable then?
I hate to make a sort-of standard Internet retort, but artists (and "society") don't have any obligation to reserve some space within culture for AI art to be OK. Maybe such a possibility exists and maybe it doesn't. But given that present AI is something like a complex but semi-literal average of the artworks various largish companies could find, it seems reasonable to respond to people's objections to that.
That question - "Under what circumstances would AI art be acceptable then?" - is definitely not asked enough. And I think taking time to make it acceptable is a worthwhile goal, part of the path, not an obstacle.
> "Under what circumstances would AI art be acceptable then?"
Easy!
Under the circumstances where the artists whose art was used to train the model explicitly consented to that (without coercion), licensed their art for such use, and were fairly compensated for that.
Plenty of artists would gladly paint for AI to learn from — just like stock photographers, or clip art designers, or music sample makers.
Somehow, "paying for art" isn't an idea that has entered the minds of those who use the art.
The sooner anyone making a profit from models trained on creators' proprietary content starts paying for the content they're using, the better for creators, society, and even the AI companies. It's pretty tiring hearing people argue about whether copyright law applies to AI companies or not. It applies. Just get on and sort out a proper licensing model.
> "The bottom line is this," the firm, known as a16z, wrote. "Imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development."
> The firm said payment for all of the copyrighted material already used in LLMs would cost the companies that built them "tens or hundreds of billions of dollars a year in royalty payments."
Doesn't that kind of demonstrate the value being actively stolen from the creators, more than anything? Copyright law killed Napster, too. That doesn't mean applying copyright law was wrong.
Starting a business involves huge amounts of risk to begin with. Just because you may lose more doesn't mean you're magically allowed to ignore that.
Watching the superstars of venture capital whine that copyright is unfair is quite something, though.
“Payment for all workers who develop fields or man factories would cost the companies that operate them hundreds of thousands of dollars a year in salary payments”
- slavers, probably.
Of course slavery != AI, but the argument that we should protect companies from their expenses to enable their bad business model is very entitled and presumptuous.
Thousands of companies have failed because their business models didn't work, and thousands more will.
AI will be fine. It probably won’t be as stupidly lucrative as the current model, but we’ll find a way.
There are many areas of research, technological advancement, and construction which would proceed much more quickly than their current pace if we didn't force them to do things in the way that society has decided is correct and just.
> Imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development.
which, as an involuntary donor, is exactly what I want
Unfortunately, it's not easy to make this legal argument given how copyright law only protects fixed, tangible expressions, not ideas, concepts, principles, etc. and has a gaping hole called 'fair use.'
The New York Times has examples where GPT will reproduce, word for word, exact paragraphs of their (copyrighted) text if you ask it to. That's a pretty fixed, tangible expression, I think.
>the better for [...] AI companies
Yes, exactly. Kill off the free alternative, since nobody else can afford licenses with big rights holders. Google will love this. Creators will get pennies. Everyone has to go through proprietary APIs, and open source will be outlawed. Not something I would want to see!
It's actually just "anyone making models". If you train a model with other people's art (without their permission) and then distribute the model or output for free, you're still stealing their work, even if you make zero profit.
Yes, I know Adobe said so. No, I don't trust them.
Facts:
1. Adobe Firefly is trained with Adobe Stock assets. [1]
2. Anyone can submit to Adobe Stock.
3. Adobe Stock already has AI-generated assets that are not correctly tagged as such. [2]
4. It's hard to remove an image from a trained model.
Unless Adobe carefully scrutinizes every image in the training set, the logical conclusion is that Adobe Firefly already contains at least secondhand unauthorized images (e.g. those generated by Stable Diffusion). It's just "not Adobe's fault".
[1] https://www.adobe.com/products/firefly.html : "The current Firefly generative AI model is trained on a dataset of licensed content, such as Adobe Stock, and public domain content where copyright has expired."
It applies - but how it applies is simply not settled. Clearly, there is no statistical model that can launder IP. If someone uses Midjourney to produce someone else's IP, that user (not Midjourney) would be in violation of copyright if they use it in a way that does not conform to the license, such as by creating their own website of Cat and Girl comics and running AdSense on it.
However, if other artists are inspired by this style of comic, and it influences their work - that is simply fair use. If that artist is some rando using a tool like Midjourney - that is inspired by the art but doesn't reproduce it - it is not at all clear to me that this is not also fair use.
The point is that Midjourney itself, the model weights, is possibly a derivative work of all of the works that went into its training. The fact that it can produce almost identical copies of some of them, especially if not carefully prevented from doing so through secondary mechanisms, is obvious proof that it directly encodes copies of some of these works.
That already clearly means that they couldn't publish the model directly even if they wanted to, since they don't have the right to distribute copies of those works, even if they are represented in a weird lossy encoding. Whether it's legal for them to give access to the model through an API that prevents returning copyrighted content is a much more complex legal topic.
>Clearly, there is no statistical model that can launder IP.
Of course. The model isn't making a decision as to what may be used as training data. The humans training it do.
>If someone uses Midjourney to produce someone else's IP, that user (not Midjourney) would be in violation of copyright
That's like saying that if a user unpacks the dune_full_movie.zip I'm sharing online, it's them who have produced the copyrighted work, and I, the human who put the movie Dune into that zip file, am doing no wrong. Clearly, there is no compression algorithm that can launder IP, right?
>However, if other artists are inspired by this style of comic, and it influences their work - that is simply fair use
The AI isn't inspired by anything. It's not a sentient being, it's not making decisions, and its behavior isn't regulated by laws because it does not have a behavior of its own. Humans decide what goes into an AI model, and what goes out. And humans who train AI models on art don't get "inspired". They transform it into a derivative work — the AI model.
One that has been shown to be awfully close to dune_full_movie.zip if you use the right unpacking tools. But even that isn't necessary. Using work of others in your own work without permission and credit usually goes by less inspiring words: plagiarism, theft, ripping off.
Regardless of whether you reproduce the work 1:1, and whether you can be punished by law for it.
>tool like Midjourney - that is inspired by the art but doesn't reproduce it
Never in the history of humanity has the word inspired meant something that a tool can do. If it's "inspired" (which is something only sentient beings can do), then we should be crying out about human right abuses the way the AI models are trained and treated.
If it's just a tool, it's not "inspired".
You can't have your cake and eat it too. Either pay your computer minimum wage for working for you, or stop saying that it can get "inspired" by art (whether it's training an AI model or creating a zip file).
Fair use is a defence you can use when you infringe copyright [edit for clarity]; in other words, the action you take would otherwise infringe.
It's not fair use because you want it to be, and it's not at all legally clear if this defence is valid in the case of AI training. But it's not clear it isn't, either.
This is basically what all the noise and PR money is about, currently, in hope that shaping the narrative will shape the legal decisions.
We should note that whether AI training is actually fair use has yet to be tested in court. In particular, there's some wording in fair use about the effect on the market for the original and whether the derivative work can substitute for the original that I think a lot of programmers who blindly say this is fair use ignore. I am not a lawyer though.
There are complications, but Google can use thumbnails because essentially they are used to "review" the website.
Had Google sampled and hosted the whole image on their own website and made more images in the style of, say, Mickey Mouse, they would have been taken to town by the owners.
This is why there are no commercial movies on YouTube (without an explicit agreement) and why DMCA takedowns exist.
So the somewhat necessary followup question: if Midjourney or ChatGPT includes a license in front of its model weights or access to its model that forbids using those weights or output from its model to create a competitor, is it fair use to ignore that license?
In both cases, we're relying on copyright. The companies are saying that you can access the models under a license. It's not too hard to circumvent that license or get access through a third party or separate service: but doing so would obviously be seen as deliberate circumvention. We can easily compare that to an artist putting up a click-through in front of a gallery that says that by clicking 'agree' you agree not to use their work for training. And in fact, it's literally the same restriction in both cases: OpenAI maintains that they can use copyright to enforce that people accessing their model avoid using their model to train AIs. Paradoxically, they also maintain that artists can not use copyright to block people accessing their images from using those images to train AIs.
Facebook has released model weights with licenses that restrict how those model weights are used. But I can get those models without clicking 'agree' on that license agreement. They're mirrored all over the place. We'll see what courts say, but Facebook's public argument is that their license doesn't stop applying if I download their software from a mirror. So I find it hard to believe that Facebook honestly believes in a fair use argument if they also claim that the same fair use argument doesn't apply to their own copyrighted material that they've released online (assuming that model weights can even be copyrighted in the first place).
This is one of my biggest issues with the fair use argument -- it's not that there's nothing convincing about it, in a vacuum I'd very likely agree with it. I don't think that licenses should allow completely arbitrary restrictions. But I also can't ignore that the vast majority of companies making this argument are making it extremely inconsistently. They know that their business models literally don't work if competitors figure out ways to use their services to rapidly produce competing models. If another company comes up with a training method to rapidly replicate model weights or to augment existing models using their output, none of these companies will be OK with that.
None of these companies believe that fair use invalidates license terms when it comes to their own IP.
This is certainly the first commentary on the subject that made me _feel_ anything, so hats off to the author.
The generative AI that is changing the world today was built off the work of three groups - software developers, Reddit comment writers, and digital artists. The Reddit comment writers released their rights long ago and do not care. We are left with software developers and digital artists.
In general, the software developers were richly paid, the digital artists were not. The software developers released their work open to modification; the artists did not. Perhaps most importantly, software developers created the generative AIs, so in a way it is a creation of our own; cannibalizing your own profession is a much different feeling than having yours devoured by another alien group.
If Washington must burn, let it be the British and not the Martians. How might we have reacted if what has been done was not by our own hand?
It has to be very hard to overcome the bad vibes of being in a situation like this.
The technology seems indecipherable to a non-techie.
The law seems indecipherable to a layman.
The ethics seem indecipherable to everyone.
With so much confusion, to feel that one has been treated justly, it might not be enough to participate in a class-action lawsuit resolving what happened. It would help with public trust if there were, for example, protocols or sites available for connecting people who want to sue companies. Just something that shows that society does support values of equality and justice.
(1) means that taxes, the legal and justice systems, much of the public education system, criticism, begging on the street, and protest are all unethical, among I'm sure many others. Not to suggest you believe they are- I'm sure that you can present very strong arguments for some or all of these- but that makes it much less simple and leaves room for debate about whether this situation is also an exception.
I personally don't care if my work is used to train a large AI model.
It's also not inherently unethical to do things that someone doesn't want, because not all wants are valid or reasonable. A child may not want to have the candy put away, but it is still done anyway.
I don't see many people suggesting this, but I also quite like this way of thinking about it. The idea that it should be illegal for models to learn from artists, or that artists have a right to extract payment from the model, doesn't make much sense to me, it's too much of a radical departure from the way we treat human learning.
But it seems unfair that a company can own such a model. It's not their work, it's a codified expression of basically our entire cultural output as a species. We should all own it.
The simple fix is another AI that checks the output for copyright violation.
The issue here is models generating copyrighted work verbatim.
People claiming that training on copyrighted work is a violation of copyright (it's not) have no legal legs to stand on. They are purposely muddying concepts, though, to make it seem like it is. However, any competent judge is going to see right through this.
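To make the "simple fix" a couple of comments up concrete: for text, the naive version is a literal string-matching pass over the output before it is returned. A sketch using only Python's standard library; PROTECTED_WORKS here is a hypothetical stand-in for what would realistically be a database of licensed or registered works:

    # Toy output filter that blocks near-verbatim reproduction: find the
    # longest run of words the generation shares with a protected work.
    from difflib import SequenceMatcher

    PROTECTED_WORKS = [
        "it was the best of times it was the worst of times",
    ]

    def longest_shared_run(generation, work):
        a, b = generation.lower().split(), work.lower().split()
        m = SequenceMatcher(None, a, b)
        return m.find_longest_match(0, len(a), 0, len(b)).size  # in words

    def violates(generation, max_shared_words=6):
        return any(longest_shared_run(generation, w) > max_shared_words
                   for w in PROTECTED_WORKS)

    print(violates("he said it was the best of times it was the worst of times"))  # True
    print(violates("a short original sentence"))                                   # False

Of course, a check like this only catches verbatim runs; it does nothing about style imitation, and nothing about what the weights themselves encode, which is where the real legal argument lies.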
Some of the public data may still have a license attached, so there are still challenges in provenance, attribution, and copyright.
To draw a parallel in software, we have MIT licenses that allow for-profit, private use of source code released to the public. The copyleft license might be more aligned with what you are envisioning?
This is a great compromise, honestly. Only slightly related, but what if copy"right"-law was opt-out? You could ignore it if you want, but you aren't allowed to hold any either. I would join.
I agree that it isn't that simple, but you could still ban it.
Real world policy is rarely black and white, all or nothing. An inability to prevent every instance of something is not a reason to forgo any action
Just because country X has slavery doesn't mean the US can't ban it.
There may be some consequences, and disadvantages to doing so, but that is a trade off, not a show stopper.
Same thing with open source. There is lots of illegal and bootlegged content shared on the internet that can't be eradicated. That doesn't mean it can all be freely bought at Walmart.
This is the sticking point for me. OpenAI isn't a profit-making company, but it's certainly a valuable company. A valuable company that is built from the work of content others created without transferring any value back to them. Regardless of legalities, that's wrong to me.
Put it this way - you remove all the copyrighted, permission-less content from OpenAIs training, what value does OpenAI's products have? If you think OpenAI is less valuable because it can't use copyrighted content, then it should give some of that value back to the content.
But we are allowed to use copyrighted content. We are not allowed to copy copyrighted content. We are allowed to view and consume it, to be influenced by it, and under many circumstances even outright copy it. If one doesn't want anyone to see/consume or be influenced by one's copyrighted work, then lock it in a box and don't show it to anyone.
I have some, but diminishing sympathy for artists screaming about how AI generates images too similar to their work. Yes, the output does look very similar to your work. But if I take your work and compare it to the millions of other people's work, I'd bet I can find some preexisting human-made art that also looks similar to your stuff too.
This is why clothing doesn't qualify for copyright. No matter how original you think your clothing seems, someone in the last many thousands of years of fashion has done it before. Visual art may be approaching a similar point. No matter how original you think your drawings are, someone out there has already done something similar. They may not have created exactly the same image, but neither does AI literally copy images. That reality doesn't kill visual arts as it didn't kill off the fashion industry.
I also firmly believe that commercializing models built on top of copyrighted works (which all works start off as) does not qualify as fair use (or at least shouldn't) and that commercializing models build on copyrighted material is nothing more than license laundering. Companies that commercialize copyrighted work in this manner should be paying for a license to train with the data, or should stick to using the licenses that the content was released under.
I don't think your example is valid either. The reason that AI models are generating content similar to other people's work is because those models were explicitly trained to do that. That is literally what they are and how they work. That is very different than people having similar styles.
We are not talking about the eons old human practice of creative artistic endeavor, which yes, is clearly derivative in some fashion, but which we have well established practices around.
We are discussing a new phenomenon of mass replication or derivation by machine at a scale impossible for a single individual to achieve by manual effort.
Further, artists tend to either explicitly or implicitly acknowledge their priors in secondary or even primary material, much like one cites work in an academic context.
Also, the claim:
>But if I take your work and compare it to millions of other people's work...
Is ridiculous. A. you haven't, nor will you ever actual do this. B. This is never how the system of artistic practice up to this point has worked precisely because this sort of activity is beyond the scale of human effort.
In addition, plagiarism exists and is bad. There's no reason that concept can be extended and expanded to include stochastic reproduction at scale.
If you feel artists shouldn't have a say and a future in which capital concentrates even further into the hands of a few technological elite who make their money off of flouting existing laws and the labor of thousands, by all means. But this argument that somehow by analogy to human behavior companies should not be responsible for the vast use of material without permission is absolutely preposterous. These are machines owned by companies. They are not human beings and they do not participate in the social systems of human beings the way human beings do. You may want to consider a distinction in the rules that adequately reflects this distinction in participatory status in a social system.
It feels similar to the ye olden debates on police surveillance. Acquiring a warrant to tail a suspect, tapping a single individual’s phone line, etc all feels like very normal run-of-the-mill police work that no one has a problem with. Collating your behavior across every website and device you own from a data broker is fundamentally the same thing as a single phone’s wiretap, but it obviously feels way grosser and more unethical because it scales way past the point of what you’d imagine as being acceptable.
My main point is that OpenAI is generating an incredible amount of value all hinging on other people's work at a massive scale, without paying for their materials. Take all the non-public domain work off Netflix and Netflix doesn't have the same value they have today, so Netflix must pay for content it uses. Same goes for OpenAI imho.
It's entirely legal for me to leave the pub every time it comes up to my round. It's legal for me to get into a lift and press all the buttons.
It's not unreasonable, I think, for people to be surprised at what is now possible. I'm personally shocked at the progress of the last few years - I'd not have guessed five years ago that putting a picture online might result in my style being easily recreated by anyone, mostly for the benefit of a profitable company.
People keep saying this but it's actually much more complicated, and in many cases you can't view copyrighted content.
An example: Microsoft employees are not permitted to view or learn from an open source (GPL-2) terminal emulator:
https://github.com/microsoft/terminal/issues/10462#issuecomm...
Another example is proprietary software that may have its source available, either intentionally or not. If you view this and then work on something related to it, like WINE for example, you are definitely at risk of being successfully sued.
If you worked at Microsoft on Windows, you would not be able to participate in WINE development at all without violating copyright.
If you viewed leaked Windows source code, you also would not be able to participate in WINE development.
An interesting question that I have is whether training on proprietary, non-trade-secret sources would be allowed. Something like Unreal Engine, where you can view the source but it's still proprietary.
Another question is whether training on leaked proprietary and private but non-trade-secret code, like the Windows source dumps, is legal.
Do you think it's reasonable for me to want some legal framework that allows me to explicitly deny that use of my work? Because I do.
If OpenAI is allowed to be ignorant of copyright, then the rest of us should be allowed, too.
The problem is that OpenAI (alongside a handful of other very large corporations) gets exclusive rights to that ignorance. They get to monopolize the un-monopoly. That's even worse than the problem we started with.
OpenAI built a machine that does exactly that. They just sampled _everyone_.
This artist doesn't complain about work similar to their own being generated, and their artwork is very clearly not clothing.
Well, not exactly. Certain uses are fair. The question is whether OpenAI's use counts as fair. I don't think your immediate response comes close to addressing that question, despite your conviction otherwise.
Also, clothing designs are copyrightable. The conviction expressed by some participants in this debate is exhausting in light of their unfamiliarity with actual copyright law.
It's important to consider in any legalistic argument over copyright that, unlike conventional property rights which are to some degree prehistoric, copyright is a recent legal construct that was developed for a particular economic purpose.
https://en.wikipedia.org/wiki/Intellectual_property#History
The existing standards of fair use are what they are because copyright was developed with supporting the art industry as an intentional goal, not because it was handed down from the heavens or follows a basic human instinct. Ancient playwrights clipped each other's ideas liberally; late medieval economists observed that restricting this behavior seemed to encourage more creativity. Copyright law is a creation of humans, for humans, and is subordinate to moral and economic reasoning, not prior to it.
Clothes are inherently consumable goods. If you use them, they will wear out. If you do not use them, they still age over time. You cannot "copy" a piece of clothing without a truly astonishing amount of effort. Both the processes, and the materials, may be difficult or impossible to imitate without a very large investment of effort.
Compare this to digital art: You can copy it literally for free. Before AI, at least you had to copy it mostly verbatim (modulo some relatively boring transforms, like up/down-scaling, etc.). That limited artists' incomes, but not their future works. But in a post-AI world, you can suck in an artist's life's work and generate an unlimited number of copycats. Right now, the quality of those might be insufficient to be true replacements, but it's not hard to imagine a world, not so far off, in which it will be sufficient, and then artists will be truly screwed.
In theory: sure
In practice: not really, especially when you're small and the other side is big and has lots of lawyers and/or lawmakers in their pockets.
Disney ("In 1989, for instance, the company even threatened to sue three Florida daycare centers unless they removed murals featuring some of its characters") and Deutsche Telekom[1][2] ("the company's actions just smack of corporate bully tactics, where legions of lawyers attempt to hog natural resources — in this case a primary color — that rightfully belong to everyone") are just two examples that spring to mind.
[0] https://hls.harvard.edu/today/harvard-law-i-p-expert-explain... [1] https://www.dw.com/en/court-confirms-deutsche-telekoms-right... [2] https://futurism.com/the-byte/tmobile-legal-rights-obnoxious...
And like, what, do you think they're trying their damnedest to keep datasets clean and to not store any images in the process? How do you think they retrain on datasets over and over? It's really simple - by storing terabytes of copyrighted content. For ease of use, of course - why download something over and over if you can just download it once and keep it? And if they really wanted to steer clear of copyright infringement, if there's truly "no good solution" (which is bullshit - they can compute everything except that part?), why can't they just refrain from recklessly scraping everything, if something might just "slip in"? Like, if you know it's kinda bad, just don't do the thing, right? Well, maybe copyright infringement is just acceptable to them. If not the actual goal.
What they generate is kinda irrelevant - there's plenty of copyright infringement happening even before any training is done. The assembling of datasets, and bad datasets containing copyrighted content, are the start and the core of the copyright problems.
There's a really banal thing at the core of this, and it's just multi-TB storage filled with pirated works.
> [A] reviewer may fairly cite largely from the original work, if his design be really and truly to use the passages for the purposes of fair and reasonable criticism. On the other hand, it is as clear, that if he thus cites the most important parts of the work, with a view, not to criticise, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy.
Most uses of LLMs and image generation models do not produce criticism of their training data. The most common use is to produce similar works. A very common "trick" to get a specific style of output is to add "in style of <artist>". Is this a direct way "to supersede the use of the original work"?
You can certainly see how other factors more or less put gen AI output into the grey zone.
The fact that clothing doesn't qualify for copyright doesn't mean text and images don't. Or, if you advocate that they don't, then you pretty much advocate for the abolition of copyright, because those are the major areas of copyright applicability at the moment. Which is a stance to have, but you'd be better off actually saying that, because claiming that copyright applies to some images and text but not others is a much harder position to defend.
Just like the rest of AI, if your argument is "humans can already do this by hand, why is it a problem to let machines do it?", it's because you are incorrectly valuing the labor that goes into doing it by hand. If doing X has potentially negative side effect Y, then the human labor required to accomplish X is the principal barrier to Y, which can be mitigated via existing structures. Remove the labor barrier, and the existing mitigation structures cease to be effective. The fact that we never deliberately established those barriers is irrelevant to the fact that our society expects them to be there.
I presume there are people working on research into how to prevent models from outputting raw training data - what is the state of the art in this area? Would it be sufficient to prevent output of the training data, or should the models be required to have no significant internal copies of training examples?
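There is research on both halves of that question - training-data deduplication and differentially private training target the "no internal copies" side, while output filtering targets the generation side. As a toy illustration of the latter (the corpus, n-gram size, and thresholds below are all hypothetical, a minimal sketch rather than any vendor's actual safeguard), in Python:

    # Toy memorization filter: flag output that reproduces long verbatim
    # spans of the training corpus. All names and thresholds here are
    # hypothetical illustrations.

    def ngrams(tokens, n=8):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def build_index(training_docs, n=8):
        index = set()
        for doc in training_docs:
            index |= ngrams(doc.split(), n)
        return index

    def looks_memorized(candidate, index, n=8, max_hits=2):
        # Count how many of the candidate's n-grams appear verbatim
        # in the training index.
        hits = sum(1 for g in ngrams(candidate.split(), n) if g in index)
        return hits > max_hits

    corpus = ["the quick brown fox jumps over the lazy dog again and again"]
    idx = build_index(corpus)
    print(looks_memorized("the quick brown fox jumps over the lazy dog again and again", idx))  # True
    print(looks_memorized("a completely original sentence with no overlap", idx))  # False

The obvious weakness is that even trivial paraphrases slip past exact n-gram matching, which suggests the "no significant internal copies" framing is the harder and more meaningful requirement.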
Most every fashion company has a legal team that reviews print and pattern, as well as certain other aspects of design, relative to any source of inspiration. My husband works in the industry and has to send everything he does for review in this way. I’m not sure where you got the idea that there are no IP protections for fashion, but this is untrue.
Now, I am worried about companies like OpenAI monopolizing technology by making their technology proprietary. I think their output should be public domain, and copyright should apply only to human authors, if it should apply at all.
You'd have to argue that everything about copyright law is ethical to make your version of the argument.
"A new for-profit subsidiary would be formed, capable of issuing equity to raise capital and hire world class talent, but still at the direction of the Nonprofit. Employees working on for-profit initiatives were transitioned over to the new subsidiary."
https://openai.com/our-structure
We've seen zero evidence that the non-profit side of OpenAI meaningfully constrains the for-profit side in any way, and have seen direct evidence that when the non-profit and for-profit groups disagree with each other, the for-profit side wins.
If you sell your art, then art marketplaces and printers and shipping services all profit from your work, but I don't imagine she's complaining about that. What's the difference? In all of those cases, as with social media, companies are making money from your work in return for providing a useful service to you (and one you don't have to use if you don't think it's useful).
I see it differently. To me, if you post your work online as an artist, it's really there for everyone to view and be inspired by. As long as nobody copies it verbatim, don't think you've been hurt by any other usage. If another artist views it, and is inspired by it... so be it. If an AI views it, and is inspired by it, again, no harm done.
You had me till that^ line. In your example, if the "inspired" human starts competing with you, then there is harm. If the inspired human is replaced by an AI, it also harms. By harm I am referring to competition.
So instead of saying "no harm done", maybe it's more accurate to say "same harm as other humans being inspired by your work".
There's such a thing as consent, I hope you've heard of it.
No artist whose work was used to train the AIs consented to such use.
Particularly if they released their work online before generative AI was a possibility.
>it's really there for everyone to view and be inspired by
Generative AI model is not "everyone". It's a model, a combination of data that goes into it.
It's a thing. A product. A derivative work, to be specific, made by the person who trained it.
>If an AI views it, and is inspired by it, again, no harm done.
Such a romantic notion!
By the same metric, a photocopy machine is an auteur that gets inspired by the work it happens to stumble upon to produce its own original art.
No.
The AI doesn't "view" the work, it has no agency. The human that trains the model does.
And that human is the one that is ripping the artist off.
The AI, as many people said, is just a tool. It doesn't suddenly turn into a person for copyright purposes.
It still remains a tool for people who train the models. A tool to rip off others' intellectual property, in the case we're discussing.
This is not really how copyright works.
> an AI views it, and is inspired by it,
This is a misleading anthropomorphization of how AI works.
Couldn't we say the same thing about search engines?
What value would Google have without content to search for?
Is the conclusion that we should make search engines pay royalties? That seems unfeasible at Google scale. Should Google just be straight-up illegal? That also seems like a bad outcome; I like search engines and I am glad they exist.
I guess I'm left with this: I don't like this argument because of what it would imply for other projects if you follow the logic to its natural conclusion.
The deal was: use my content to get people to me. Google's snippets kinda broke that deal, and people have indeed complained about that, but on the other hand you can still technically opt out of being indexed.
It's not clear how you opt out of LLMs.
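For crawling, at least, there is now a partial mechanism: some AI crawlers advertise user agents you can block in robots.txt - OpenAI's GPTBot and Google's Google-Extended token are the documented examples I know of - while staying in normal search indexes. Honoring it is voluntary on the crawler's side, and it does nothing about data already scraped, but a sketch would look like:

    # robots.txt - stay in search indexes, opt out of (some) AI training crawlers.
    # Honoring these is voluntary; already-scraped data is unaffected.

    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: *
    Allow: /

That still isn't a real opt-out from LLM training in general - it only covers crawlers that choose to identify themselves.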
However, Google does get a lot of criticism when they do slurp up content and serve it back without sending traffic back to the websites! Yelp and others have testified to Congress complaining about this!
What you'd probably get is an LLM that can perfectly understand well-written text, such as you might find on Wikipedia, but which would struggle severely with colloquial language of the kind found on Reddit and Twitter.
> then it should give some of that value back to the content.
That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter.
How is OpenAI compensating the owners of IP they trained their models on? Or is that not what you mean? It's certainly how I read the part of the GP comment you quoted.
That sounds like insane bullshit to me. They're trained on the whole internet; there's no way they give back to the whole of the internet. More likely, a lot of jobs will be taken away by their work.
Plus, there was no consent.
Great. Let's do that then. No good reason to volunteer it for a lobotomy.
But the business model emerged and delivered enough value to us that we didn't consider asking for money for our content. We like being searched and linked to. Less so Google snippets presented to users without the users landing on our site. Even less so answers generated without any interaction. But it's still all our content.
I hope I never have to live in a world where such a thing exists.
* People won't notice, or the majority will forget (doesn't seem to be happening).
* Raise enough money that you can smash anyone who complains in court.
* Make a model good enough that it can generate synthetic data, then claim new models aren't trained on anyone's data.
* All of the above.
Anyway, I 100% agree with you: the value is in the content that everyone has produced for free; they're repackaging and reselling it in a different format.
But they are free to use the fruits of the model, same as anyone else. I suppose the difference is they don't care; they already have the talent to transform their labor into visual art, so what use do they have for a visual-art-generation machine?
I find strong parallels in the building of web crawlers and search indexers, except... perhaps the indexers provided more universal, symmetrical value. It's hard to make the case that someone whose work is crawled and added to a search index doesn't derive value from that strong, healthy index being searchable (even librarians and news reporters thrive on having data indexed and reachable; the index is a force-multiplier; it's not "stealing their labor" to crawl their sub-indexing work and agglomerate it into a bigger index, nor does it cheapen the value of their information-sorting-and-sifting skills when the machine sorts and sifts).
So perhaps there is a dimension of symmetry here where the give-and-take aspect of what is created breaks. Much like the rich don't have to care whether it's legal to sleep under a bridge, artists don't have to care whether a machine can do 60% of the work of getting to a reasonable visual representation of an idea. No, more than that: it's harmful to them if it's legal to sleep under the bridge.
They'd be landlords in this analogy, crying to the city that because people can sleep under bridges the value of the houses they maintain has dropped.
When is AI good enough that what it contains is comparable to human brain content, copyright-wise?
And conversely, now that we can read signals from neurons in a human brain, and create images from dreams and audio from thoughts, wouldn't that also break the copyright of the content?
The fact is, the majority of people do not want to steal others' work for profit, and for those bottom feeders that do, there are laws to discourage such behavior and to protect the original producer.
If these models were trained on creative commons licensed material only, then you'd have a leg to stand on.
I even had to pay for my tuition and textbook material. Even if some portion of my knowledge comes from osmosis, I still paid at some stage to access training material.
When I was 16, I wanted to learn to code, do you know what I did? I went and purchased coding books because even at 16, I understood that it was the right thing to do. To pay the author for the privilege of accessing their work.
How basic can one get?
Would you like it if I broke into your house and used your things without asking you? Because that's about what's happening here for professionals.
As to the question of worth, obviously OpenAI's models have value without the training data. Just having a collection of images does not make a trained AI. But the total value of the system is a combination of that model and the training data.
If you remove all knowledge gained from learning from or copying others' works, what value do you provide?
Nothing on this planet can learn without copying something else. So if we open the can of worms for AI, we should do the same for humans and require paying royalties to those who taught you.
As in: we will change the world, all that is required is that we throw away all previous protections! The ends justify the means!
I do see a much more beneficial trajectory for LLMs vs cryptocurrencies, but yeah, this is gross and unfair.
Note: as the days go on, I continue to realize the pitfalls of Utilitarianism. I do miss the simplicity, but nope.
I don't know yet exactly how this compares, I’m trying to think it all through.
AI has different levels - output can be loosely inspired by a work, a clone of its style, or a near-exact reproduction of specific pieces.
I don't see why Disney or Universal would be more legitimate than OpenAI in profiting from work made 60 years ago by now-dead authors. Both seem equally legitimate.
Seriously, the audacity of these so-called artists... just because I sang a song one day does not mean I am entitled to own it and force people to pay me to be allowed to sing it. That's absolutely insane.
(text commentary below the comic, in case your OS has decided to conceal the scroll bar from you and you didn't notice the page is longer)
Boy does that ring true.
Perhaps if you read Gabe's post you could have saved yourself the trouble of making this comment.
One might ask: Under what circumstances would AI art be acceptable then?
For example, does it really matter if these models are created by large corporations? I don't see what the legal or ethical difference would be if it was an individual who created such a model.
Is it relevant whether their artworks were used in the training data? Well, what if a new model that is trained only on public domain photos, videos and artworks turns out to be just as capable? What if a future model is able to imitate an art style after seeing merely one or two examples of it?
It might just be a matter of time until such a model is developed. Would it be alright then? If not, why?
(Personally, I think it's the responsibility of the AI model user to use the AI art legally and ethically, as if the user made the image themselves.)
I hate to make a sort-of standard Internet retort but artists (and "society") don't have any obligation to reserve some space for AI art to be OK within culture. Maybe such a possibility exists and maybe it doesn't. But given that present AI is something like a complex but semi-literal average of the art works various largish companies could find, it seems reasonable to respond to people's objections to that.
It's currently fair use to give an artist paintings and say, "I want something like this, but different in these ways."
You can tell a scriptwriter to watch Star Wars and write something similar.
Questions of copyright will depend on whether the output is sufficiently transformative, not on whether copyrighted work was used as inspiration.
Easy!
Under the circumstances where the artists whose art was used to train the model explicitly consented to that (without coercion), licensed their art for such use, and were fairly compensated for that.
Plenty of artists would gladly paint for AI to learn from — just like stock photographers, or clip art designers, or music sample makers.
Somehow, "paying for art" isn't an idea that has entered the minds of those who use the art.
Perhaps because it wasn't asked enough.
> The firm said payment for all of the copyrighted material already used in LLMs would cost the companies that built them "tens or hundreds of billions of dollars a year in royalty payments."
https://www.businessinsider.com/marc-andreessen-horowitz-ai-...
Watching the superstars of venture capital whine that copyright is unfair is quite something, though.
- slavers, probably.
Of course slavery != AI, but the argument that we should protect companies from their expenses to enable their bad business model is very entitled and presumptuous.
Thousands of companies have failed because their businesses models didn’t work, and thousands more will.
AI will be fine. It probably won’t be as stupidly lucrative as the current model, but we’ll find a way.
which, as an involuntary donor, is exactly what I want
If only saying it would make it so.
Unfortunately, it's not easy to make this legal argument given how copyright law only protects fixed, tangible expressions, not ideas, concepts, principles, etc. and has a gaping hole called 'fair use.'
It's actually just "anyone making models". If you train a model with other people's art (without their permission) and then distribute the model or its output for free, you're still stealing their work, even if you make zero profit.
Yes, I know Adobe said so. No, I don't trust them.
Facts:
1. Adobe Firefly is trained with Adobe Stock assets. [1]
2. Anyone can submit to Adobe Stock.
3. Adobe Stock already has AI-generated assets that are not correctly tagged so. [2]
4. It's hard to remove an image from a trained model.
Unless Adobe carefully scrutinizes every image in the training set, the logical conclusion is that Adobe Firefly already contains at least second-hand unauthorized images (e.g. those generated by Stable Diffusion). It's just "not Adobe's fault".
[1] https://www.adobe.com/products/firefly.html : "The current Firefly generative AI model is trained on a dataset of licensed content, such as Adobe Stock, and public domain content where copyright has expired."
[2] Famous example: https://twitter.com/destiny_thememe/status/17448423657672255...
However, if other artists are inspired by this style of comic, and it influences their work - that is simply fair use. If that artist is some rando using a tool like Midjourney - that is inspired by the art but doesn't reproduce it - it is not at all clear to me that this is not also fair use.
That already clearly means that they couldn't publish the model directly even if they wanted to, since they don't have the right to distribute copies of those works, even if they are represented in a weird lossy encoding. Whether it's legal for them to give access to the model through an API that prevents returning copyrighted content is a much more complex legal topic.
Of course. The model isn't making a decision as to what may be used as training data. The humans training it do.
>If someone uses Midjourney to produce someone else's IP, that user (not Midjourney) would be in violation of copyright
That's like saying that if a user unpacks the dune_full_movie.zip I'm sharing online, it's the user who has produced the copyrighted work, and I, the human who put the movie Dune into that zip file, have done no wrong. Clearly, there is no compression algorithm that can launder IP, right?
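The force of the analogy is that lossless compression round-trips exactly: the downloader gets back the original bytes, bit for bit, so the archive plainly contains the work. A few lines of Python make that concrete (the byte string below is a stand-in for a real file):

    # Lossless compression is reversible: nothing about the work is "laundered".
    import zlib

    original = b"pretend these are the bytes of a copyrighted film"
    packed = zlib.compress(original)    # the zip file being shared
    restored = zlib.decompress(packed)  # what the downloader "produces"
    assert restored == original         # a bit-identical copy of the work

Whether a trained model is closer to that archive or to a reader's memory of the work is exactly the question in dispute.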
>However, if other artists are inspired by this style of comic, and it influences their work - that is simply fair use
The AI isn't inspired by anything. It's not a sentient being, it's not making decisions, and its behavior isn't regulated by laws because it does not have a behavior of its own. Humans decide what goes into an AI model, and what goes out. And humans who train AI models on art don't get "inspired". They transform it into a derivative work — the AI model.
One that has been shown to be awfully close to dune_full_movie.zip if you use the right unpacking tools. But even that isn't necessary. Using work of others in your own work without permission and credit usually goes by less inspiring words: plagiarism, theft, ripping off.
Regardless of whether you reproduce the work 1:1, and whether you can be punished by law for it.
>tool like Midjourney - that is inspired by the art but doesn't reproduce it
Never in the history of humanity has the word "inspired" meant something that a tool can do. If it's "inspired" (which is something only sentient beings can be), then we should be crying out about human rights abuses over the way AI models are trained and treated.
If it's just a tool, it's not "inspired".
You can't have your cake and eat it too. Either pay your computer minimum wage for working for you, or stop saying that it can get "inspired" by art (whether it's training an AI model or creating a zip file).
Like leaded gas, the government can make regulations to deal with anything should they choose to.
Why do you think it's okay for massive companies to freely profit off the work of others?
I would love to know why not.
It's not fair use because you want it to be, and it's not at all legally clear if this defence is valid in the case of AI training. But it's not clear it isn't, either.
This is basically what all the noise and PR money is about, currently, in hope that shaping the narrative will shape the legal decisions.
https://fairuse.stanford.edu/overview/fair-use/what-is-fair-...
There are complications, but Google can use thumbnails because essentially they are used to "preview" the website.
Had Google sampled and hosted the whole image on their own website and made more images in the style of, say, Mickey Mouse, they would have been taken to town by the owners.
This is why there are no commercial movies on YouTube (without an explicit agreement) and why DMCA takedowns exist.
In both cases, we're relying on copyright. The companies are saying that you can access the models under a license. It's not too hard to circumvent that license or get access through a third party or separate service: but doing so would obviously be seen as deliberate circumvention. We can easily compare that to an artist putting up a click-through in front of a gallery that says that by clicking 'agree' you agree not to use their work for training. And in fact, it's literally the same restriction in both cases: OpenAI maintains that they can use copyright to enforce that people accessing their model avoid using their model to train AIs. Paradoxically, they also maintain that artists can not use copyright to block people accessing their images from using those images to train AIs.
Facebook has released model weights with licenses that restrict how those model weights are used. But I can get those models without clicking 'agree' on that license agreement. They're mirrored all over the place. We'll see what courts say, but Facebook's public argument is that their license doesn't stop applying if I download their software from a mirror. So I find it hard to believe that Facebook honestly believes in a fair use argument if they also claim that the same fair use argument doesn't apply to their own copyrighted material that they've released online (assuming that model weights can even be copyrighted in the first place).
This is one of my biggest issues with the fair use argument -- it's not that there's nothing convincing about it, in a vacuum I'd very likely agree with it. I don't think that licenses should allow completely arbitrary restrictions. But I also can't ignore that the vast majority of companies making this argument are making it extremely inconsistently. They know that their business models literally don't work if competitors figure out ways to use their services to rapidly produce competing models. If another company comes up with a training method to rapidly replicate model weights or to augment existing models using their output, none of these companies will be OK with that.
None of these companies believe that fair use invalidates license terms when it comes to their own IP.
The generative AI that is changing the world today was built off the work of three groups - software developers, Reddit comment writers, and digital artists. The Reddit comment writers released their rights long ago and do not care. We are left with software developers and digital artists.
In general, the software developers were richly paid, the digital artists were not. The software developers released their work open to modification; the artists did not. Perhaps most importantly, software developers created the generative AIs, so in a way it is a creation of our own; cannibalizing your own profession is a much different feeling than having yours devoured by another alien group.
If Washington must burn, let it be the British and not the Martians. How might we have reacted if what has been done was not by our own hand?
The technology seems indecipherable to a non-techie.
The law seems indecipherable to a layman.
The ethics seem indecipherable to everyone.
With so much confusion, to feel that one has been treated justly, it might not be enough to participate in a class-action lawsuit resolving what happened. It would help with public trust if there were, for example, protocols or sites for connecting people who want to sue companies - just something that shows that society does support values of equality and justice.
1. Don't do things to people that they don't want to be done to them. 2. Do as you would be done by.
It really is that simple.
It's also not inherently unethical to do things that someone doesn't want, because not all wants are valid or reasonable. A child may not want to have the candy put away, but it is still done anyway.
On one hand, I want better AI that can generate whatever image comes to my mind.
On the other hand I don’t want it to blatantly copy someone else’s style that they spent years making.
The US is pretty fucked, since it has very little safety net compared to other modern countries if AI really starts replacing humans.
This is people’s livelihoods we are talking about.
Will AI cause people to commit suicide? Yeah if it starts replacing them and they lose meaning in life.
Ban private large models trained on public data, require them to be public weights.
If a company wants to train large private model, they can do it with their own data.
But it seems unfair that a company can own such a model. It's not their work, it's a codified expression of basically our entire cultural output as a species. We should all own it.
Maybe, just maybe, it's better to ask artists: who should own it?
The issue here is models generating copyrighted work verbatim.
People claiming that training on copyrighted work is itself a violation of copyright (it's not) have no legal leg to stand on. They are purposely muddying concepts to make it seem like it is. However, any competent judge is going to see right through this.
To draw a parallel in software, we have MIT licenses that allow for-profit, private use of source code in the public. The copyleft license might be more aligned to what you are envisioning?
You can't ban it for open source either.
Real world policy is rarely black and white, all or nothing. An inability to prevent every instance of something is not a reason to forgo any action
Just because country X has slavery doesn't mean the US can't ban it. There may be some consequences and disadvantages to doing so, but that is a trade-off, not a showstopper.
Same thing with open source. There is lots of illegal and bootlegged content shared on the internet that can't be eradicated. That doesn't mean it can all be freely bought at Walmart.
They would be better off if you did it the other way around - ban large public models.
And I am in favor of more stringent licenses than that: you must pay me to use my data.