Put 10,000 hours into writing a book. Watch somebody with more resources or media coverage take full credit for it and/or make the money instead of you.
Copyright is a good thing. Same principle applies to the core of similar laws.
I agree with you. I think that copyright is bad, and patents are also bad.
It is a different issue if they steal your private data or your power (I mean the electrical power for the computers, in case that isn't already clear).
Making copies of published books, music, etc. (and doing what you want with them) is not the bad thing.
And that is equally atrocious and should be eliminated from a society that wants to share ideas freely.
I don't know if you're in the US, but you also massively oversimplify in your example. Copyright law is way more complex than that, and it would take a set of special circumstances far beyond what you describe to siphon money from an infringement claim.
It feels like there are two equally valid sides to this argument that get muddied because of our current models' and regulations' inability to differentiate one from the other.
On the free-information side, I don’t think anyone would argue that AI shouldn’t be allowed to offer a general synopsis of a given book / series. From an author/creator’s POV, it feels like extortion to be able to summarize/recreate any given chapter/subsection to the point that the entire work could be reproduced near-verbatim.
IMO the question is, can we meaningfully draw a line between the two, and if so, how?
I don't think anyone is stopping AI from learning on the synopses of books, or from learning on books after having paid licensing costs. It's the wanting to have the cake, eat the cake, and for free that is failing.
In contrast to typical corporate crime, it seems there is documentation of upper management signing off on the decision.
Are there other juicy examples where the C-suite can be directly implicated? I always assumed that management knew how to leave instructions vague enough to keep their hands clean (a la meddlesome priests), and that the bad actor was always some middle manager gone rogue.
I think the main issue is that authors published books with the intention of human, not machine, consumption. Nobody thought to put a clause in a book saying "human consumption only, not to be used to train AI". Meta pirated the books in question, but what if they had bought a copy? Oddly, cracking the encryption, a violation of the DMCA, might be the infraction.
The courts have some tough questions to answer here.
If training AI doesn't constitute fair use, you will lose more than you could ever possibly hope to gain. As will the rest of us.
Meanwhile, sublimate your dudgeon towards advocating for free access to the resulting models. That's what's important. Meta is not the company you want to go after here, since they released the resulting model weights.
Unauthorized copying (aka pirating) is definitely a copyright violation.
That appears to be a huge problem with the large models and training. They don't secure legal access to the materials they train on, and thus fail to compensate authors for their work.
I.e., students are required to buy or otherwise obtain legal access to their textbooks (like checking the book out of the library).
Training AI should play by the same rules human students have to follow.
Obtaining copies of pirated works is not infringement. Unauthorized sharing is infringement but being on the receiving end of sharing is not (even if one is an active participant).
As long as I have access to the resulting model, sure. I thought I made that clear. Copyright is not as important as reaching the next stage of our intellectual evolution. Current-gen AI may not be sufficient to reach that stage, but I believe it is a necessary step.
Like the author of this screed, my work went into training every major model. I get paid back every time one of those models helps me learn or do something. The injustice, if it happens, will occur when a few well-heeled players like OpenAI succeed in locking the technology up with regulatory capture or (worse) if a few greedy, myopic assholes render it illegal or uneconomical to continue development by advocating copyright maximalism.
Does fair use imply that pirating copyrighted material is ok?
I mean, it’s a serious question; I don’t see this as really connected.
As long as an AI can “understand” the content of a book and spit out a summary of it, or even leverage what it learned to perform further inference, I’d be inclined to say that this is fair use; a human would do the same.
But this has nothing to do with using pirated material for training, especially for some kind of commercial purpose (even if llama is free, they’re building on top of it) - I don’t see why it should be legal.
I get the commercial/legal angle, but from the viewpoint of AI being something we as a society have an interest in developing, how should this work?
Do you want to severely limit evolution of models by having them pick (and buy) a tiny subset of all books?
Should every training run put money into a pool that gets paid out to every rights holder of every book that has ever been published?
Should Meta buy a physical or electronic copy of every book they want to use for training? That has zero impact on revenue for individual authors.
Would they be paid by word, by token, by book? This makes little sense. We don’t charge people for the knowledge they acquired while going to the library over 50 years, AI just squeezes this into weeks. Our legal framework simply doesn’t fit.
Why should it be fair use? Why would being a derivative work not be OK? There is a massive corpus of public domain and FOSS works, and likewise plenty of permissively licensed government-created datasets. There is no reason a corpus created from these sources would be insufficient.
That's not even the real problem. It's a problem, yes, but not the real problem. The problem is that before they could train the model on the book, they had to copy the book from somewhere. Is it ok to make illegal pirated copies of a copyrighted book to train your model? I think that's the issue we are dealing with here.
Whether it is ok to create a derivative work or not is beside the point.
The illustration shows a page from Matter by Iain M. Banks. I don't suppose that's an IP violation, but it implies a human artist with attention to detail.
Mind you, it's page 1 and the book is not on page 1.
But also, no one is selling "your book"; the product is completely different in literally every conceivable way.
You have never owned (and no one ever should own) words arranged in a certain way. You own the right to sell a book, not the words themselves.
Meta does bad things and I'm not a fan, but this really pales in comparison.
I wonder if an equivalent to Performance Rights Organizations will emerge as a channel for LLM publishers (so to speak) to pay fees.
"Fair use" in copyright law allows limited, specific uses of copyrighted material without permission.
Hence, by definition, not "pirating".