Dead Comment
Dead Comment
Dead Comment
Dead Comment
Dead Comment
Dead Comment
Let's distinguish two different scenarios here:
1) Your prompt is copyright-free, but the LLM produces a significant amount of copyrighted content verbatim. Then the LLM is liable, and you too are liable if you redistribute it.
2) Your prompt contains copyrighted data, and the LLM transforms it, and you distribute it. Then if the transformation is not sufficient, you are liable for redistributing it.
The second example is what I'm referring to, since the commercial LLM's are now very good about not reproducing copyrighted content verbatim. And yes, OpenAI is off the hook from everything I understand legally.
Your example of commissioning an artist is different from LLM's, because the artist is legally responsible for the product and is selling the result to you as a creative human work, whereas an LLM is a software tool and the company is selling access to it. So the better analogy is if you rent a Xerox copier to copy something by Warhol. Xerox is not liable if you try to redistribute that copy. But you are. So here, Xerox=OpenAI. They are not liable for your copyrighted inputs turning into copyrighted outputs.
But agreed that we're waiting for a court case to confirm that. Although really, the main questions for any court cases are not going to be around the principle of fair use itself or whether training is transformative enough (it obviously is), but rather on the specifics:
1) Was any copyrighted material acquired legally (not applicable here), and
2) Is the LLM always providing a unique expression (e.g. not regurgitating books or libraries verbatim)
And in this particular case, they confirmed that the new implementation is 98.7% unique.
BTW in 2023 I watched ChatGPT spit out hundreds of lines of F# verbatim from my own GitHub. A lot of people had this experience with GitHub Copilot. "98.7% unique" is still a lot of infringement.
Dead Comment