Copyrightest (u/Copyrightest)

Copyrightest commented on Redox OS has adopted a Certificate of Origin policy and a strict no-LLM policy gitlab.redox-os.org/redox... · Posted by u/pjmlp

flammafex · 2 days ago

Thanks. Now I know which software to avoid: the ones that ban legitimate tool use. I have no respect for this protectionist prohibition. These people would have insisted on driving horse carriages 125 years ago because people were still getting used to driving automobiles.

Copyrightest · 2 days ago

NetBSD has a very reasonable stance:

  If you commit code that was not written by yourself, double check that the license on that code permits import into the NetBSD source repository, and permits free distribution. Check with the author(s) of the code, make sure that they were the sole author of the code and verify with them that they did not copy any other code.

  Code generated by a large language model or similar technology, such as GitHub/Microsoft's Copilot, OpenAI's ChatGPT, or Facebook/Meta's Code Llama, is presumed to be tainted code, and must not be committed without prior written approval by core.

https://www.netbsd.org/developers/commit-guidelines.html

Copyrightest commented on Is legal the same as legitimate: AI reimplementation and the erosion of copyleft writings.hongminhee.org/2... · Posted by u/dahlia

crazygringo · 2 days ago

> If you commission it from OpenAI (by sending a query to their ChatGPT API), by your argument, you are the person liable — and OpenAI is off the hook even if that work is distributed further.

Let's distinguish two different scenarios here:

1) Your prompt is copyright-free, but the LLM produces a significant amount of copyrighted content verbatim. Then the LLM is liable, and you too are liable if you redistribute it.

2) Your prompt contains copyrighted data, and the LLM transforms it, and you distribute it. Then if the transformation is not sufficient, you are liable for redistributing it.

The second example is what I'm referring to, since the commercial LLMs are now very good about not reproducing copyrighted content verbatim. And yes, OpenAI is off the hook, from everything I understand legally.

Your example of commissioning an artist is different from LLMs, because the artist is legally responsible for the product and is selling the result to you as a creative human work, whereas an LLM is a software tool and the company is selling access to it. So the better analogy is if you rent a Xerox copier to copy something by Warhol. Xerox is not liable if you try to redistribute that copy. But you are. So here, Xerox=OpenAI. They are not liable for your copyrighted inputs turning into copyrighted outputs.

Copyrightest · 2 days ago

The most salient difference is that it's impossible to tell if an LLM is plagiarizing, whereas Xeroxing something implies specific intent to copy. It makes no sense to push liability onto LLM users.

Copyrightest commented on Is legal the same as legitimate: AI reimplementation and the erosion of copyleft writings.hongminhee.org/2... · Posted by u/dahlia

crazygringo · 2 days ago

You might wish that were true, but there are very strong arguments it's not. Training on copyleft-licensed code is not a license violation, any more than a person reading it is. In copyright terms, it's such an extreme transformative use that copyright no longer applies. It's fair use.

But agreed that we're waiting for a court case to confirm that. Although really, the main questions for any court cases are not going to be around the principle of fair use itself or whether training is transformative enough (it obviously is), but rather on the specifics:

1) Was any copyrighted material acquired legally (not applicable here), and

2) Is the LLM always providing a unique expression (e.g. not regurgitating books or libraries verbatim)?

And in this particular case, they confirmed that the new implementation is 98.7% unique.

Copyrightest · 2 days ago

The big difference between people reading code and LLMs reading code is that people have legal liability and LLMs do not. You can't sue an LLM for copyright infringement, and it's almost impossible for users to tell when it happens.

BTW in 2023 I watched ChatGPT spit out hundreds of lines of F# verbatim from my own GitHub. A lot of people had this experience with GitHub Copilot. "98.7% unique" is still a lot of infringement.