The FSF considers large language models

> The prompt used to create the code should also be provided. The LLM-generated code should be clearly marked.

I have a feeling the people who write these haven't really used LLMs for programming because even just playing around with them will make it obvious that this makes no sense - especially if you try to use something local based that lets you rewrite the discussion at will, including any code the LLM generated. E.g. sometimes when trying to get Devstral make something for me, i let it generate whatever (sometimes buggy/not working) code it comes up with[0] and then i start editing its response to fix the bug so that further instructions are under the assumption it generated the correct code from the get go instead of trying to convince it[0] to fix the code it generated. In such a scenario there is no clear separation between LLM-generated code and manually written code nor any specific "prompt" (unless you count all snapshots of the entire discussion every time one hits the "submit" button as a series of prompts, which technically is what the LLM using as a prompt instead of what the user types, but i doubt this was what the author had in mind).

And all that without taking into account what someone commented in the article about code not even done in a single session but with plans, restarting from scratch, summarizing, etc (and there are tools to automate these too and those can use a variety of prompts by themselves that the end user isn't even aware of).

TBH i think if FSF wants to "consider LLMs" they should begin by gaining some real experience using them first - and bringing people with such experience on board to explain things for them.

[0] i do not like anthropomorphizing LLMs, but i cannot think of another description for that :-P

falcor84 · 2 months ago

Agreed, it's almost like requiring that code always come with full transcripts of all the meetings where the team discussed the next steps.

jmathai · 2 months ago

> I have a feeling the people who write these haven't really used LLMs for programming because even just playing around with them will make it obvious that this makes no sense

This is one problem with LLM generated code. It is very greenfield. There’s no correct or even good way to do it. Because it’s a little bit unbounded in possible approaches and quality of output.

I’ve tried tracking prompt history in many permutations as a means to documenting and making rollbacks more possible. I hasn’t felt like that's the right way to think about it.

cxr · 2 months ago

What you're describing isn't any different from a branch of commits between two people practicing a form of continuous integration where they commit whatever they have (whether it breaks the build or not, or is buggy, etc.), capped off by a merge commit when it's finally in the finished state.

badsectoracula · 2 months ago

Eh, i do not think these are comparable, unless you really stretch the idea of what is a "commit", who makes it and you consider all sorts of destructive modifications of branch history and commits normal.

It looks like the FSF is going to sit this one out like the SaaS revolution, to which they reacted late with the AGPL but did not push it. They are not working on a new license and Siewicz is already low-key pushing in favor of LLMs:

"Many years ago, he said, photographs were not generally seen as being copyrightable. That changed over time as people figured out what could be done with that technology and the creativity it enabled. Photography may be a good analogy for LLMs, he suggested."

I have zero trust in the FSF since they backstabbed Stallman.

EDIT: Criticizing anything from LWN, be it Debian, Linux or FSF related, results in instant downvotes. LWN is not a critical publication and just lionizes whoever has a title and bloviates on a mailing list or at a conference.

pessimizer · 2 months ago

I have no idea how to criticize them because I have no idea what to say about LLMs irt the GPL, other than that Free Software should try its best to legally protect itself from LLMs being trained on its code.

I've always been in favor of the GPLs being pushed as proprietary, restrictive licenses, and being as aggressive in enforcement as any other restrictive license. GPL'd software is public property. The association with Open Source, "Creative Commons" and "Public Domain" code is nothing but a handicap; proprietary code can take advantage of all permissively licensed code without pretending that it shares anything in terms of philosophy, and without sharing back unless it finds it strategically advantageous.

> They are not working on a new license and Siewicz is already low-key pushing in favor of LLMs

I just have no idea what I would put in a new license, or what it means to be "in favor" of LLMs. Are Free Software supporters just supposed to not use them, ever? Even if they're only trained on permissively licensed code? Do you think that it means that people are pushing to allow LLMs to train on GPL-licensed software?

I just don't understand what you're trying to say. I also have zero trust in the FSF over Stallman, simply because I don't hear people who speak like Stallman at the FSF i.e. I think his vision was pushed out along with his voice. But I do not understand what you're getting at.

bgwalter · 2 months ago

More or less what you said in your last paragraph: Stallman also reacted late to the web revolution, but at least he was passionate. That passion seems gone.

I don't see any sense of urgency in the reported discussion or any will to fight against large corporations. The quoted parts in the article do not seem very prepared, there are a lot of maybes, no clear stance and no overarching vision that LLMs must be fought for software freedom.

lukan · 2 months ago

"I have zero trust in the FSF since they backstabbed Stallman."

The controversial line might have also been that one.

bgwalter · 2 months ago

Sure, but remember that the Stallman situation started with a highly clumsy Minsky/Epstein mail on an MIT mailing list. The Epstein coverup was bipartisan and now all tech companies are ostensibly on Trump's side and even finance his ballroom.

Are there any protests or demands for the cancellation of Trump, Clinton, Wexner, Black, Barak?

I have not seen any. The cancel tech people only go after those who they perceive as weak.

gjvc · 2 months ago

Yes. 100% agree.

isodev · 2 months ago

> There is also, of course, the question of copyright infringements in code produced by LLMs, usually in the form of training data leaking into the model's output

Well yes, LLMs like Claude Code are merely a "copyright violation as a service". Everyone is so focused on the next new "AI" feature but we haven't actually resolved the issue of all model providers using stolen code to train their models and their lack of transparency on sourced training data.

1gn15 · 2 months ago

Copyright violation is not stealing, and training is not copyright violation (it's already been ruled as fair use, multiple times).

inglor_cz · 2 months ago

I think the concerning problem is when the LLM reproduces some copyrighted code verbatim, and the user doesn't even stand a chance to know it.

blibble · 2 months ago

> it's already been ruled as fair use, multiple times

most countries don't have a concept of fair use

but they nearly all have copyright law

thesz · 2 months ago

One can train model with copyrighted code as it is fair use, fair enough.

Are there any rulings about use of code generated by model trained on copyrighted code?

I believe distinction is clear.

matheusmoreira · 2 months ago

Yeah, copyright infringement isn't stealing, copyright shouldn't even exist to begin with.

I just think it's especially asinine how corporations are perfectly willing to launder copyrighted works via LLMs when it's profitable to do so. We have to perpetually pay them for their works and if we break their little software locks it's felony contempt of business model, but they get to train their AIs on our works and reproduce them infinitely and with total impunity without paying us a cent.

It's that "rules for thee but not for me" nonsense that makes me reach such extreme logical conclusions that I feel empathy for terrorists.

Not really, only a handful of authorities have weighed on that and most of them in a country where model providers literally buy themselves policy and judges.

Wasn't copyleft essentially intended to be "copyright violation as a service"? I.e. making it impossible for an individual working with copyleft code to use copyright to assert control over the code?

wvenable · 2 months ago

Copyleft requires strong copyright protections. Without a license, you have no rights at all to use the code. If you want to use the code, because it's copyrighted, you have abide by the terms of the license.

andybak · 2 months ago

If AI code is non-copyrightable, does this not achieve the aims of copyleft by different means?

I'm playing a bit fast and loose here but there's a solid idea at the heart of this statement - I'm just on the cusp of going to bed so wanted to post a placeholder until tomorrow. The gist is "what do copyleft licences aim to achieve as an end goal - and what would non-copyrightable code mean in that broader context?"

Deleted Comment

stuaxo · 2 months ago

A GPL model trained purely on GPL based code could work.

Dead Comment

somewhereoutth · 2 months ago

1. Understand that code that has been wholly or partly LLM generated is tainted - it has (in at least some part) been created neither by humans nor by a deterministic, verifiable, process. Any representations to its quality are therefore void.

2. Ban tainted code.

Consider code that (in the old days) had been copy pasted from elsewhere. Is that any better than LLM generated code? Why yes - to make it work a human had to comb through it, tweaking as necessary, and if they did not then stylistic cues make the copy pasta quite evident. LLMs effectively originate and disguise copy pasta (including mimicking house styles), making it harder/impossible to validate the code without stepping through every single statement. The process can no longer be validated, so the output has to be. Which does not scale.

acoustics · 2 months ago

It depends on the nature of the code and codebase.

There have been many occasions when working in a very verbose enterprise-y codebase where I know exactly what needs to happen, and the LLM just types it out. I carefully review all 100 lines of code and verify that it is very nearly exactly what I would have typed myself.