benlivengood · 2 months ago
Open source and libre/free software are particularly vulnerable to a future where AI-generated code is ruled to be either infringing or public domain.

In the former case, disentangling AI-edits from human edits could tie a project up in legal proceedings for years and projects don't have any funding to fight a copyright suit. Specifically, code that is AI-generated and subsequently modified or incorporated in the rest of the code would raise the question of whether subsequent human edits were non-fair-use derivative works.

In the latter case, the license restrictions no longer apply to portions of the codebase, raising similar issues for derived code. A project that is only 98% OSS/FS-licensed suddenly has much less leverage in takedowns against companies abusing the license terms, since it has to prove that infringers are definitely using the human-generated, licensed code.

Proprietary software is only mildly harmed in either case; it would require speculative copyright owners to disassemble their binaries and try to make the case that AI-generated code infringed without being able to see the codebase itself. And plenty of proprietary software has public domain code in it already.

strogonoff · 2 months ago
People sometimes miss that copyleft is powered by copyright. Copyleft (which means Linux, Blender, and plenty of other goodness) needs the ability to impose some rules on what users do with your work, presumably in the interest of common good. Such ability implies IP ownership.

This does not mean that powerful interests abusing copyright with ever-increasing terms and enforcement overreach is fair game. That harms the common interest.

However, it does mean that abusing copyright from the other side and denouncing the core ideas of IP ownership (which is now sort of in the interest of certain companies, and of capital heavily invested in certain fashionable but not yet profitable startups, built around IP expropriation) harms the common interest just as much.

graemep · 2 months ago
> People sometimes miss that copyleft is powered by copyright.

That is legally true, but it is also true that copyleft is necessary because of copyright. Without copyright (or if copyright did not apply to software) there would be no need for copyleft and very little motive to produce proprietary software. What was produced could be reverse engineered, or used as binary blobs, and compatible replacements produced.

Where the only choice was between keeping the source code a trade secret (and obfuscating the distributed form) or keeping the source open, the latter would easily dominate.

ben_w · 2 months ago
While this is a generally true statement (and has echoes in other areas like sovereign citizens), GenAI may make copyright (and copyleft) economically redundant.

While the AI we have now is not good enough to make an entire operating system when asked*, if/when they can, the benefits of all the current licensing models evaporate, and it doesn't matter if that model is proprietary with no source, or GPL, or MIT, because by that point anyone else can reproduce your OS for whatever the cost of tokens is without ever touching your code.

But as we're not there yet, I agree with @benlivengood that (most**) OSS projects must treat GenAI code as if it's unusable.

* At least, not a modern OS. I've not tried getting any model to output a tiny OS that would fit in a C64, and while I doubt they can currently do this, it is a bet I might lose, whereas I am confident all models would currently fail at e.g. reproducing Windows XP.

** I think MIT licensed projects can probably use GenAI code, they're not trying to require derivatives to follow the same licence, but I'm not a lawyer and this is just my barely informed opinion from reading the licenses.

AJ007 · 2 months ago
I understand why experienced developers don't want random AI contributions from no-knowledge "developers" landing in a project. In any situation, if a human has to review AI code line by line, that would tie up humans for years, even setting aside the legal questions.

#1 There will be no verifiable way to prove something was AI generated beyond early models.

#2 Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects. The only room for debate on that is an apocalypse level scenario where humans fail to continue producing semiconductors or electricity.

#3 If a project successfully excludes AI contributions (not clear how other than controlling contributions to a tight group of anti-AI fanatics), it's just going to be cloned, and the clones will leave it in the dust. If the license permits forking then it could be forked too, but cloning and purging any potential legal issues might be preferred.

There still is a path for open source projects. It will be different. There's going to be much, much more software in the future and it's not going to be all junk (although 99% might.)

amake · 2 months ago
> #2 Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects

Still waiting to see evidence of AI-driven projects eating the lunch of "traditional" projects.

basilgohar · 2 months ago
I feel like this is mostly a proofless assertion. I'm aware that what you hint at is happening, but the conclusions you arrive at are far from proven, or even reasonable, at this stage.

For what it's worth, I think AI for code will arrive at a place like where other coding tools sit – hinting, intellisense, linting, maybe even static or dynamic analysis – but I doubt that NOT using AI will be a critical hit to productivity.

Someone else in the thread already mentioned it's a bit of an amplifier. If you're good, it can make you better, but if you're bad it just spreads your poor skills like a robot vacuum spreads animal waste.

blibble · 2 months ago
> #2 Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects

"competitive", meaning: "most features/lines of code emitted" might matter to a PHB or Microsoft

but has never mattered to open source

A4ET8a8uTh0_v2 · 2 months ago
I am of two minds about it, having now seen both good coders augmented by AI and bad coders further diminished by it (I would even argue it's worse than Stack Overflow, because back then they at least would have had to adjust the code a little bit).

I am personally somewhere in the middle, just good enough to know I am really bad at this, so I make sure that I don't contribute to anything that is actually important (like QEMU).

But how many people recognize their own strengths and weaknesses? That is part of the problem and now we are proposing that even that modicum of self-regulation ( as flawed as it is ) be removed.

FWIW, I hear you. I also don't have an answer. Just thinking out loud.

heavyset_go · 2 months ago
Regarding #1, at least in the mainframe/cloud model of hosted LLMs, the operators have a history of model prompts and outputs.

For example, if using Copilot, Microsoft also has every commit ever made if the project is on GitHub.

They could, theoretically, determine what did or didn't come out of their models and was integrated into source trees.

Regarding #2 and #3, with relatively novel software like QEMU that models platforms that other open source software doesn't, LLMs might not be a good fit for contributions. Especially where emulation and hardware accuracy, timing, quirks, errata etc matter.

For example, modeling a new architecture or emulating new hardware might have LLMs generating convincing looking nonsense. Similarly, integrating them with newly added and changing APIs like in kvm might be a poor choice for LLM use.

alganet · 2 months ago
Quoting them:

> The policy we set now must be for today, and be open to revision. It's best to start strict and safe, then relax.

So, no need for the drama.

safety1st · 2 months ago
It seems to me that the point in your first paragraph argues against your points #2 and #3.

If a project allows AI generated contributions, there's a risk that they'll be flooded with low quality contributions that consume human time and resources to review, thus paralyzing the project - it'd be like if you tried to read and reply to every spam email you receive.

So the argument goes that #2 and #3 will not materialize, blanket acceptance of AI contributions will not help projects become more competitive, it will actually slow them down.

Personally I happen to believe that reality will converge somewhere in the middle, you can have a policy which says among other things "be measured in your usage of AI," you can put the emphasis on having contributors do other things like pass unit tests, and if someone gets spammy you can ban them. So I don't think AI is going to paralyze projects but I also think its role in effective software development is a bit narrower than a lot of people currently believe...

kylereeve · 2 months ago
> #2 Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects. The only room for debate on that is an apocalypse level scenario where humans fail to continue producing semiconductors or electricity.

??

"AI" code generators are still mostly overhyped nonsense that generate incorrect code all the time.

devmor · 2 months ago
None of your claims here are based in factual assertion. These are unproven, wishful fantasies that may or may not be eventually true.

No one should be evaluating or writing policy based on fantasy.

conartist6 · 2 months ago
#2 is a complete and total fallacy, trivially disprovable.

Overall velocity doesn't come from writing a lot more code, or even from writing code especially quickly.

XorNot · 2 months ago
A reasonable conclusion about this would simply be that the developers are saying "we're not merging anything which you can't explain".

Which is entirely reasonable. The trend of people saying, e.g. on HN, "I asked an LLM and this is what it said..." is infuriating.

It's just an upfront declaration that if your answer to something is "it's what Claude thinks" then it's not getting merged.

gadders · 2 months ago
I am guessing they don't need people to prove that contributions didn't contain AI code, they just need the contributor to say they didn't use any AI code. That way, if any AI code is found in their contribution the liability lies with the contributor (but IANAL).
otabdeveloper4 · 2 months ago
> Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects.

There is zero evidence so far that AI improves software developer efficiency.

No, just because you had fun vibing with a chatbot doesn't mean you delivered the end product faster. All of the supposed AI software development gains are entirely self-reported based on "vibes". (Remember these are the same people who claimed massive developer efficiency gains from programming in Haskell or Lisp a few years back.)

Note I'm not even touching on the tech debt issue here, but it is also important.

P.S. The hallucination and counting to five problems will never go away. They are intrinsic to the LLM approach.

furyofantares · 2 months ago
Much of that may be true in the (near) future but it also makes sense for people to make decisions that apply right now, and update as the future comes along.
sidewndr46 · 2 months ago
I guess the AI-fueled demise of left-pad is inevitable then. AI will be able to produce superior versions in no time at all
rapind · 2 months ago
> If a project successfully excludes AI contributions (not clear how other than controlling contributions to a tight group of anti-AI fanatics), it's just going to be cloned, and the clones will leave it in the dust.

Yeah I don’t think so. But if it does then who cares? AI can just make a better QEMU at that point I guess.

They aren’t hurting anyone with this stance (except the AI hype lords), which I’m pretty sure isn’t actually an anti-AI stance, but a pragmatic response to AI slop in its current state.

Eisenstein · 2 months ago
If AI can generate software so easily and which performs the expected functions, why do we even need to know that it did so? Isn't the future really just asking an AI for a result and getting that result? The AI would be writing all sorts of bespoke code to do the thing we ask, and then discard it immediately after. That is what seems more likely, and not 'so much software we have to figure out rights to'.
Thorrez · 2 months ago
Is there any likelihood that the output of the model would be public domain? Even if the model itself is public domain, the prompt was created by a human and impacted the output, so I don't see how the output could be public domain. And then after that, the output was hopefully reviewed by the original prompting human and likely reviewed by another human during code review, leading to more human impact on the final code.
AndrewDucker · 2 months ago
There is no copyright in AI art. Presumably the same reasoning would apply to AI code: https://iclg.com/news/22400-us-court-confirms-ai-generated-a...
stronglikedan · 2 months ago
To me, AI doesn't generate code by itself, so there's no difference between the outputted code and code written by the human that prompted it. As well, the humans that prompt it are solely responsible for making sure it is correct, and solely to blame for any negative outcomes of its use, just as if they had written it themselves.
graemep · 2 months ago
Proprietary source code would not usually end up training LLMs. Unless it's leaked, how would an LLM have access to it?

> it would require speculative copyright owners to disassemble their binaries

I wonder whether AI might be a useful tool for making that easier.

If you have evidence then you can get courts to order disclosure or examination of code.

> And plenty of proprietary software has public domain code in it already.

I am pretty sure there is a significant amount of proprietary code that has FOSS code in it, against license terms (especially GPL and similar).

A lot of proprietary code is now being written using AIs trained on FOSS code, and companies are open about this. It might open an interesting can of worms.

physicsguy · 2 months ago
> Unless its leaked

Given the number of people on HN who say they're using e.g. Cursor, OpenAI, etc. through work, and my experience with workplaces saying 'absolutely you can't use it', I suspect a large amount is being leaked.

pmlnr · 2 months ago
Licence incompatibility is enough.
zer00eyz · 2 months ago
raincole · 2 months ago
It's sailed, but towards the other way: https://www.bbc.com/news/articles/cg5vjqdm1ypo
jssjsnj · 2 months ago
QEMU: Define policy forbidding use of AI code generators
koolala · 2 months ago
This is a win for MIT license though.
graemep · 2 months ago
From what point of view?

For someone using MIT licensed code for training, it still requires a copy of the license and the copyright notice in "copies or substantial portions of the software". So I guess it's fine for a snippet, but if the AI reproduces too much of it, then it's in breach.

From the point of view of someone who does not want their code used by an LLM then using GPL code is more likely to be a breach.

olalonde · 2 months ago
Seems like a fake problem. Who would sue QEMU for using AI-generated code? OpenAI? Anthropic?
ethbr1 · 2 months ago
Anyone whose code is in a used model's training set.*

This is about future existential tail risk, not current risk.

* Depending on future court decisions in different jurisdictions

deadbabe · 2 months ago
If a software is truly wide open source in the sense of “do whatever the fuck you want with this code, we don’t care”, then it has nothing to fear from AI.
behringer · 2 months ago
Open source is about sharing the source code. You generally need to force companies to share their source code derived from your project, or else companies will simply take it, modify it, never release their changes, and charge for it too.
kgwxd · 2 months ago
Can't release someone else's proprietary source under a "do whatever the fuck you want" license and actually do whatever the fuck you want, without getting sued.
candiddevmike · 2 months ago
Won't apply to closed-source, non-public code, which the GPL (which QEMU uses) is quite good at ensuring becomes open source...
JonChesterfield · 2 months ago
Interesting. Harder line than the LLVM one found at https://llvm.org/docs/DeveloperPolicy.html#ai-generated-cont...

I'm very much an old man shouting at clouds about this stuff. I don't want to review code the author doesn't understand and I don't want to merge code neither of us understands.

compton93 · 2 months ago
> I don't want to review code the author doesn't understand

This really bothers me. I've had people ask me to do some task except they get AI to provide instructions on how to do the task and send me the instructions, rather than saying "Hey can you please do X". It's insulting.

andy99 · 2 months ago
Had someone higher up ask about something in my area of expertise. I said I didn't think it was possible; he followed up with a ChatGPT conversation he had where it "gave him some ideas that we could use as an approach", as if that was some useful insight.

These are the same people who think that "learning to code" is a translation issue they don't have time for, as opposed to experience they don't have.

windward · 2 months ago
It's the modern equivalent of sending an LMGTFY link, except the insult is from them being purely credulous and sincere.
guappa · 2 months ago
My company hired a new CTO and he asked chatgpt to write some lengthy documents about "how engineering gets done in our company".

He also writes all his emails with chatgpt.

I don't bother reading.

Oddly enough, he recently promoted a guy who has been fucking around with LLMs for years instead of working to be his right-hand man.

nijave · 2 months ago
Especially when you try to correct them and they insist AI is the correct one

Sometimes it's fun reverse engineering the directions back into various forum, Stack Overflow, and documentation fragments and pointing out how AI assembled similar things into something incorrect

halostatue · 2 months ago
I have just started adding DCO to _all_ of the open source code that I maintain and will be adding text like this to `CONTRIBUTING.md`:

---

LLM-Generated Contribution Policy

Color is a library full of complex math and subtle decisions (some of them possibly even wrong). It is extremely important that any issues or pull requests be well understood by the submitter and that, especially for pull requests, the developer can attest to the Developer Certificate of Origin for each pull request (see LICENCE).

If LLM assistance is used in writing pull requests, this must be documented in the commit message and pull request. If there is evidence of LLM assistance without such declaration, the pull request will be declined.

Any contribution (bug, feature request, or pull request) that uses unreviewed LLM output will be rejected.

---

I am also adding this to my `SECURITY.md` entries:

---

LLM-Generated Security Report Policy

Absolutely no security reports will be accepted that have been generated by LLM agents.

---

As it's mostly just me, I'm trying to strike a balance, but my preference is against LLM generated contributions.

japhyr · 2 months ago
> any issues or pull requests be well understood by the submitter

I really like this phrasing, particularly in regards to PRs. I think I'll find a way to incorporate this into my projects. Even for smaller, non-critical projects, it's such a distraction to deal with people trying to make "contributions" that they don't clearly understand.

brulard · 2 months ago
Good luck detecting the LLM use
phire · 2 months ago
I do use GitHub copilot on my personal projects.

But I refuse to use it as anything more than a fancy autocomplete. If it suggests code that's pretty close to what I was about to type anyway, I accept it.

This ensures that I still understand my code, that there shouldn't be any hallucination derived bugs, [1] and there really shouldn't be any questions about copyright if I was about to type it.

I find using copilot this way speeds me up. Not really because my typing is slow, it's more that I have a habit of getting bored and distracted while typing. Copilot helps me get to the next thinking/debugging part sooner.

My brain really can't comprehend the idea that anyone would not want to understand their code. Especially if they are going to submit it as a PR.

And I'm a little annoyed that the existence of such people is resulting in policies that will stop me from using LLMs as autocomplete when submitting to open source projects.

I have tried using copilot in other ways. I'd love for it to be able to do menial refactoring tasks for me. But every time I experiment, it seems to fall off the rails so fast. Or it just ends up slower than what I could do manually, because it has to re-generate all my code instead of just editing it.

[1] Though I find it really interesting that if I'm in the middle of typing a bug, copilot is very happy to autocomplete it in its buggy form. Even when the bug is obvious from local context, like I've typoed a variable name.
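
For what it's worth, the failure mode looks something like this (a made-up Python sketch, not code from any real project):

    def total_price(items):
        totl = 0  # typo'd accumulator name
        for item in items:
            # An autocomplete tuned to continue what you started will happily
            # suggest `totl += item.price` here, propagating the typo rather
            # than flagging it against the surrounding context.
            totl += item.price
        return totl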

dawnerd · 2 months ago
That’s how I use it too. I’ve tried to make agent mode work but it ends up taking just as long, if not longer, than just making the edits myself. And unless you’re very narrowly specific, models like Sonnet will go off track making changes you never asked for. At least GPT-4.1 is pretty lazy, I guess.
jitl · 2 months ago
When I use LLM for coding tasks, it's like "hey please translate this YAML to structs and extract any repeated patterns to re-used variables". It's possible to do this transform with deterministic tools, but AI will do a fine job in 30s and it's trivial to test the new output is identical to the prompt input.

My high-level work is absolutely impossible to delegate to AI, but AI really helps with tedious or low-stakes incidental tasks. The other day I asked Claude Code to wire up some graphs and outlier analysis for some database benchmark result CSVs. Something conceptually easy, but takes a fair bit of time to figure out libraries and get everything hooked up unless you're already an expert at csv processing.
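
The "identical output" check is cheap. For instance, if the refactored version can be dumped back to YAML, it's a few lines (a rough Python sketch; the file names are made up and PyYAML is assumed):

    import yaml  # PyYAML

    # Hypothetical inputs: the original config and the LLM-refactored one.
    with open("config_original.yaml") as f:
        original = yaml.safe_load(f)
    with open("config_refactored.yaml") as f:
        refactored = yaml.safe_load(f)

    # Extracting repeated patterns into anchors or shared variables should not
    # change the resolved data, so the loaded structures must compare equal.
    assert original == refactored, "refactor changed the resolved configuration"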

mattmanser · 2 months ago
In my experience, AI will not do a fine job of things like this.

If the definition is past any sort of length, it will hallucinate new properties, change the names, etc. It also has a propensity to start skipping bits of the definitions by adding in comments like "/** more like this here **/"

It may work for you for small YAML files, but beware doing this for larger ones.

Worst part about all that is that it looks right to begin with because the start of the definitions will be correct, but there will be mistakes and stuff missing.

I've got a PoC hanging around where I did something similar by throwing an OpenAPI spec at an AI and telling it to generate some typescript classes because I was being lazy and couldn't be bothered to run it through a formal tool.

Took me a while to notice a lot of the definitions had subtle bugs, properties were missing and it had made a bunch of stuff up.
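
If you do go down that road, even a crude cross-check catches the missing-property failure mode, though not the invented ones (a rough Python sketch; the file names are hypothetical):

    import json
    import re
    import sys

    # Hypothetical inputs: the OpenAPI spec and the LLM-generated TypeScript.
    with open("openapi.json") as f:
        spec = json.load(f)

    expected = set()
    for schema in spec.get("components", {}).get("schemas", {}).values():
        expected.update(schema.get("properties", {}).keys())

    with open("generated_types.ts") as f:
        generated = f.read()

    # Crude word-boundary search, but enough to flag properties the model dropped.
    missing = sorted(p for p in expected if not re.search(rf"\b{re.escape(p)}\b", generated))
    if missing:
        print("properties missing from generated code:", ", ".join(missing))
        sys.exit(1)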

stefanha · 2 months ago
There is ongoing discussion about this topic on the qemu-devel mailing list: https://lore.kernel.org/qemu-devel/20250625150941-mutt-send-...
mistrial9 · 2 months ago
Oh, agree and amplify this -- graphs are worlds unto themselves. Some of the high-end published research papers have astounding contents, for example.
hsbauauvhabzb · 2 months ago
You’re the exact kind of person I want to work with. Self reflective and in opposition of lazy behaviours.
linsomniac · 2 months ago
>I don't want to review code the author doesn't understand

I get that. But AI tooling, when guided by a competent human, can generate some pretty competent code, and a lot of it can be driven entirely through natural language instructions. And every few months, the tooling is getting significantly more capable.

I'm contemplating what exactly it means to "understand" the code though. In the case of one project I'm working on, it's an (almost) entirely vibe-coded new storage backend to an existing VM orchestration system. I don't know the existing code base. I don't really have the time to have implemented it by hand (or I would have done it a couple years ago).

But, I've set up a test cluster and am running a variety of testing scenarios on the new storage backend. So I understand it from a high level design, and from the testing of it.

As an open source maintainer myself, I can imagine (thankfully I haven't been hit with it myself) how frustrating getting all sorts of low quality LLM "slop" submissions could be. I also understand that I'm going to have to review the code coming in whether or not the author of the submission understands it.

So how, as developers, do we leverage these tools as appropriate, and signal to other developers the level of quality in code. As someone who spent months tracking down subtle bugs in early Linux ZFS ports, I deeply understand that significant testing can trump human authorship and review of every line of code. ;-)

imiric · 2 months ago
> I'm contemplating what exactly it means to "understand" the code though.

You can't seriously be questioning the meaning of "understand"... That's straight from Jordan B. Peterson's debate playbook which does nothing but devolve the conversation into absurdism, while making the person sound smart.

> I've set up a test cluster and am running a variety of testing scenarios on the new storage backend. So I understand it from a high level design, and from the testing of it.

You understand the system as well as any user could. Your tests only prove that the system works in specific scenarios, which may very well satisfy your requirements, but they absolutely do not prove that you understand how the system works internally, nor that the system is implemented with a reliable degree of accuracy, let alone that it's not misbehaving in subtle ways or that it doesn't have security issues that will only become apparent when exposed to the public. All of this might be acceptable for a tool that you built quickly which is only used by yourself or a few others, but it's far from acceptable for any type of production system.

> As someone who spent months tracking down subtle bugs in early Linux ZFS ports, I deeply understand that significant testing can trump human authorship and review of every line of code.

This doesn't match my (~20y) experience at all. Testing is important, particularly more advanced forms like fuzzing, but it's not a failproof method of surfacing bugs. Tests, like any code, can themselves have bugs; they can test the wrong things, set up or mock the environment in ways not representative of real world usage, and, most importantly, can only cover a limited amount of real world scenarios. Even in teams that take testing seriously, achieving 100% coverage, even for just statements, is seen as counterproductive and as a fool's errand. Deeply thorough testing as seen in projects like SQLite is practically unheard of. Most programmers I've worked with will often only write happy path tests, if they bother writing any at all.

Which isn't to say that code review is the solution. But a human reviewing the code, building a mental model of how it works and how it's not supposed to work, can often catch issues before the code is even deployed. It is at this point that writing a test is valuable, so that that specific scenario is cemented in the checks for the software, and regressions can be avoided.

So I wouldn't say that testing "trumps" reviews, but that it's not a reliable way of detecting bugs, and that both methods should ideally be used together.

rodgerd · 2 months ago
This to me is interesting when it comes to free software projects; sure there are a lot of people contributing as their day job. But if you contribute or manage a project for the pleasure of it, things which undermine your enjoyment - cleaning up AI slop - are absolutely a thing to say "fuck off" over.
dheera · 2 months ago
> I don't want to review code the author doesn't understand

The author is me and my silicon buddy. We understand this stuff.

recursive · 2 months ago
Of course we understand it. Just ask us!
acedTrex · 2 months ago
Oh hey, the thing I predicted in my blog titled "yes i will judge you for using AI" happened lol

Basically I think open source has traditionally HEAVILY relied on hidden competency markers to judge the quality of incoming contributions. LLMs turn that entire concept on its head by presenting code that carries the markers of competence but none of the backing experience. It is a very very jarring experience for experienced individuals.

I suspect that virtual or in person meetings and other forms of social proof independent of the actual PR will become far more crucial for making inroads in large projects in the future.

SchemaLoad · 2 months ago
I've started seeing this at work with coworkers using LLMs to generate code reviews. They submit comments that are way above their skill level, which almost trick you into thinking they are correct, since only a very skilled developer would make these suggestions. And then ultimately you end up wasting tons of time proving how these suggestions are wrong. Spending far more time than the person pasting the suggestions spent to generate them.
Groxx · 2 months ago
By far the largest review-effort PRs of my career have been in the past year, due to mid-sized LLM-built features. Multiple rounds of other sign-offs saying "lgtm" with only minor style comments, only for me to finally read it and see that no, it is not even remotely acceptable: we have several uses built by the same team that would fail immediately if it were merged, to say nothing of the thousands of other users that might also be affected. Stuff the reviewers have experience with and didn't think about, because they got stuck in the "looks plausible" rut rather than "is correct".

So it goes back for changes. It returns the next day with complete rewrites of large chunks. More "lgtm" from others. More incredibly obvious flaws, race conditions, the works.

And then round three repeats mistakes that came up in round one, because LLMs don't learn.

This is not a future style of work that I look forward to participating in.

diabllicseagull · 2 months ago
Funny enough, I had coworkers who similarly had a hold of the jargon but without any substance. They would always turn out to be time sinks for others doing the useful work. AI imitating that type of drag on the workplace is kinda funny ngl.
beej71 · 2 months ago
I'm not really in the field any longer, but one of my favorite things to do with LLMs is ask for code reviews. I usually end up learning something new. And a good 30-50% of the suggestions are useful. Which actually isn't skillful enough to give it a title of "code reviewer", so I certainly wouldn't foist the suggestions on someone else.
acedTrex · 2 months ago
Yep 100%, it is something I have also observed. Frankly has been frustrating to the point I spun up a quick one off html site to rant/get my thoughts out. https://jaysthoughts.com/aithoughts1
mrheosuper · 2 months ago
People keep saying LLMs will improve efficiency, but your comment proves otherwise.

It looks like LLMs are not good for cooperation, because the nature of LLMs is randomness.

stevage · 2 months ago
> Basically I think open source has traditionally HEAVILY relied on hidden competency markers to judge the quality of incoming contributions.

Yep, and it's not just code. Student essays, funding applications, internal reports, fiction, art...everything that AI touches has this problem that AI outputs look superficially similar to the work of experts.

whatevertrevor · 2 months ago
I have learned over time that the actually smart people worth listening to avoid jargon beyond what is strictly necessary and talk in simple terms, with specific goals/improvements/changes in mind.

If I'm having to reread something over and over to understand what they're even trying to accomplish, odds are it's either AI generated or an attempt at sounding smart instead of being constructive.

danielbln · 2 months ago
The trajectory so far has been that AI outputs are increasingly converging on expert output, not just in superficial similarity but also in quality. We are obviously not there yet, and some might say we never will be. But if we do get there, there is a whole new conversation to be had.
make3 · 2 months ago
this is 100% the actual crux of the problem, I agree
itsmekali321 · 2 months ago
send your blog link please
acedTrex · 2 months ago
https://jaysthoughts.com/aithoughts1 Bit of a rambly rant, but the prediction stuff I was tongue in cheek referring to above is at the bottom.
ants_everywhere · 2 months ago
This is signed off primarily by RedHat, and they tend to be pretty serious/corporate.

I suspect their concern is not so much whether users own the copyright to AI output, but rather the risk that AI will spit out code from its training set that belongs to another project.

Most hypervisors are closed source and some are developed by litigious companies.

blibble · 2 months ago
> but rather the risk that AI will spit out code from its training set that belongs to another project.

this is everything that it spits out

ants_everywhere · 2 months ago
This is an uninformed take
golergka · 2 months ago
When a model trained on trillions of lines of code knows that, inside a `try` block, the tokens `logger` and `.` have a high probability of being followed by the token `error` but an almost zero probability of being followed by the token `find`, which project does that belong to?
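
To be concrete, the idiom in question looks like this in Python (a generic sketch, deliberately not taken from any particular codebase):

    import logging

    logger = logging.getLogger(__name__)

    def load_config(path):
        try:
            with open(path) as f:
                return f.read()
        except OSError:
            # Continuing `logger.` with `error` here is a statistical regularity
            # learned from countless codebases, not a quotation of any one project.
            logger.error("failed to load config from %s", path)
            return None
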
make3 · 2 months ago
Even with an interpretation of LLMs where they only learn through some debatable level of generalization in "pattern matching", they do show some fairly strong ability to contextualize and adapt their outputs to the situation through more than copy-pasting, otherwise they would be completely useless.

I get wanting to hate GenAI, for different valid reasons. But this take is incorrect.

duskwuff · 2 months ago
I'd also worry that a language model is much more likely to introduce subtle logical errors, potentially ones which violate the hypervisor's security boundaries - and a user relying heavily on that model to write code for them will be much less prepared to detect those errors.
ants_everywhere · 2 months ago
Generally speaking AI will make it easier to write more secure code. Tooling and automation help a lot with security and AI makes it easier to write good tooling.

I would wager good money that in a few years the most security-focused companies will be relying heavily on AI somewhere in their software supply chain.

So I don't think this policy is about security posture. No doubt human experts are reviewing the security-relevant patches anyway.

Havoc · 2 months ago
I wonder whether the motivation is really legal? I get the sense that some projects are just sick of reviewing crap AI submissions
esjeon · 2 months ago
Possibly, but QEMU is such a critical piece of software in our industry. Its application stretches from one end to the other - desktop VM, cloud/remote instance, build server, security sandbox, cross-platform environment, etc. Even a small legal risk can hurt the industry pretty badly.
gerdesj · 2 months ago
The policy is concise and well bounded. It seems to me to assert that you cannot safely assign attribution of authorship of software code that you think was generated algorithmically.

I use the term algorithmic because I think it is stronger than "AI lol". I note they use terms like AI code generator in the policy, which might be just as strong but looks to me unlikely to become a useful legal term (it's hardly "a man on the Clapham omnibus").

They finish with this, rather reasonable flourish:

"The policy we set now must be for today, and be open to revision. It's best to start strict and safe, then relax."

No doubt they do get a load of slop, but they seem to want to close the legal angles down first, and attribution seems a fair place to start off. This playbook looks way better than curl's.

bobmcnamara · 2 months ago
Have you seen how Monsanto enforces their seed right?
SchemaLoad · 2 months ago
This could honestly break open source, with how quickly you can generate bullshit, and how long it takes to review and reject it. I can imagine more projects going the way of Android where you can download the source, but realistically you can't contribute as a random outsider.
b00ty4breakfast · 2 months ago
I have an online acquaintance that maintains a very small and not widely used open-source project and the amount of (what we assume to be) automated AI submissions* they have to wade through is kinda wild given the very small number of contributors and users the thing has. It's gotta be clogging up these big projects like a DDoS attack.

*"Automated" as in bots and "AI submissions" as in ai-generated code

hollerith · 2 months ago
I've always thought that the possibility of forking the project is the main benefit to open-source licensing, and we know Android can be forked.
zahlman · 2 months ago
For many projects you realistically can't contribute as a random outsider anyway, simply because of the effort involved in grokking enough of the existing architecture to figure out where to make changes.
graemep · 2 months ago
I think it is yet another reason (potentially malicious contributors are another) that open source projects are going to have to verify contributors.
api · 2 months ago
Quality contributions to OSS are rare unless the project is huge.
disconcision · 2 months ago
I mean, they say the policy is open for revision and it's also possible to make exceptions; if it's an excuse, they are going out of their way to let people down easy.
Lerc · 2 months ago
I'm not sure which way AI would move the dial when it comes to the median submission. Humans can, and do, make some crap code.

If the problem is too many submissions, that would suggest there needs to be structures in place to manage that.

Perhaps projects receiving large quantities of updates need triage teams. I suspect most of the submissions are done in good faith.

I can see some people choosing to avoid AI due to the possibility of legal issues. I'm doubtful of the likelihood of such problems, but some people favour eliminating all possibility over minimizing likelihood. The philosopher in me feels like people who think they have eliminated the possibility of something just haven't thought about it enough.

ehnto · 2 months ago
Barrier to entry and automated submissions are two aspects I see changing with AI. You at least had to be able to code before submitting bad code.

With AI you're going to get job hunters automating PRs for big name projects so they can stick the contributions in their resume.

catlifeonmars · 2 months ago
> If the problem is too many submissions, that would suggest there needs to be structures in place to manage that.

> Perhaps projects receiving large quantities of updates need triage teams. I suspect most of the submissions are done in good faith.

This ignores the fact that many open source projects do not have the resources to dedicate to a large number of contributions. A side effect of LLM generated code is probably going to be a lot of code. I think this is going to be an issue that is not dependent on the overall quality of the code.

hughw · 2 months ago
I'd hope there could be some distinction between using LLM as a super autocomplete in your IDE, vs giving it high-level guidelines and making it generate substantive code. It's a gray area, sure, but if I made a contribution I'd want to be able to use the labor-saving feature of Copilot, say, without danger of it copying an algorithm from open source code. For example, today I generated a series of case statements and Copilot detected the pattern and saved me tons of typing.
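
For a sense of scale, it was the boring kind of mapping below (a hypothetical Python stand-in, not the actual code), where after the first arm or two the autocomplete fills in the rest:

    def mime_type(extension: str) -> str:
        # Repetitive mapping: exactly the pattern an autocomplete picks up on.
        match extension.lower():
            case "png":
                return "image/png"
            case "jpg" | "jpeg":
                return "image/jpeg"
            case "gif":
                return "image/gif"
            case "svg":
                return "image/svg+xml"
            case "webp":
                return "image/webp"
            case _:
                return "application/octet-stream"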
dheera · 2 months ago
That and also just AI glasses that become an extension of my mind and body, just giving me clues and guidance on everything I do including what's on my screen.

I see those glasses as becoming just a part of me: just like my current dumb glasses are a part of me that enables me to see better, the smart glasses will help me to see AND think better.

My brain was trained on a lot of proprietary code as well, the copyright issues around AI models are pointless western NIMBY thinking and will lead to the downfall of western civilization if they keep pursuing legal what-ifs as an excuse to reject awesome technology.

Aeolun · 2 months ago
This seems absolutely impossible to enforce. All my editors give me AI assisted code hints. Zed, cursor, VS code. All of them now show me autocomplete that comes from an LLM. There's absolutely no distinction between that code, and code that I've typed out myself.

It's like complaining that I may have no legal right to submit my stick figure because I potentially copied it from the drawing of another stick figure.

I'm firmly convinced that these policies are only written to have plausible deniability when stuff with generated code gets inevitably submitted anyway. There's no way the people that write these things aren't aware they're completely unenforceable.

luispauloml · 2 months ago
> I'm firmly convinced that these policies are only written to have plausible deniability when stuff with generated code gets inevitably submitted anyway.

Of course it is. And nobody said otherwise, because that is explicitly stated in the commit message:

    [...] More broadly there is,
    as yet, no broad consensus on the licensing implications of code
    generators trained on inputs under a wide variety of licenses
And in the patch itself:

    [...] With AI
    content generators, the copyright and license status of the output is
    ill-defined with no generally accepted, settled legal foundation.
What other commenters pointed out is that, beyond the legal issue, other problems also arise from the use of AI-generated code.

teeray · 2 months ago
It’s like the seemingly confusing “nothing to declare” gates at customs, which you walk through after you’ve already had the chance to make your declarations. Walking through that gate is a conscious act that places culpability on you, so you can’t simply say “oh, I forgot” or something.

The thinking here is probably similar: if AI-generated code becomes poisonous and is detected in a project, the DCO could allow shedding liability onto the contributor that said it wasn’t AI-generated.

Filligree · 2 months ago
> Of course it is. And nobody said otherwise, because that is explicitly stated on the commit message

Don’t be ridiculous. The majority of people are in fact honest, and won’t submit such code; the major effect of the policy is to prevent those contributions.

Then you get plausible deniability for code submitted by villains, sure, but I’d like to hope that’s rare.

raincole · 2 months ago
I think most people don't make money by submitting code to QEMU, so there isn't that much incentive to cheat.
shmerl · 2 months ago
Neovim doesn't force you to use AI, unless you configure it yourself. If your editor doesn't allow you to switch it off, there must be a big problem with it.
bgwalter · 2 months ago
It is interesting to read the pro-AI rant in the comments on the linked commit. The person who is threatening to use "AI" anyway has almost no contributions either in qemu or on GitHub in general.

This is the target group for code generators. All talk but no projects.