I'm gonna get hated on for this, but I don't think "give back" is an open source concept.
I'm not aware of any Open Source license, or Free license for that matter, that has a give-back clause. Source code is made available to -users-, not prior authors.
Some Open Source licenses (MIT, BSD, etc.) permit use in proprietary code with little more than simple attribution.
Those developers chose that license for a reason, and I've got no problem with commercial entities using that code.
There is a valid argument to be made about the training of models on GPL code. (Argument in the sense that there are two sides to the coin.) On the one hand, we happily train humans on GPL code. Those humans can then write their own functions, but for trivial functions they're gonna look a lot like GPL source.
If the AI is regurgitating GPL code as-is, then that's a problem, not dissimilar to a student or employee regurgitating the same code.
But this argument is about Free software licenses, not really (most?) Open Source licenses.
Either way, OSS/Free is not about "giving back", it's about giving forward.
In the specific case here of Copilot making money, I'd say:
A) you're allowed to make money from Free/OSS code. B) no one is forcing you to use this feature.
> I'm not aware of any Open Source license, or Free license for that matter, that has a give-back clause. Source code is made available to -users-, not prior authors.
In essence, copyleft licenses are exactly that. They oblige the author of a derived work to publish the changes to all users under the same terms. The original authors tend to be users. So, a license which would grant this directly to the original authors would end up providing the same end result since the original authors would be both allowed to and reasonably expected to distribute the derived work to their users as well.
This aligns with the reason why some people publish their work under copyleft licenses: You get my work, for free, and the deal is that if you find and fix bugs then I get to benefit from those fixes by them flowing back to me. Obviously as long as you only use them privately you are not obliged to anything, the copyleft author gives you that option, but once you publish any of this, we all share the results.
That's the spirit here and trying to argue around that with technicalities is disingenuous. That's what Copilot does since it ignores this deal.
The whole Free Software movement started with something really similar to "right to repair": a firmware bug in a printer whose software was proprietary.
Free Software is about being in control of software you use. The spirit was never "contribute back to GNU", the spirit was always "if you take GNU software, you can't make it non-free". Those GNU devs at the time just wanted a good and actually free/libre OS, that would remain free no matter who distributed it.
You are projecting the expectations of modern-day devs, formed in a world of heavily social development thanks to GitHub, onto that era.
You might claim that GP was relying on the technicalities of the licences, but you can actually check the whole FSF philosophy, and you'll note that it aligns perfectly with "giving forward", not "giving back".
Free Software is about user's freedom. Not dev rights, or politeness, etc.
Now obviously, some devs picked copyleft licenses with the purpose of improving their own software from downstream changes (Linus states that is the reason he picked the GPL), but that's a nice side effect, not the purpose. Which, of course, gets confused now that popular social sharing platforms like GitHub are around.
I'm not sure I agree with this as a general point of view.
Speaking generally, I'm not sure that one can claim
>> The original authors tend to be the users
There are endless forks of, say, Emacs, and I expect RMS is not a user of any of them.
Of course RMS is free to inspect the code of all of them, separate out bug fixes from features, and retroactively apply them to his build. But I'm not seeing anything in any license that requires a fork to "push" bug fixes to him.
>> This aligns with the reason why some people publish their work under copyleft licenses: You get my work, for free, and the deal is that if you find and fix bugs then I get to benefit from those fixes by them flowing back to me.
I think you are reading terms into the license that simply don't exist. I agree a lot of programmers -believe- this is how Free Software works, and many do push bug fixes upstream, but that's orthogonal to Free Software principles, and outside the terms of the license.
>> That's the spirit here and trying to argue around that with technicalities is disingenuous.
Licenses are matters of law, not spirit. The original post is about this "spirit". My thesis is that he, and you, are inferring responsibilities that are simply not in the license. This isn't a technicality, it goes to the very heart of Free Software.
You'd think so, but there's also a good chunk of copyleft code that's just "here's our source code, go figure out how to deploy lol".
You can try to fork it into something workable, but that can sometimes literally mean trying to figure out what the actual deployment process is and what weird tweaks were done to the deploying device beforehand. In addition, forking those projects is also unworkable if the original has pretty much enterprise-speed development. At best you get a fork that's years out of date where the maintainer is nitpicking every PR and is burnt out enough to not make it worthwhile to merge upstream patches. At worst, you get something like Iceweasel[0] where someone just releases patches rather than a full fork (and having done that a few times, it's a pain in the neck to maintain those patches).
FOSS isn't at all inherently community-minded; it can be, and it can facilitate community, but it can also be used as a way to get cheap cred from people who are naïve enough to believe the former is the only place it applies.
[0]: "Fork" of Firefox LTS by the GNU Project to strip out trademarked names and logos. It's probably one of their silliest projects in terms of relevance.
> They oblige the author of a derived work to publish the changes to all users under the same terms. The original authors tend to be users. So, a license which would grant this directly to the original authors would end up providing the same end result since the original authors would be both allowed to and reasonably expected to distribute the derived work to their users as well.
I might be wrong, but this is not how I understand the GPL [0]. Care to correct me if I am?
What I get from the license is that you have to share the code with the users of your program, not anyone else.
AFAIK you could make an Emacs fork and charge money for it. Not only that, but the source code only needs to be made available to the recipients of the software, not anyone else.
A company could have an upgraded version of a GPL tool and not share it with anyone outside the company. Theoretically employees might share the code outside, but I doubt they'd dare.
> That's the spirit here and trying to argue around that with technicalities is disingenuous.
First, I am not a lawyer, but don't licenses exist precisely for their technicalities? This is not like a law on the books, where we can consider the "letter and spirit of the law" because we know the context it was written in and for. With a written license, however, someone chooses to adopt that license and accepts its terms from an authorship point of view.
Exactly. We all benefit from sharing contributions to the same code base. I use your library, you use mine, we fix each others bugs, add features, etc... The code gets better.
I think we're in a new enough situation that we can look beyond what's legal in a license. When many of us started working on open source projects, AI was a far-off concept. Speaking for myself, I thought we'd see steady improvement in code-completion tools, but I didn't think I'd see anything like GPT-4 in my lifetime.
Licenses were written for humans working with code. We can talk about corporations as well, but when I've thought about corporations in the past, I thought about people working on code at corporations. The idea of an AI using my open source project to generate working code for someone or some corporation feels...different.
Yes, I'm talking explicitly about feelings. I know my feelings don't impact the legalities of a license. But feelings are worth talking about, especially as we're all finding the boundaries of new tools and new ways of working.
I don't agree with everything in the post, but I think this is a great conversation to be having.
> Yes, I'm talking explicitly about feelings. I know my feelings don't impact the legalities of a license.
They don't impact the current legality of a licence, but it will affect future ones.
GPL/BSD/Apache/proprietary, they are all picked for ideological concerns which all stem from feelings.
It is good to discuss these things, and it is good to recognise that these are emotionally driven.
Yes. People here seem to be forgetting that open source was a community-driven ideal first. The licenses came later, as "protection": corporations were stealing code and there was no recourse. The variety of open source licenses was created to provide a framework for the community, to fight off stealing, to keep it open. So GPT is very much "laundering" the code just like criminals "launder" money.
I agree that AI usage of code is somewhat murky with current licenses, which obviously don't mention it either way.
Free software has a principle of "freedom to run, to do whatever you wish" (freedom 0), so arguably it has already said that training an AI is OK. (We could quibble over the word "run", but gnu.org, and RMS, clearly say "freedom 0 does not restrict how you use it.")
GPL code can be used by the military to develop nuclear weapons. Given that this is a guiding principle of the FSF, it's hard to argue that the current usage is not OK.
This may seem a bit nitpicky and philosophical, but anyway: these feelings you mention are about things, and the things the feelings come from are what is most important. Feelings are never standalone; if they are, they are just moods, which are so personal it's hard to have a conversation about them.
Let's call 'the things' values. I'd say feelings are perceptions of values, and as such they invariably have a conceptual element to them. And exactly that conceptual aspect makes them suitable for conversation and sometimes even debate, insofar as they can be incorrect. We can acknowledge the subjective, emotive aspect of feelings as highly and inalienably personal, respect the individual opinion behind them and contest the implicit truth-claims all at the same time.
> I'm not aware of any Open Source license,or Free license for that matter,that has a give-back clause.
§5.c of the GPL Version 3 states explicitly:
> You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it.
As in, all modifications must be made available. Does that not meet your definition of giving back? The GPL (all variants) is one of the most widely distributed free software licenses and has an explicit "give back" clause as far as I can see it, and that is part of why some people referred to the GPL as a "cancer".
FWIW the issue I've come to have with copilot is that you're not explicitly permitted to use the suggestions for anything other than inspiration (as per their terms), there is no license given to use the code that is generated. You do so at your own risk.
>> As in, all modifications must be made available. Is that not meeting your definition of giving back?
Available to all users. Not previous authors. There may be overlap, or there may not be overlap.
Plus, I would say it's giving forward, not back. If there are public users then the original authors can become users and get the code. But there will be bug fixes and features smooshed together.
Which is why I posit that there's no "give back" concept in the license. Only "give forward".
I would say that open source as a movement sprung up from the principles of early netiquette[0]. Which themselves were built on the foundations of sharing your knowledge with your peers.
Whether you were trawling Usenet or just a presence in your local BBS scene, "teach it forward" was always a core concept. Still is. It's difficult to pay back to the person who taught you something valuable, so you can instead pay it forward by teaching the lessons - along with your own additions - to the later newcomers.
Of course the Eternal September changed the landscape. And now we can't have nice things.
I think the big driver behind the formalization of GNU was that no free C compiler existed. RMS rightly saw this as a problem and did what he thought was needed to get a universal free C compiler.
Co-pilot spits back protected expressions, not novel expressions based on ideas harvested from code. It is therefore violating the licenses of numerous free and open source projects. The author is right to be pissed.
That's not the case; there's a probability it may "spit back" the protected expression.
There's also a probability that I, as a human, "spit back" protected expressions. This could be either by pure chance or from past learning: reading the protected code and internalizing it as a solution, my subconscious forgetting I actually saw it elsewhere.
In university, students run their theses through plagiarism checkers even for novel research, because such duplication naturally occurs.
As the thought experiment goes, given infinity, a monkey with a typewriter will inevitably write Shakespeare's works.
You are correct. The problem is that the GitHub Terms of Service probably (guessing) have a clause which invalidates your license if you upload your code there. And that's exactly why you shouldn't use GitHub.
This seems to be what people imagine about it, not what it actually does, although I don’t doubt you could cherry-pick some snippet after a lot of trial and error to try to claim that it had regurgitated something verbatim. But certainly let’s see the examples.
You're allowed to make money from Free/OSS code, and plenty of companies have (Google, Amazon etc.), but they have always also at least given back something to the community to earn some good will. The situation with AI is new because it not only doesn't give anything back, it actually takes something away by threatening developers' jobs etc.
One possible problem is if Copilot gets good enough that you can rather easily sidestep the GPL (or any other license) by having Copilot implement functionality X for you instead of using a license-bound library providing X. Not only may this be questionable with regard to the license, it would also tend to reduce contributions to the library that would otherwise have been used.
It would be interesting to have a Free Software license that requires that anything which ingests the source code must be Free Software running on Free Hardware. If you train a model on such inputs, your model would need to be Free Software and all the hardware the model runs on would need to be Free Hardware. This would create a massive incentive to either not use such software in your model or to use Free Software and Free Hardware.
Taken to its logical conclusion, you could add the notion of Free Humans who are legally bound to only produce Free Ideas. One could imagine this functioning like a sort of monastic vow of charity or chastity: "a vow of silence on producing anything which is not Free (as in freedom)."
Would you take such a vow if offered 100,000 USD/year for the rest of your life (adjusted for inflation)? I would.
This idea ("make a stronger license") has come up in previous discussions of Copilot as well[0].
The problem is that the Copilot project doesn't claim to be abiding by the license(s) of the ingested code. The reply to licensing concerns was that licensing doesn't apply to their use. So unfortunately they would just claim they could ignore your hypothetical Free³ license as well.
> If the AI is regurgitating GPL code as-is, then that's a problem- not dissimilar to a student or employee regurgitating the same code.
Not "if". We know it does.
And since it doesn't show citations, you might use it and mistakenly end up making your entire software GPL, because of including copy-pasted GPL code.
Yeah on the one hand, isn't opening your source all about not really minding what happens to it after that? It's intended to be copied and used. On the other hand something about the term "laundering" kind of resonated for me. It's kind of like automated plagiarism where you spread your copying out over millions of people. But plagiarism only has meaning as an offense when the thing being copied isn't intended to be copied. But for copyright purposes is there a difference between copying exactly, and the type of blending a LLM does? I'm too confused. That feeling when you hit on something society has never thought about before.
If we go by how many people explain open source, you would be right, but if we go by how the people who actually know what their licenses are supposed to do explain it, then no. You give a license for a specific reason. One might be to allow others to copy, but there is usually a condition, and that is to leave the license information intact. If we go further towards free/libre software licenses like the GPL or AGPL, then we have more conditions. For example, if you distribute software using that code, you need to distribute the source of your software as well (stated a bit imprecisely).
If you want to get a better picture of the situation, read up on the licenses and what they do, specifically the term "copyleft".
> Yeah on the one hand, isn't opening your source all about not really minding what happens to it after that?
No! That's a gross misrepresentation of what open sourcing is. It's the offer of a deal. You publish the source code, and in return for looking at it and using it for something, I have obligations, like attribution and licensing requirements regarding derived works.
No, open source isn't about practically giving up your rights; it's about restricting use of your code and software in exactly such a way that it gives every user as much freedom as possible.
This actually has been thought about before, in the context of remixes, collages, etc. The essential question is how much of the originality of the original work(s) constitutes the originality of the new/derived work. If it is little enough, then it’s okay. The issue with AI models is that they have no way of assessing originality and tracking the transfer of originality.
The term is being used here to imply that the generated code is somehow bypassing the licensing requirements, which isn’t necessarily true, and certainly isn’t a substantiated claim.
You can read licensed code, learn from it, and then write your own code derived from that learning, without having committed a copyright violation.
You can also read licensed code, directly copy paste it into your codebase, and still not have committed a copyright violation, as long as you did so in a way that constituted fair use (which copy-pasting snippets certainly would).
There’s no copyright issue here at all, and rationally speaking there aren’t any legitimate misuse of open source concerns either. If these people were honest they’d just admit to feeling threatened by AI, but nobody would care about that, so they just try to manufacture some fake moral panic.
I agree that copyleft is more about "giving forward", and I think it's a confusion a lot of people make. Reading through the thread, I get the impression that some think as soon as one "distributes" the licensed material, original authors should get a copy. I'm extrapolating of course, but even then I feel some people would agree with that statement.
GPL, for instance, merely states that distributed sources or patches "based on" the program should be "conveyed" under the same terms. In other words, anyone who gets their hands on it will do so under the same license.
If anything, I would be worried that GitHub trained the model on publicly available but not clearly licensed code, because then it would have no license to "use" it in any way[0]. The GPL provides such a right, so there is no problem there. It would be even more worrying if the not-clearly-licensed code was in a private repository, but I think I remember reading that private repositories were not included in the training data.
However, would you consider a black-box program whose output can consistently produce verbatim, or at the very least slightly modified, copies of GPL code to be transformative? The problem does not lie in how the code is distributed but in how transformative the distributed code is. Not only does the same apply to any program besides AI-powered software, it applies to humans[1].
Given how unpredictable the output of an AI is, one should not be able to train it on GPL code if one cannot reliably guarantee it will not produce infringing code.
Perhaps I'm out of the loop on this, but I always thought the concept of open source was primarily about the opportunity for personal professional development: the ability for someone not connected with a corporation to stay relevant and continuously update their skills in a way that was not dependent on proprietary systems. That is a huge asset, not only for oneself but also for the world.
Time will tell, but the destined trend is for more devs to close-source their code, no matter what curious angle you take to justify large firms using AI to extract money, and I doubt you are working for one of them.
IMHO, just like there was a robots.txt file made for the web, there needs to be a NOAI.txt for git repos. Sorry, this repo does not permit you to ingest the code for a learning model. Seems completely reasonable.
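For what it's worth, the crawler-side check for such a marker could be tiny. This is a purely hypothetical sketch: no NOAI.txt standard exists today, so the filename and semantics here are invented by analogy with robots.txt.

```python
from pathlib import Path

def may_ingest_for_training(repo_root: str) -> bool:
    """Return True unless the repo opts out of model training via a
    hypothetical NOAI.txt marker file at its root (analogous to how
    well-behaved web crawlers honor robots.txt)."""
    return not (Path(repo_root) / "NOAI.txt").exists()
```

Like robots.txt, this would only work if trainers chose to honor it; nothing technically stops a crawler from ignoring the file.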
If we were somehow able to prevent AI models from ingesting a codebase, that would mean everyone else who wants to produce similar code would have to re-invent the wheel, wasting their time repeating work that has already been done.
All because... the person who did it first wants attribution? They want their name to be included in some credits.txt file that nobody will ever read? That's ridiculous.
People keep bringing this up. It's not as straightforward as a clause that says "you can't use this to train AI" (which is what I suspect many people think).
Licensing operates on a continuum of permissiveness. They can only relax the restrictions that you as a creator are given by default. You can't write a copyright license that adds them. You could write a legal instrument that compels and prohibits certain behavior(s), but at that point you're talking about a contract. (And there's no way to coerce anyone to agree with the contract.)
Harry Potter has even more restrictions than the GPL or any other open source license. It's "All Rights Reserved": it enjoys the maximum protections that a work can. And yet it would still be possible to feed it into an AI model, even if all of Rowling, Bloomsbury, and Scholastic didn't want you to. They don't get a say in that. Nor do open source software developers in their works, which selectively abandon some of the protections that Rowling reserves for herself and her business partners.
The only real viable path to achieve this using an IP license alone would be a React-PATENTS-like termination clause: if your company engages in any kind of AI training that uses this project as an input, then your license to make copies (including distributing modified copies) under the ordinary terms is revoked, along with permission for a huge swathe of other free/open source software owned by a bunch of other signatories, too. This is, of course, contingent upon the ability to freely copy and modify a given set of works being appealing enough to get people to abstain from the lure of building AI models and offering services based on them.
> I'm gonna get hated on for this, but I don't think "give back" is an open source concept.
You're right. It's a politeness law some people have invented.
It's also a value people have, but that's for themselves. I like contributing to OSS projects. But, as soon as it's imposed on others, and there are punishments for disobeying, it's a politeness law.
> On the one hand we happily train humans on GPL code. Those humans can then write their own functions,but for trivial functions they're gonna look a lot like GPL Source.
Exactly. People are getting mad that Microsoft is making good money while the people who made all that free software available mostly did it for free (like in no money and no recognition). It can sound unfair but that's the deal. If you didn't want people or AI to learn from your code, open source was not the right option.
> If you didn't want people or AI to learn from your code, open source was not the right option.
There's nothing wrong with other people using - learning and creating derivative works of - one's open-source code, provided they respect the terms of the license. It seems to me that the real issue is the fact that these licenses don't have enough teeth.
Most people I know who contribute to or host open source projects, me included, do this for references. And the most successful ones find a way to generate revenue. "Giving back" is a nice additional thing, but I don't know anybody who does it _primarily_ to "help the world".
If we are being honest as a community, open source developers are pretty far down the list of groups with valid grievances against this current wave of AI and how it is trained. There is at least a debatable case that these systems operate within the spirit, if not the exact letter, of typical open source licenses. That argument is much harder to make for AI trained on writing and art that is clearly copyrighted. If you have ethical questions about Copilot, you really should be against this entire crop of AI systems.
So you're suggesting that developers shut up and let the artists talk first? I'm not sure what the "you're suffering less than these other people" thing is actually intended to translate into? What do we do with that?
All software licences are based on copyright, same as writing, art, music, etc. Some software licences are permissive. Some writing is permissive (e.g. Cory Doctorow). Some music is permissive (e.g. Amanda Palmer). It entirely depends on what the author wants. The fact that more software is permissive is a good thing, right?
I entirely agree that there are ethical problems with training AI on copyrighted training data. But please let's not start gatekeeping this. We need to have a serious discussion as a culture about it, and saying "you're way down the list of victims" isn't helping.
What I agree with is that the typical open source dev, who goes "I MIT-license all my things, because I have seen it elsewhere and I don't want to think about licenses a lot", is pretty far down the list of groups of people who get to complain.
What I disagree with is the idea that they should therefore not complain, or that there could not be an AI system that does not launder code but keeps licenses in place, and does this ethically and in an honest way. I add "ethically" and "honest way" because I am sure that companies will try to find a way around being honest, if they are ever forced to add back the licenses.
In fact, artists might not be the group that grasps the impact of training on that corpus as quickly as the dev communities do. Perhaps it is exactly the devs who need to complain loudest and first, to have a signal effect.
>I'm gonna get hated on for this, but I don't think "give back" is an open source concept.
Well, I guess you already know why you may be hated for this. Anyone who has surfed HN since ~2010 would know, or should have noticed, that the definition of open source has changed over the past 10-15 years. Giving back and communities are the two predominant Open Source ideals now, along with making lots of money on top of OSS code being a somewhat contentious issue, to say the least.
But I want to sidestep the idealistic issue: I think this is more of an economic issue, one that could be attributed to the zero-interest-rate phenomenon. You now have developers (especially those from the US) who, for most if not all of their professional lives, lived in a world where money and investment were easy, comparatively speaking, and who felt they should give back when money (or should I say cash flow) wasn't an issue. When $200K total comp was supposed to be the norm for a fresh grad joining Google, and management thought $500K was barely enough and they needed to work their way to $1M, while senior developers believed that if juniors were worth $200K then asking for $1M total comp was perfectly sane, or some other extreme where everyone in the company should earn exactly the same.
If Twitter or social media are any indication, a lot of these ideals are completely gone from the conversation. Although this somehow started before the layoffs.
It is somewhat interesting to see the sociological and ideological changes that follow economic changes. But then again, economics is itself perhaps the largest field study in psychology.
> The code that was regurgitated by the model is marketed as "AI generated" and available for use for any project you want. Including proprietary ones. It's laundering open-source code. All of the decades of knowledge and uncountable hours of work is being, well, stolen. There is nothing being given back.
Leaving GitHub won't change that. OpenAI is training its models on every bit of code they can get: sourcehut, Codeberg, etc.
If it's public, they will train on it.
Also from my experience of trying to leave GitHub, you just end up having a couple of projects on your alternative platform, and everything else on GitHub.
You are still active on GitHub, probably even more than on your new alternative.
And if you want to build a community, you will quickly find out that the majority want to stick with GitHub, and leaving it can kill your project's chances of getting contributions.
Personally, if the courts decide it's fair use, that's it, I'm going back. It's the best git platform out there; GitLab doesn't even compare in free features.
However, I have been eyeing Gitea and Gitea Actions; with them, Codeberg could become a realistic choice for me.
To end with a hot take: I really hate Sourcehut.
It's hard to use, the UI is... not great, and trying to browse issues or latest commits is a nightmare.
Every time a project uses it, it's a pain to deal with.
> Also from my experience of trying to leave GitHub, you just end up having a couple of projects on your alternative platform, and everything else on GitHub.
> And if you want to build a community, you will quickly find out that the majority want to stick to GitHub, and leaving it can kill your projects chances of getting contributions.
That's a defeatist attitude and a self-fulfilling prophecy at the same time. As more and more people leave GitHub (hopefully not to go to the same alternative), it becomes less and less of a must-have. The reason these things are somewhat true today is because of the network effect, and it's precisely that effect which we must actively attempt to squash by leaving.
Parent is talking about a fundamental feature of networks. A denser and larger network has much more useful network-related features, and if one company has a significant majority of the total addressable market for a network, it's a massive ask for people to extricate themselves and rebuild a network somewhere else.
It's why Facebook is still on top even though everyone hated it for a while; YouTube is the *only* video platform; etc.
> Leaving GitHub won't change that; OpenAI is training its models on every bit of code it can get: Sourcehut, Codeberg, etc. If it's public, they will train on it.
Not every bit of code; they are respecting proprietary licenses.
When MS puts the code for Windows, Office, Azure, and everything else in front of ChatGPT, Copilot, and whatever other AI models they have, then perhaps they'll have a leg to stand on.
Otherwise, they're just being hypocritical in claiming that no injury is done by using code for training, because they refuse to train on any of their own code.
Right now it just looks like they are ripping off open source licenses without meeting the terms of those licenses.
AFAIK that has nothing to do with the license, it has to do with whether the code is public. You don't want the AI accidentally revealing proprietary non-public information (e.g. imagine someone had a secret API key in a private repo and copilot leaked it; that'd be a huge incident), so you don't train it on that information, regardless of what it's licensed under.
You could make a similar argument for not training on GPL code, but it's a lot easier to programmatically determine whether or not code is public than it is to programmatically determine what it's licensed under, particularly when you're training on massive amounts of unlabeled data. Not to mention it's way easier to delete an accidentally-added snippet of GPL code from a codebase than it is to "unleak" company secrets after they've been publicly revealed.
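The asymmetry described here is easy to see in practice: repository visibility is a single, reliable field, while the license is a best-effort guess that is often missing or wrong in bulk scrapes. A minimal sketch (the repo metadata shape here is invented for illustration, not GitHub's actual pipeline):

```python
# Sketch: filtering candidate repos for a training corpus.
# `visibility` is a reliable boolean-like field; `license` is a
# best-effort guess that is frequently absent or misdetected,
# which is why filtering on it is much harder in practice.

PERMISSIVE = {"mit", "bsd-2-clause", "bsd-3-clause", "apache-2.0"}

def trainable(repo: dict, respect_licenses: bool = False) -> bool:
    """Decide whether a repo may enter the training set."""
    if repo.get("visibility") != "public":
        return False  # never train on private code: it may leak secrets
    if not respect_licenses:
        return True  # "if it's public, they will train on it"
    # License info is often missing or wrong in bulk scrapes.
    license_id = (repo.get("license") or "").lower()
    return license_id in PERMISSIVE

repos = [
    {"name": "a", "visibility": "private", "license": "mit"},
    {"name": "b", "visibility": "public", "license": "gpl-3.0"},
    {"name": "c", "visibility": "public", "license": "mit"},
    {"name": "d", "visibility": "public", "license": None},
]

print([r["name"] for r in repos if trainable(r)])                        # ['b', 'c', 'd']
print([r["name"] for r in repos if trainable(r, respect_licenses=True)]) # ['c']
```

Note how the license-respecting path silently drops repo "d": unlabeled code has to be excluded too, which is exactly the cost a public/private filter avoids.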
> Every time a project uses it, its a pain to deal with.
Sorry, but I consider that a plus.
One of the primary problems with GitHub right now is the "drive-by" nature. Everybody is on GitHub because a bunch of idiotic big corporations made "community contribution" part of their annual review processes, so we now have a bunch of people who shouldn't be on GitHub throwing things around on there.
Putting just a touch of friction into the comment/contribute cycle is a good thing. The people who contribute then have to want to contribute.
I like Sourcehut, I'm just not a fan of the email-oriented collaboration workflow, so I don't use it. Neither is the rest of the world, if the success of GitHub is anything to go by. I get that Drew likes it, the greybeards are used to it, it works, it's adequate, and it keeps things simple, but I just never could do it. I don't like git either tbh, I grumble while I use it. IMO the perfect collaboration suite would be something like Fossil with RSS feeds for every action.
I believe the goal is to build a minimal UI for those who don't prefer email, which is fine, but email and pull requests aren't the only models here. Look how much tooling has been created to try to fit stack-based diffs on top of Git+GitHub instead of using a different platform.
I'm mostly familiar with gitlab, what does github provide for free above and beyond that? I like that I can run my gitlab pipeline on my machines and sync to a free gitlab instance. I like that I don't read about security vulnerabilities in gitlab pipelines nearly as often as github actions. I like gitlab issues as they are fairly minimal.
GitHub registry, GitHub Actions, and GitHub Codespaces are unlimited for public repos, in addition to all enterprise features.
That's without mentioning nice-to-have features like GitHub Sponsors, the For You tab, and the (arguably) more popular UI layout. It's simply a better platform for open source projects:
unlimited package registry, unlimited Actions run time, premium features unlocked, and more.
Also, the free tier on GitHub gives more for private repos too: unlimited orgs, 2,000 CI minutes, etc.
It's just plain better, and it's because Microsoft can afford to play the long game; GitLab can't anymore.
I believe he just wants to do his bit by removing his activity from github towards lowering their dominance numbers in the space. I don't think he intends to stop those LLM code models.
This whole open source thing is the biggest farce on planet Earth. Someone with good knowledge of geeks and their behaviour concocted this open source bullshit. So now talented people give their skill to the "whole" and have to beg for contributions and donations to get by. And other geeks (not suits with ties) finance the ones they sympathise with. It's ridiculous.
And faceless entities use their hard work for who knows what, but mostly to fatten up their already oversized corp and give back NOTHING.
And people, seemingly without common sense suck up to companies that rob them, and even disseminate their shiny new "free" tools.
This would be a Hugo and Nebula award-winning novel if it weren't reality.
This is such a misrepresentation of the open-source landscape. Yes, there are people working on open-source projects who beg for donations; but there also are open-source projects maintained by full-time employees (Eleventy, paid by Netlify; React, paid by Facebook; Angular, paid by Google; Next.js, paid by Vercel; Linux, paid by various companies; etc.). If a person thinks that his efforts will be better compensated elsewhere, he can always start looking for a paid job.
> So now talented people give their skill to the "whole" and they have to beg for contributions and donations to get by. And other geeks (not suits with ties) finance the ones they sympathise with. It's ridiculous.
Is it? I can't think of a single professional dev making money right now who would be making as much had they been forced to reinvent the entire tech stack they are skilled in.
If there was no open source, we'd all be making a lot less, and the state of tech would be far far smaller than it is right now.
I don't think open source per se was a mistake, but permissive licenses like Apache certainly were. They've just allowed businesses either to take things for free and profit while contributing nothing back, or to literally build a business by selling the Apache-licensed programs in the cloud.
Yikes. You sound very bitter. Is there a story behind that bitterness?
There's a wide variety of people in the open source community at large. And a wide variety of motivations for contributing. I for one am happy that open source software is a thing. It's been a net good for mankind. Sure, there are abuses, and I'm sure many things could be improved. But I'm glad it's there all the same.
I tend to disregard articles that default to the "Stochastic Parrot" argument. These tools are useful now, I don't personally care about achieving actual intelligence. I want additional utility for myself and other humans, which these provide now, at scale.
By a lot of measures many humans perform at just about the same level, including confidently making up bullshit.
This post reads like one of the "Goodbye X online video game" posts. I'll cut them some slack because this is their blog they're venting on and was likely posted here by someone else and not themselves doing some attention seeking, but meh.
Being useful and being a stochastic parrot are not mutually exclusive. And in fact I think the opposite: it's necessary to remind people what it really is, especially in this phase of enthusiasm, because I see too many people attributing some meaning, some hidden insight, and especially some innate infallibility to AI nowadays, maybe confused by the name "AI".
Right, but most arguments, including the one here, go something like "AI is a Stochastic Parrot so it's a lie and now I think it's bad and we shouldn't do it."
Which is a pretty dumb position imo. Not that I personally think these newer LLMs are a stochastic parrot, or at least not to the degree proponents of the Stochastic Parrot argument would have you believe.
> now quickly taking on the role of a general reasoning engine
And this right here is why it's important to emphasize the "stochastic parrot" fact. Because people think this is true and are making decisions based on this misunderstanding.
Since ChatGPT I've become much more aware of my own thoughts and written text. I'm now often wondering whether I'm just regurgitating the most frequently used next word or phrase or whether it could actually be described as original. Especially, for things like reacting with short answers to chat messages, I am confident that these are only reactionary answers without alternatives, which could have come from ChatGPT trained on my chat log. I feel like knowing and seeing how ChatGPT works can elevate our own thinking process. Or maybe it only is similar to awareness meditation.
> I think we’re now way past that now with LLMs now quickly taking on the role of a general reasoning engine.
No we're not, and no they are not.
An LLM doesn't reason, period. It mimics reasoning ability by stochastically choosing a sequence of tokens. A lot of the time these make sense. At other times, they don't make any sense. I recently asked an LLM:
"Mike leaves the elevator at the 2nd floor. Jenny leaves at the 9th floor. Who left the elevator first?"
It answered correctly that Mike leaves first. Then I asked:
"If the elevator started at the 10th floor, who would have left first?"
And the answer was that Mike still leaves first, because he leaves at the 2nd floor, and that's the first floor the elevator reaches. Another time I asked an LLM how many footballs fit in a coffee mug, and the conversation reached a point where the AI tried to convince me that coffee mugs are only slightly smaller than the trunk of a car.
Yes, they can also produce the correct answers to both these questions, but the fact that they can also spew such complete illogical nonsense shows that they are not "reasoning" about things. They complete sequences, that's it, period, that's literally the only thing a language model can do.
Their apparent emergent abilities look like reasoning, in the same way that Jen from "The IT Crowd" can sound like she's speaking Italian, when in fact she has no idea what she is even saying.
I think AI is here to stay (obviously) but we do need a much better permission model regarding content, whether this is the writing on your blog, your digital art, your open source code, video, audio...all of it.
The current model basically says that as soon as you publish something, others can pretty much do with it as they please under the disguise of "fair use", an aggressive ToS, or the like.
I stand by the author that the current model is parasitic. You take the sum of human-produced labor, knowledge, and intelligence without permission or compensation, centralize it with tech that maybe two companies have or can afford, and then monetize it. Worse, in a way that never even attributes or refers to the original content.
Half-quitting GitHub will not do anything; instead we need legal reform in this age of AI.
We need training permission control as none of today's licenses were designed with AI in mind. The default should be no permission where authors can opt-in per account and/or per piece of content. No content platform's ToS should be able to override this permission with a catch-all clause, it should be truly free consent.
Ideally, we'd include monetization options where conditional consent is given based on revenue sharing. I realize that this is a less practical idea as there's still no simple internet payment infrastructure, AI companies likely will have enough non-paid content to train, plus it doesn't solve the problem of them having deep pockets to afford such content, thus they keep their centralization benefits. The more likely outcome is that content producers increasingly withdraw into closed paid platforms as the open web is just too damn hostile.
I find none of this to be anti-AI, it's pro-human and pro-creator.
An important legislative step for this is that anyone creating and publishing an AI learning model needs to be able to cite their sources - in this case, a list of all the github repositories and files therein, along with their licenses.
If that is made mandatory, only then can these lists actually be checked against licenses.
There will also need to be a trial license, to establish whether an AI learning model can be considered derived from a licensed open source project - and therefore whether it falls under the license.
And finally, we'll likely get updated versions of the various OSS licenses that include a specific statement on e.g. usage within AI / machine learning.
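The mandatory source citation proposed above could be as simple as a machine-readable manifest checked against each license's terms. A hypothetical sketch (the manifest format, repo names, and "cleared licenses" policy here are all invented for illustration):

```python
import json

# Hypothetical training-source manifest, as the comment proposes:
# every repository that went into the model, with its declared license.
manifest_json = """
[
  {"repo": "github.com/example/libfoo", "license": "MIT"},
  {"repo": "github.com/example/gpltool", "license": "GPL-3.0"},
  {"repo": "github.com/example/mystery", "license": null}
]
"""

# Invented policy: licenses whose terms the model operator claims to satisfy.
CLEARED = {"MIT", "BSD-3-Clause", "Apache-2.0"}

def audit(manifest: list) -> list:
    """Return repos whose license is missing or not cleared for training."""
    return [
        entry["repo"]
        for entry in manifest
        if entry.get("license") not in CLEARED
    ]

flagged = audit(json.loads(manifest_json))
print(flagged)  # ['github.com/example/gpltool', 'github.com/example/mystery']
```

The point of making the list mandatory is exactly that this kind of audit becomes mechanical: anyone, not just the model operator, can run it.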
In the age of reposts and generative AI, "attribution" is irrelevant. Nobody cares who originally made some content, and it truly doesn't matter.
>The more likely outcome is that content producers increasingly withdraw into closed paid platforms
Nah. You didn't get paid to write that post, did you? You did it for free. People nowadays are perfectly willing to create free content, and often high quality content, sometimes anonymously, even before generative AI.
There's no need for financial incentives anymore. As content creation becomes easier, people will start creating out of intrinsic motivation - to express themselves, to influence others and to inform. It's better that way.
Restricting content so that others can't benefit from it is not pro-human or pro-creator, it's selfish and wasteful. We should get rid of licenses altogether and feed everything humanity creates into a common AI model that is available for use by everyone.
I maintain a popular OSS project whose code is hosted on GitHub [1].
The entire "GitHub doesn't give back" argument is wrong. For "free", GitHub lets me host our code, run thousands and thousands of hours of free CI (which we use aggressively), host releases and Docker images, and manage thousands of issues. Also, Copilot is free when you are eligible for it, so we are fortunate enough not to have to pay for that either.
Yes, they monetize our attention and train Copilot on the code, but the one argument that can't be used against this company is that they don't give back.
Why hasn't someone just changed the GPL license already:
"If you train an AI on this code, you must release the source code and generated neural net of that AI as open source" or something to that effect.
It won't stop it, but it will slow it down, and it seems like the right T&Cs to put on training against GPL code because it gives an advantage to open source AIs, however minor.
Aren't they claiming that it's fair use? IANAL, but wouldn't that make the licence irrelevant if training AI/ML models was found to be fair use? And if not, it's a licence violation anyway?
It will be difficult to claim fair use if training AI model is explicitly mentioned in the license, I think.
Currently GPL says:
> To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
> A "covered work" means either the unmodified Program or a work based
If in addition it would say something like "Generative AI models trained on the program source code as well as the text produced with such models is also a work "based on" the Program", then there will be little room for a fair use claim, I think.
IANAL either, but the license still applies to the end-user (the person who trained the AI) so it would seem like it would add at least 1 non-trivial license violation for that user?
Edit: I googled "fair use copyright US" and have now decided that US copyright law is stupid.
When GH devised Copilot, they could have (internally, at GH) decided to make a two-tier model, one tier trained only on unrestrictive licenses, the other bringing in more-restrictively license code too. And then offer them to the GH-using public as two different functionalities. An intelligently differentiated product line for intelligent people.
But: NOOOO.
In order to close off this possibility, which would have restricted Copilot revenue, they instead rolled out a single undifferentiated product with lots of "gee whiz!" and associated hoopla, and made sure to offer it for free for a while to suck everyone in and head off criticism.
They are really not going to care about what you put into your license file, they are just going to claim that the use of GitHub binds you to their terms of service and that this supersedes your own license. Good luck fighting that.
> This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program.
I abandoned github the day that they (and others, including people here) started arguing that their ToS trumps your code's license. That's absurd. It's authoritarian, it's hostile, it's an act of enmity. Fuck all that bullshit. I do business with no entity, period, that treats me with that level of disdain.
Yeah, and it's nonsensical for the service: if you are working in open source, the reality is that a lot of the time you are working with software you don't own. I develop a lot of software, and the vast majority of it has been open source... but I've only ever put two projects of mine on GitHub (and one only because I was working with some other people and I essentially got outvoted ;P). And yet, if you search for my code, I'm sure you can find almost all of it on GitHub, because it was open source and other people wanted to be able to edit it or even merely redistribute it... which I'd have said is their right, but I guess not, if the ToS supposedly overrides the license on the software? Or, if this were the case, how would one expect some large/old open source project with a ton of prior contributions to be hosted on GitHub? (Such projects are normally fine because everyone has the same rights under the license, so you just all mix your code together and are happy: you don't actually need some central organization with ownership until you want to change the license, which is something many people explicitly never want to happen.) Even simpler: most of Google's code is open source--such as the Android Open Source Project, or Chromium--but they don't host it officially on GitHub... I guess it isn't OK for anyone to work on this stuff on GitHub either, right?
Yup, and this contrived conundrum is proof positive of a truth: all code is licensed before the ToS is agreed to, whether published or not. Licenses override the ToS, which means Microsoft would need to remove essentially all code on GitHub, since its ToS directly contradicts those licenses.
What it comes down to is that the Github ToS is illegal.
Absurd is communities like Elm, where all identity and package management must go through Microsoft GitHub, or you and your code can't be part of the community repository.
That's the spirit here and trying to argue around that with technicalities is disingenuous. That's what Copilot does since it ignores this deal.
Not really.
This whole Free Software movement started with something really similar to "right to repair": a firmware bug in a printer that ran proprietary software. Free Software is about being in control of the software you use. The spirit was never "contribute back to GNU"; the spirit was always "if you take GNU software, you can't make it non-free". The GNU devs at the time just wanted a good and actually free/libre OS that would remain free no matter who distributed it.
You are projecting the expectations of modern-day devs, formed after a lot of social development thanks to GitHub, onto that world.
You might claim that GP is relying on the technicalities of the licenses, but you can actually read the whole FSF philosophy and you'll note that it aligns perfectly with "giving forward", not "giving back".
Free Software is about users' freedom, not dev rights, politeness, etc. Now obviously, some devs picked copyleft licenses with the purpose of improving their own software from downstream changes (Linus states that is why he picked the GPL), but that's a nice side effect, not the purpose. Which, of course, gets confused with popular social sharing platforms like GitHub.
Speaking generally, I'm not sure that one can claim
>> The original authors tend to be the users
There are endless forks of, say, Emacs, and I expect RMS is not a user of any of them.
Of course RMS is free to inspect the code of all of them, separate out bug fixes from features, and retroactively apply them to his build. But I'm not seeing anything in any license that requires a fork to "push" bug fixes to him.
>> This aligns with the reason why some people publish their work under copyleft licenses: You get my work, for free, and the deal is that if you find and fix bugs then I get to benefit from those fixes by them flowing back to me.
I think you are reading terms into the license that simply don't exist. I agree a lot of programmers -believe- this is how Free Software works, and many do push bug fixes upstream, but that's orthogonal to Free Software principles, and outside the terms of the license.
>> That's the spirit here and trying to argue around that with technicalities is disingenuous.
Licenses are matters of law, not spirit. The original post is about this "spirit". My thesis is that he, and you, are inferring responsibilities that are simply not in the license. This isn't a technicality; it goes to the very heart of Free Software.
You can try to fork it into something workable, but that can sometimes literally mean trying to figure out what the actual deployment process is and what weird tweaks were done to the deploying device beforehand. In addition, forking those projects is also unworkable if the original has pretty much enterprise-speed development. At best you get a fork that's years out of date where the maintainer is nitpicking every PR and is burnt out enough to not make it worthwhile to merge upstream patches. At worst, you get something like Iceweasel[0] where someone just releases patches rather than a full fork (and having done that a few times, it's a pain in the neck to maintain those patches).
FOSS isn't at all inherently community-minded; it can be, and can facilitate community, but it can also be used as a way to get cheap cred from people who are naïve enough to believe the former is the only place it applies.
[0]: "Fork" of Firefox LTS by the GNU Project to strip out trademarked names and logos. It's probably one of their silliest projects in terms of relevance.
I might be wrong, but this is not how I understand the GPL [0]. Correct me if I'm wrong.
What I get from the license is that you have to share the code with the users of your program, not anyone else.
AFAIK you could make an Emacs fork and charge money for it. Not only that, but the source code only needs to be made available to the recipients of the software, not anyone else.
A company could have an upgraded version of a GPL tool and not share it with anyone outside the company. Theoretically employees might share the code outside, but I doubt they'd dare.
[0] https://www.gnu.org/software/emacs/manual/html_node/emacs/Co...
First, I am not a lawyer, but don't licenses exist precisely for their technicalities? This is not like a law on the books, where we can consider the "letter and spirit of the law" because we know the context it was written in and for. With a written license, however, someone chooses to adopt the license and accepts its terms from an authorship point of view.
I think we're in a new enough situation that we can look beyond what's legal in a license. When many of us started working on open source projects, AI was a far-off concept. Speaking for myself, I thought we'd see steady improvement in code-completion tools, but I didn't think I'd see anything like GPT-4 in my lifetime.
Licenses were written for humans working with code. We can talk about corporations as well, but when I've thought about corporations in the past, I thought about people working on code at corporations. The idea of an AI using my open source project to generate working code for someone or some corporation feels...different.
Yes, I'm talking explicitly about feelings. I know my feelings don't impact the legalities of a license. But feelings are worth talking about, especially as we're all finding the boundaries of new tools and new ways of working.
I don't agree with everything in the post, but I think this is a great conversation to be having.
They don't impact the current legality of a licence, but it will affect future ones.
GPL/BSD/Apache/proprietary, they are all picked for ideological concerns which all stem from feelings. It is good to discuss these things, and it is good to recognise that these are emotionally driven.
Free software has the principle of "the freedom to run the program as you wish" (freedom 0), so arguably it has already said that training AI is OK. (We could quibble over the word "run", but gnu.org and RMS clearly say "freedom 0 does not restrict how you use it.")
GPL code can be used by the military to develop nuclear weapons. Given that this is a guiding principle of the FSF, it's hard to argue that the current usage is not OK.
This may seem a bit nitpicky and philosophical, but anyway: the feelings you mention are about things, and the things the feelings come from are what is most important. Feelings are never standalone; if they are, they are just moods, which are so personal it's hard to have a conversation about them.
Let's call 'the things' values. I'd say feelings are perceptions of values, and as such they invariably have a conceptual element to them. And exactly that conceptual aspect makes them suitable for conversation and sometimes even debate, insofar as they can be incorrect. We can acknowledge the subjective, emotive aspect of feelings as highly and inalienably personal, respect the individual opinion behind them and contest the implicit truth-claims all at the same time.
§5.c of the GPL Version 3 states explicitly:
> You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it.
As in, all modifications must be made available. Does that not meet your definition of giving back? The GPL (all variants) is one of the most widely used of the free software licenses and has an explicit "give back" clause as far as I can see, which is part of why some people referred to the GPL as a "cancer".
FWIW the issue I've come to have with copilot is that you're not explicitly permitted to use the suggestions for anything other than inspiration (as per their terms), there is no license given to use the code that is generated. You do so at your own risk.
Available to all users, not previous authors. There may or may not be overlap.
Plus, I would say it's giving forward, not back. If there are public users, then the original authors can become users and get the code, but by then there will be bug fixes and features smooshed together.
Which is why I posit that there's no "give back" concept in the license, only "give forward".
Whether you were trawling Usenet or just a presence in your local BBS scene, "teach it forward" was always a core concept. Still is. It's difficult to pay back to the person who taught you something valuable, so you can instead pay it forward by teaching the lessons - along with your own additions - to the later newcomers.
Of course the Eternal September changed the landscape. And now we can't have nice things.
0: https://en.wikipedia.org/wiki/Etiquette_in_technology#Netiqu...
In uni, students run their theses through plagiarism checkers even when it's novel research, because incidental overlap naturally occurs.
As the thought experiment goes, given infinite time, a monkey with a typewriter will inevitably write Shakespeare's works.
The busybox authors disagree: https://busybox.net/license.html
If it's not a hard derivation, then it's difficult to prove or even notice.
Taken to its logical conclusion, you could add the notion that Free Humans are legally bound to only produce Free Ideas. One could imagine this functioning like a sort of monastic vow of charity or chastity: a "vow of silence on producing anything which is not Free (as in freedom)".
Would you take such a vow if offered 100,000 USD/year for the rest of your life (adjusted for inflation)? I would.
The problem is that the Copilot project doesn't claim to be abiding by the license(s) of the ingested code. The reply to licensing concerns was that licensing doesn't apply to their use. So unfortunately they would just claim they could ignore your hypothetical Free³ license as well.
[0]: https://news.ycombinator.com/item?id=34277352
> The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
Does GPT spit out the copyright notice when it regurgitates my code?
Not "if". We know it does.
And since it doesn't show citations, you might use its output and mistakenly end up making your entire program GPL, because it included copy-pasted GPL code.
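One way to detect the kind of regurgitation described above is plain n-gram fingerprinting: hash every overlapping token window of the licensed corpus, then flag generated output that shares a window. A toy sketch (the window size, tokenization, and corpus are arbitrary illustrative choices; real systems use more robust fingerprinting):

```python
def fingerprints(text: str, n: int = 8) -> set:
    """Hash every overlapping n-token window of the text."""
    tokens = text.split()
    return {hash(tuple(tokens[i:i + n])) for i in range(len(tokens) - n + 1)}

# Pretend this is an index built over GPL-licensed sources.
gpl_corpus = ("static int parse_flags ( const char * s ) { int flags = 0 ; "
              "while ( * s ) { ... } return flags ; }")
index = fingerprints(gpl_corpus)

def looks_copied(generated: str, index: set, n: int = 8) -> bool:
    """True if any n-token window of the output appears in the corpus index."""
    return bool(fingerprints(generated, n) & index)

verbatim = "int parse_flags ( const char * s ) { int flags = 0 ;"
original = "def parse ( line ) : return line . strip ( ) . split ( )"
print(looks_copied(verbatim, index))  # True
print(looks_copied(original, index))  # False
```

The catch, of course, is that lightly paraphrased output (renamed variables, reordered statements) slips past exact-window matching, which is why verbatim detection alone doesn't settle the licensing question.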
If you want to get a better picture of the situation, read up on the licenses and what they do, specifically the term "copyleft".
No! That's a gross misrepresentation of what open sourcing is. It's the offer of a deal: you publish the source code, and in return for looking at it and using it for something, I have obligations, like attribution and licensing requirements regarding derived works.
You can read licensed code, learn from it, and then write your own code derived from that learning, without having committed a copyright violation.
You can also read licensed code, directly copy paste it into your codebase, and still not have committed a copyright violation, as long as you did so in a way that constituted fair use (which copy-pasting snippets certainly would).
There’s no copyright issue here at all, and rationally speaking there aren’t any legitimate misuse of open source concerns either. If these people were honest they’d just admit to feeling threatened by AI, but nobody would care about that, so they just try to manufacture some fake moral panic.
GPL, for instance, merely states that distributed sources or patches "based on" the program should be "conveyed" under the same terms. In other words, anyone who gets their hands on it will do so under the same license.
If anything, I would be worried that GitHub trained itself on publicly-available but not clearly licensed code, because then it would have no license to "use" it in any way[0]. GPL provides such a right, so there is no problem there. It would be even more worrying if the not clearly licensed code was in a private repository but I think I remember reading that private repositories were not included in the training data.
However, would you consider a black-box program whose output can consistently produce verbatim, or at the very least slightly modified, copies of GPL code to be transformative? The problem does not lie in how the code is distributed but in how transformative the distributed code is. Not only does the same apply to any program besides AI-powered software, it applies to humans[1].
Given how unpredictable the output of an AI is, one should not be allowed to train it on GPL code if one cannot reliably guarantee it will not produce infringing code.
[0]: https://docs.github.com/en/site-policy/github-terms/github-t... (https://archive.ph/susi0#4-license-grant-to-us)
[1]: One such example would be how Microsoft employees allegedly prevented themselves from reading refterm source code, cf. https://github.com/microsoft/terminal/issues/10462#issuecomm...
If we were somehow able to prevent AI models from ingesting a codebase, that would mean everyone else who wants to produce similar code would have to re-invent the wheel, wasting their time repeating work that has already been done.
All because... the person who did it first wants attribution? They want their name to be included in some credits.txt file that nobody will ever read? That's ridiculous.
Licensing operates on a continuum of permissiveness. Licenses can only relax the restrictions that you as a creator are given by default; you can't write a copyright license that adds new ones. You could write a legal instrument that compels and prohibits certain behaviors, but at that point you're talking about a contract. (And there's no way to coerce anyone into agreeing to the contract.)
Harry Potter has even more restrictions than the GPL or any other open source license. It's "All Rights Reserved"; it enjoys the maximum protections that a work can. And yet it would still be possible to feed it into an AI model, even if Rowling, Bloomsbury, and Scholastic all didn't want you to. They don't get a say in that. Nor do open source software developers in their works, which selectively abandon some of the protections that Rowling reserves for herself and her business partners.
The only real viable path to achieve this using an IP license alone would be a React PATENTS-like termination clause: if your company engages in any kind of AI training that uses this project as an input, then your license to make copies (including distributing modified copies) under the ordinary terms is revoked, along with your permissions for a huge swathe of other free/open source software owned by a bunch of other signatories, too. This is, of course, contingent upon the ability to freely copy and modify a given set of works being appealing enough to get people to abstain from the lure of building AI models and offering services based on them.
You're right. It's a politeness law some people have invented.
It's also a value people have, but that's for themselves. I like contributing to OSS projects. But, as soon as it's imposed on others, and there are punishments for disobeying, it's a politeness law.
Exactly. People are getting mad that Microsoft is making good money while the people who made all that free software available mostly did it for free (as in no money and no recognition). It can sound unfair, but that's the deal. If you didn't want people or AI to learn from your code, open source was not the right option.
There's nothing wrong with other people using - learning and creating derivative works of - one's open-source code, provided they respect the terms of the license. It seems to me that the real issue is the fact that these licenses don't have enough teeth.
All software licences are based on copyright, same as writing, art, music, etc. Some software licences are permissive. Some writing is permissive (e.g. Cory Doctorow). Some music is permissive (e.g. Amanda Palmer). It entirely depends on what the author wants. The fact that more software is permissive is a good thing, right?
I entirely agree that there are ethical problems with training AI on copyrighted training data. But please let's not start gatekeeping this. We need to have a serious discussion as a culture about it, and saying "you're way down the list of victims" isn't helping.
What I disagree with is the idea that they should therefore not complain, or that there could not be an AI system that does not launder code, but keeps licenses in place and does this ethically and in an honest way. I add "ethically" and "honest way" because I am sure that companies will try to find a way around being honest, if they are ever forced to add back the licenses.
In fact, artists might not be the group that grasps the impact of training on that corpus as quickly as the dev communities. Perhaps it is exactly the devs who need to complain loudest and first, to have a signal effect.
Well, I guess you already know why you may be hated for this. Anyone who has surfed HN since ~2010 would know, or should notice, that the definition of open source has changed over the past 10-15 years. Giving back and communities are the two predominant open source ideals now, along with making lots of money on top of OSS code being a somewhat contentious issue, to say the least.
But I want to sidestep the idealistic issue; I think this is more of an economic one, which could be attributed to a zero-interest-rate phenomenon. You now have developers (especially those from the US) who, for most if not all of their professional lives, have lived in an era when money and investment were easy, comparatively speaking. The idea was that they should give back when money (or should I say cash flow) isn't an issue. When $200K total comp was supposed to be the norm for a fresh grad joining Google, and management thought $500K was barely enough and they needed to work their way to $1M, senior developers believed that if juniors were worth $200K then asking for $1M total comp was perfectly sane; or they went to some other extreme where everyone in the company should earn exactly the same.
If Twitter or social media are any indication, a lot of these ideals are completely gone from the conversation, although this somehow started before the layoffs.
It is somewhat interesting to see the sociological and ideological changes with respect to economic changes. But then again, economics itself is perhaps the largest field study in psychology.
Leaving GitHub won't change that; OpenAI is training its models on every bit of code they can get: Sourcehut, Codeberg, etc. If it's public, they will train on it.
Also from my experience of trying to leave GitHub, you just end up having a couple of projects on your alternative platform, and everything else on GitHub. You are still active on GitHub, probably even more than your new alternative.
And if you want to build a community, you will quickly find out that the majority want to stick to GitHub, and leaving it can kill your project's chances of getting contributions.
Personally, if the courts decide it's fair use, that's it, I'm going back; it's the best git platform out there, and GitLab doesn't even compare in free features. However, I have been eyeing Gitea and Gitea Actions; with them, Codeberg could become a realistic choice for me.
To end it, here's a hot take: I really hate Sourcehut.
It's hard to use, the UI is not great, and trying to browse issues or latest commits is a nightmare.
Every time a project uses it, it's a pain to deal with.
> And if you want to build a community, you will quickly find out that the majority want to stick to GitHub, and leaving it can kill your projects chances of getting contributions.
That's a defeatist attitude and a self-fulfilling prophecy at the same time. As more and more people leave GitHub (hopefully not to go to the same alternative), it becomes less and less of a must-have. The reason these things are somewhat true today is because of the network effect, and it's precisely that effect which we must actively attempt to squash by leaving.
It's why Facebook is still on top even though everyone hated it for a while; why YouTube is the only video platform, etc.
Not every bit of code, they are respecting proprietary licenses.
When MS puts the code for Windows, Office, Azure and everything else in front of ChatGPT, Copilot, whatever other AI learning model they have, then perhaps they have a leg to stand on.
Otherwise, they're just being hypocritical to claim that no injury is being done by using code for training, because they are refusing to train on any of their code.
Right now it just looks like they are ripping off open source licenses without meeting the terms of the license.
https://www.lelanthran.com/chap7/content.html
You could make a similar argument for not training on GPL code, but it's a lot easier to programmatically determine whether or not code is public than it is to programmatically determine what it's licensed under, particularly when you're training on massive amounts of unlabeled data. Not to mention it's way easier to delete an accidentally-added snippet of GPL code from a codebase than it is to "unleak" company secrets after they've been publicly revealed.
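To make the asymmetry concrete, here is a minimal sketch of the kind of heuristic a training pipeline might use to guess a file's license. Everything here (the function name, the patterns) is a hypothetical illustration, not how any real pipeline works; the point is that "is this file public?" is a property of where it came from, while "what license is it under?" depends on fuzzy matching against markers that are often simply absent.

```python
import re

# Hypothetical heuristic: look for an SPDX identifier, then fall back
# to matching a couple of well-known license phrases. Real corpora
# need far more patterns, and many files carry no marker at all.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([\w.+-]+)")

def guess_license(text: str) -> str:
    match = SPDX_RE.search(text)
    if match:
        return match.group(1)
    lowered = text.lower()
    if "gnu general public license" in lowered:
        return "GPL (version unclear)"
    if "permission is hereby granted, free of charge" in lowered:
        return "MIT-like"
    return "unknown"  # the common case in unlabeled training data

print(guess_license("// SPDX-License-Identifier: GPL-3.0-or-later"))
```

Even this toy shows the problem: the "unknown" branch dominates in practice, and a per-repo LICENSE file may or may not govern any individual file.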
Sorry, but I consider that a plus.
One of the primary problems with GitHub right now is the "drive by" nature. Everybody is on Github because a bunch of idiotic big corporations made "community contribution" part of their annual review processes so we now have a bunch of people who shouldn't be on GitHub throwing things around on there.
Putting just a touch of friction into the comment/contribute cycle is a good thing. The people who contribute then have to want to contribute.
That's without mentioning nice-to-have features like GitHub Sponsors, the For You tab, and the (arguably) more popular UI layout. It's simply a better platform for open source projects.
What features is GitLab missing? I don't know, I'm curious.
And faceless entities use their hard work for who knows what, but mostly to fatten up their already oversized corporations, and give back NOTHING.
And people, seemingly without common sense, suck up to companies that rob them, and even disseminate their shiny new "free" tools.
This would be a Hugo or Nebula award-winning novel if it weren't reality.
Is it? I can't think of a single professional dev making money right now who doesn't owe it, in part, to not having to reinvent the entire tech stack they are skilled in.
If there was no open source, we'd all be making a lot less, and the state of tech would be far far smaller than it is right now.
There's a wide variety of people in the open source community at large. And a wide variety of motivations for contributing. I for one am happy that open source software is a thing. It's been a net good for mankind. Sure, there are abuses, and I'm sure many things could be improved. But I'm glad it's there all the same.
FWIW, I keep thinking about some kind of dual licensing, FOSS and something-something-royalties. (Sorry, IANAL, so haven't gotten any further.)
By a lot of measures many humans perform at just about the same level, including confidently making up bullshit.
This post reads like one of the "Goodbye X online video game" posts. I'll cut them some slack because this is their blog they're venting on and was likely posted here by someone else and not themselves doing some attention seeking, but meh.
Which is a pretty dumb position imo. Not that I personally think these newer LLMs are a stochastic parrot, or at least not to the degree proponents of the Stochastic Parrot argument would have you believe.
I think we're now way past that, with LLMs quickly taking on the role of a general reasoning engine.
And this right here is why it's important to emphasize the "stochastic parrot" fact. Because people think this is true and are making decisions based on this misunderstanding.
No we're not, and no they are not.
An LLM doesn't reason, period. It mimics reasoning ability by stochastically choosing a sequence of tokens. A lot of the time these make sense. At other times, they don't make any sense. I recently asked an LLM:
It answered correctly that Mike leaves first. Then I asked, and the answer was that Mike still leaves first, because he leaves at the 2nd floor, and that's the first floor the elevator reaches. Another time I asked an LLM how many footballs fit in a coffee mug, and the conversation reached a point where the AI tried to convince me that coffee mugs are only slightly smaller than the trunk of a car. Yes, they can also produce the correct answers to both these questions, but the fact that they can also spew such complete illogical nonsense shows that they are not "reasoning" about things. They complete sequences; that's literally the only thing a language model can do.
Their apparent emergent abilities look like reasoning, in the same way that Jen from "The IT Crowd" can sound like she's speaking Italian when in fact she has no idea what she is even saying.
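To make "stochastically choosing a sequence of tokens" concrete, here is a toy sketch of temperature sampling over next-token scores. The tokens and logits are made up for illustration; this is not how a real LLM is implemented, but the sampling step at the end of one really does look like this: plausible continuations get high probability, and nonsense keeps a nonzero probability, which is where the illogical outputs come from.

```python
import math
import random

# Sample one next token from a softmax over (made-up) logits.
# No reasoning happens anywhere in this function.
def sample_next(logits: dict[str, float], temperature: float = 1.0) -> str:
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    weights = {tok: math.exp(l - m) for tok, l in scaled.items()}
    total = sum(weights.values())
    tokens = list(weights)
    probs = [weights[t] / total for t in tokens]
    return random.choices(tokens, weights=probs)[0]

# "first" is the likely continuation, but "banana" can still be drawn.
logits = {"first": 2.5, "second": 0.5, "banana": -3.0}
print(sample_next(logits))
```

Raising the temperature flattens the distribution, making the nonsense token more likely; lowering it makes the output nearly deterministic.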
Guns are also useful tools because you can take them into a store and get things for free as a result. But that doesn't make it okay to do.
The current model basically says that as soon as you publish something, others can pretty much do with it as they please under the disguise of "fair use", an aggressive ToS, and the like.
I stand by the author: the current model is parasitic. You take the sum of human-produced labor, knowledge, and intelligence without permission or compensation, centralize it with tech that about two companies have or can afford, and then monetize it. Worse, in a way that never even attributes or refers to the original content.
Half-quitting Github will not do anything, instead we need legal reform in this age of AI.
We need training permission control as none of today's licenses were designed with AI in mind. The default should be no permission where authors can opt-in per account and/or per piece of content. No content platform's ToS should be able to override this permission with a catch-all clause, it should be truly free consent.
Ideally, we'd include monetization options where conditional consent is given based on revenue sharing. I realize that this is a less practical idea as there's still no simple internet payment infrastructure, AI companies likely will have enough non-paid content to train, plus it doesn't solve the problem of them having deep pockets to afford such content, thus they keep their centralization benefits. The more likely outcome is that content producers increasingly withdraw into closed paid platforms as the open web is just too damn hostile.
I find none of this to be anti-AI, it's pro-human and pro-creator.
If that is made mandatory, only then can these lists actually be checked against licenses.
There will also need to be a trial license, to establish whether an AI learning model can be considered derived from a licensed open source project - and therefore whether it falls under the license.
And finally, we'll likely get updated versions of the various OSS licenses that include a specific statement on e.g. usage within AI / machine learning.
>The more likely outcome is that content producers increasingly withdraw into closed paid platforms
Nah. You didn't get paid to write that post, did you? You did it for free. People nowadays are perfectly willing to create free content, and often high quality content, sometimes anonymously, even before generative AI.
There's no need for financial incentives anymore. As content creation becomes easier, people will start creating out of intrinsic motivation - to express themselves, to influence others and to inform. It's better that way.
Restricting content so that others can't benefit from it is not pro-human or pro-creator, it's selfish and wasteful. We should get rid of licenses altogether and feed everything humanity creates into a common AI model that is available for use by everyone.
The entire "GitHub doesn't give back" argument is wrong. For "free", GitHub lets me host our code, run thousands and thousands of hours of free CI (which we are using aggressively), host releases and Docker images, and lets us manage thousands of issues. Also, Copilot is free when you are eligible for it, so we are fortunate enough not to have to pay for it either.
Yes, they monetize our attention and train Copilot with the code, but the only argument which can't be used against this company is that they don't give back.
"If you train an AI on this code, you must release the source code and generated neural net of that AI as open source" or something to that effect.
It won't stop it, but it will slow it down, and it seems like the right T&Cs to put on training against GPL code because it gives an advantage to open source AIs, however minor.
Currently GPL says:
> To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work.
> A "covered work" means either the unmodified Program or a work based
If in addition it would say something like "Generative AI models trained on the program source code as well as the text produced with such models is also a work "based on" the Program", then there will be little room for a fair use claim, I think.
Edit: I googled "fair use copyright US" and have now decided that US copyright law is stupid.
But: NOOOO.
In order to close off this possibility, which would restrict Copilot revenue, they instead would roll out a single undifferentiated product and with lots of "gee whiz!" and associated hooplah, and be sure to offer it for free for a while to suck everyone in and head off criticism.
The real rub will be the first court precedent on whether GPTs infringe on source data IP.
Could see it going either way: fundamentally transformative or not.
What it comes down to is that the Github ToS is illegal.
wait, what?
do you have more details on this? what rights does the ToS claim that would violate an existing license?
Then if I read that and built my own understanding
Then if I used that knowledge to implement my own version of Windows that was compatible with Microsoft's and distributed it under my own license
Would that be legal?
WINE etc are built in clean room environments for good reasons.