Ethically, Microsoft has about as much claim to be able to use the data for Copilot as anyone else.
On the other hand, maybe a MSFT v Amazon lawsuit over this could be the wake-up call the world needs that maybe we should stop centralising critical infrastructure in the hands of a single company. Which is why I think they wouldn't do it - at most I could see Microsoft tightening request limits on accounts associated with Amazon.
> maybe we should stop centralising critical infrastructure in the hands of a single company
Managing your own on-prem or in-colo infrastructure sucks: it's expensive and a source of risk, which is why we moved things like source servers to a centralized model.
I'm surprised Amazon's legal team signed off on this. It's clearly against the GitHub terms of service[0], and Amazon employees acting on instructions from Amazon had to approve those terms. It seems pretty much identical to the LinkedIn vs. hiQ scraping case, where, as I understand it, the fake account creation was the key point.
[0] E.g. no API key sharing for the purposes of evading rate limits, only a single free account per person or organization.
When you pay your legal teams as much as Amazon's, they probably tell you "Yeah, you'd probably lose any case, but the fine will be a couple of million dollars and you won't have to pay it for a decade, and by then you'd have cemented your market leadership".
Is the cover image itself generated via some ML model? The old guy in the middle is missing substantial parts of his arm. The box right by him also has some artifacting in the corner.
And the guy on the right... umm... what's with his face? Or is he an alien, maybe? Image credit goes to https://linkmedya.com - it doesn't say it is AI-generated content, but yep, it certainly looks like it.
This just rekindled my desire to self-host my git repos. The whole idea that a platform provider can use the IP I host there is obscene. That thieves steal booty from each other is not the story.
Separate from the courts, Microsoft could send a message to the AI gold rush field, about "abuse of Microsoft's resources", via ToS:
* All Amazon domain names could be banned from accounts on GitHub, or face annoying restrictions, implemented with trivial technical changes. And lawyers could send a letter to Amazon legal, about how Amazon may and may not use GitHub, including Amazon personnel having to disclose their affiliation (not hide it with Gmail), and craft some language about how those employee accounts may and may not be used.
* More harshly, and in a way that would instill fear in individuals throughout the industry, the individuals who let their accounts be used for the scraping could be banned from GitHub, for ToS violation. Not only those particular accounts, but any accounts the individuals might use. (This would hurt, not only for genuine open source participation, but also given how open source is sometimes used for job-hunting appearances, and all the current employers that ask for a candidate's "GitHub" specifically rather than open source in general.) If banning would have undesired effects, such as projects GitHub wants to host being pulled, or a public reaction that it is too harsh and questions about why GitHub has so much power, there could instead be annoying restrictions.
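For what it's worth, the "trivial technical changes" part of the first bullet could look something like the sketch below. It is only an illustration: `BLOCKED_DOMAINS`, `email_domain` and `is_blocked` are made-up names, the domain list is a placeholder, and nothing here reflects how GitHub actually handles accounts.

```python
# Hypothetical sketch: flag accounts whose email belongs to a blocked domain.
# The domain list and function names are illustrative assumptions only.
BLOCKED_DOMAINS = {"amazon.com"}


def email_domain(email: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return email.rsplit("@", 1)[-1].lower().rstrip(".")


def is_blocked(email: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """True if the email's domain is a blocked domain or a subdomain of one."""
    domain = email_domain(email)
    return any(domain == b or domain.endswith("." + b) for b in blocked)


if __name__ == "__main__":
    for address in ("dev@amazon.com", "dev@corp.amazon.com", "dev@gmail.com"):
        print(address, is_blocked(address))
```

A check like this only sees the address the account was registered with, so it does nothing about personal Gmail accounts, which is why the affiliation-disclosure letter would still matter.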
> the individuals who let their accounts be used for the scraping could be banned from GitHub, for ToS violation.
That would work, assuming GH doesn’t make mistakes and ban someone else with the same name. That would then be embarrassing for GH. I can already see the news headline: “GitHub banned my account because my name matches that of a web scraping account from Amazon”
The way git works means that you can check that you have an un-doctored clone of a repo just by checking that the commit hash matches. Which in this instance is quite unfortunate, because it would be very funny.
(barring a SHA-1 collision, of course)
EDIT: I suppose another approach could be to invent poisoned repos out of whole cloth and only show them to Amazon, but I suspect that'd be even easier to detect.
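To make the commit-hash point concrete, here is a rough sketch of how git derives object IDs for SHA-1 repositories: each object is hashed as `<type> <size>\0<body>`, a tree embeds the hashes of its blobs, and a commit embeds the hash of its tree. The helper names (`git_object_id`, `tree_with_one_file`, `commit_id`) and the fixed author line are my own inventions for illustration, and the tree helper only handles a single regular file.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0<body>', the scheme git uses for object IDs."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_with_one_file(name: str, content: bytes) -> str:
    """ID of a tree holding one regular file (simplified, illustrative only)."""
    blob_id = git_object_id("blob", content)
    # A tree entry is '<mode> <name>\0' followed by the raw 20-byte blob hash.
    entry = b"100644 " + name.encode() + b"\x00" + bytes.fromhex(blob_id)
    return git_object_id("tree", entry)


def commit_id(tree_id: str, message: str) -> str:
    """ID of a root commit with a fixed, made-up author and timestamp."""
    who = "A U Thor <author@example.com> 0 +0000"
    body = f"tree {tree_id}\nauthor {who}\ncommitter {who}\n\n{message}\n"
    return git_object_id("commit", body.encode())


if __name__ == "__main__":
    honest = commit_id(tree_with_one_file("main.py", b"print('hi')\n"), "init")
    doctored = commit_id(tree_with_one_file("main.py", b"print('hi!')\n"), "init")
    print(honest)
    print(doctored)
    print("same commit id:", honest == doctored)
```

Changing a single byte in the file changes the blob ID, which changes the tree ID, which changes the commit ID, so a doctored clone served under the original commit hash would be caught by anyone who compares hashes.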
>> "In response, Amazon proposed a workaround: encouraging its employees to create multiple GitHub accounts and share their access credentials."
Ah, no, it's git pool.