steelbrain · 4 years ago
In case you don't get it, view the source :)

It's just a bunch of sleep(random()) calls and visual changes in the viewport, and you download the exact file you uploaded.
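
For anyone who would rather not read the page source, the trick described above can be sketched roughly like this. This is a minimal illustration in Python, not the site's actual code: the function name `fake_generate`, the step count, and the delay bound are all hypothetical, and the real page presumably does the equivalent in client-side JavaScript.

```python
import random
import time
from typing import Callable


def fake_generate(
    upload: bytes,
    steps: int = 3,
    max_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Pretend to do heavy work on `upload`, then return it unchanged.

    `sleep` is injectable only so the delay can be skipped in tests;
    all names and defaults here are illustrative assumptions.
    """
    for step in range(1, steps + 1):
        # Random pause so the "processing" looks like real work.
        sleep(random.uniform(0.0, max_delay))
        # Stand-in for the visual changes shown in the viewport.
        print(f"Processing step {step}/{steps}...")
    # The "generated" result is exactly the uploaded file.
    return upload
```

Calling `fake_generate(data)` hands back the same bytes it was given, which is the whole joke: the only thing the "model" adds is waiting time.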

stavros · 4 years ago
I suspect they train the network on a sample size of one and then ask it to generate something, or at least that's the idea. The project seems like a fairly obvious rag on Copilot.
tyingq · 4 years ago
Click the "Careers" link in the top right.
cblconfederate · 4 years ago
The page code was written by co-pilot
londons_explore · 4 years ago
I tried to play the original and downloaded music a bunch of times to try to figure out any differences...
sellyme · 4 years ago
I've seen a lot of people ragging on Copilot for "copy+pasting" code - does anyone have links to cases where it has done this without the user intentionally trying to generate a specific (extremely famous) code snippet?

I've seen tons of comments here and on Reddit that talk about multiple instances of entire functions being copied verbatim, but the only thing even remotely close to that I've seen is the fast inverse square root, so I must have missed a few tweets or something.

tyingq · 4 years ago
Is it clear how many people have access? If access is fairly limited, I'm not surprised at the low number of examples.

Here's an example from their own docs:

Compare:

https://docs.github.com/assets/images/help/copilot/example_r...

To: https://github.com/nilavghosh/OMSCS/blob/master/AI4Robotics-...

sellyme · 4 years ago
> Is it clear how many people have access? If access is fairly limited, I'm not surprised at the low number of examples.

I wouldn't be too surprised either; what surprised me was the large number of people who explicitly said words like "many examples" and then couldn't link more than one.

> Here's an example from their own docs:

Thank you! This is definitely a much stronger example than the Quake one. Looking at the git repository it seems like this is generic startup code for a university assignment, so it probably shows up a significant number of times in the training data.

While it definitely makes sense to interpret a piece of code showing up several hundred times in the exact same format as being okay to straight-up copy+paste (e.g., imports, some boilerplate in more verbose languages, keyboard input switch statements), this does seem to highlight that Copilot can't distinguish between code snippets that are always identical because that's just the correct way to do it, and code snippets that are always identical because only one person did that thing, and it just happens to be in hundreds of repositories for one reason or another.

geekraver · 4 years ago
I wonder how many of these verbatim examples are because the training data code itself occurs multiple times in GitHub because it in turn was copied from an upvoted answer in StackOverflow, LOL.
meibo · 4 years ago
It only seems to if you give it no or very little "source" input, like an empty file with a comment that says "// X algorithm".

There's been a lot of bikeshedding on this, but GitHub decidedly hasn't given enough information on how it works and what the training dataset is, and the fair use question definitely needs to be answered, maybe even in court - it's just a matter of time.

MadVikingGod · 4 years ago
What I think is interesting and not talked about in the Copilot cases is that there are actually two different copyright actions that come from using it.

The first is whether GitHub had a license to distribute the code it learned on. This assumes that the code it produces is a derivative, which I don't think would be much of a stretch. For this I think most of the licenses that are used (GPL, MIT, etc.) allow for someone to distribute in such a way.

The second is the user of Copilot. They would be getting and distributing code without knowing the original license, but that wouldn't be a viable defense against infringement. To actually comply with most licenses they would have to follow the requirements.

In both cases I don't know if fair use would really apply. Maybe GitHub could stretch the research aspect, but if you just use the code in your product there is no fair use.

Hamuko · 4 years ago
>what the training dataset is

All non-private repositories on GitHub.

uberswe · 4 years ago
I have access and hints for entire functions have only appeared from code already in the same file. It uses the file you are in for context.

It also says on their website that the AI may generate api keys that look real but it’s actually just a “fake” key that the AI generated as placeholder.

I have had access for one day, spent my entire Saturday playing with it while working on an addon for a game. I find it useful and most of the hints come from other code I have in the same files, which saves me time or lets me know when I’m too repetitive :)

tyingq · 4 years ago
"It also says on their website that the AI may generate api keys that look real but it’s actually just a “fake” key that the AI generated as placeholder."

Maybe this is what you meant, but that's already been shown to be untrue. https://fossbytes.com/github-copilot-generating-functional-a...

dogecoinbase · 4 years ago
It's happily spitting out licenses and copyright notices with other people's names on them, it's pretty clearly half-baked.
sellyme · 4 years ago
While that's obviously a UX flaw that definitely shouldn't have made it to release, I find it hard to envision it ever actually being a problem. If someone's accepting auto-generated copyright notices and licenses that don't actually apply, they can't really point the finger at Github - it's called "Copilot", not "Pilot".

The problem with full snippets of arbitrary obscure repositories being copy+pasted is that there's no realistic way for a user to know if that's happening without putting in more effort than just writing the code themselves, somewhat defeating the point. That's not really the case when the first line of the file contains "(C) Someone Else 2003".

dwild · 4 years ago
> I've seen a lot of people ragging on Copilot for "copy+pasting" code - does anyone have links to cases where it has done this without the user intentionally trying to generate a specific (extremely famous) code snippet?

It's not important that the examples are really specific; the issue is that it does happen. This tool has the potential to infringe copyright (sure, it does seem legal in some places, but that doesn't make it more right).

Look at how EA reverse engineered the Genesis in the past [1]. They had two teams: one that did the reversing, and another that did the implementation. That made it safe to say that no infringing code was going through. Plenty of emulator developers try to avoid source code leaks for similar reasons. Co-Pilot can't do that.

The fact that the copyright status of the code is hard to determine doesn't mean it's not copyrighted code nonetheless.

Personally I've got nothing against this kind of technology, but I do have a pretty big issue with how it learns. When people published their code on Github, they didn't know it could be used for machine learning, and it's bad that it is. If Github had asked for the copyright over the code to do Co-Pilot, I wouldn't mind.

[1] https://www.youtube.com/watch?v=x0qe1FNqtCo&t=280s

IshKebab · 4 years ago
Github did an analysis and found that it does do it, though very rarely, and usually when it has little context (e.g. at the start of a file). They're working on detecting those cases though so it doesn't happen accidentally, so it is unlikely to be a realistic problem.
lyxell · 4 years ago
I remember that the guys behind The Pirate Bay actually made a service like this back in the day. You would submit a song and get a mashup back of cuts from other songs where each cut would be short enough to fall under fair use. I can't find any references to it online anymore though. Maybe someone else remembers what the service was called.
rikkipitt · 4 years ago
I don't remember the site, but it reminds me of the Girl Talk album called "All Day" – https://en.wikipedia.org/wiki/All_Day_(Girl_Talk_album). It was originally released as a free digital download.

> Greg Gillis composed the album using overlapping samples of 372 songs by other artists.

This article goes into it a bit more: "Girl Talk, Fair Use, and Three Hundred Twenty-Two Reasons for Copyright Reform" – https://jipel.law.nyu.edu/ledger-vol-1-no-1-4-pearl/

gardnr · 4 years ago
Night Ripper is my favourite album by Girl Talk.
rkuykendall-com · 4 years ago
I recommend anybody fascinated by this rabbit hole check out the album "The Grey Album," the documentary "Good Copy, Bad Copy," and the album "1987 (What the Fuck Is Going On?)". Or just the Wikipedia articles for them to start. Amazing world.
stuntkite · 4 years ago
I remember reading an interview with GT where he said he was NOT A DJ because he was a guy with a laptop and a bunch of lawyers.
codetrotter · 4 years ago
Reminds me of this other thing from years ago called “sCrAmBlEd?HaCkZ!”

https://youtu.be/eRlhKaxcKpA

It splits music videos into small portions ahead of time and then later it reassembles them on the fly according to audio input.

brutal_chaos_ · 4 years ago
I remember when that came out, it blew my mind! Did anything public ever materialize from the project?
chrismcb · 4 years ago
"each cut would be short enough to fall under fair use" huh? That isn't how fair use works. There is no such thing as "short enough"; fair use has to do with how you use it, not how much of it you use.
voakbasda · 4 years ago
The whole experiment is a commentary on the need for copyright reform. That makes the usage a clear case of fair use.
lyxell · 4 years ago
This was supposed to be according to Swedish copyright law but you may very well be right.
jjcon · 4 years ago
When did the HN crowd become so defensive of copyright? I understand the concerns on copilot but it’s kinda weirding me out.
aurelian15 · 4 years ago
As weird as it may seem, you should not forget that free software licenses are built upon the fabric of copyright. Without copyright, free software could not exist in its current form. For GPL-like "copyleft" licenses, there would be no way to enforce that binary distributions of derived works are accompanied by their source code. Similarly, in the context of permissive BSD/MIT-style licenses, there would be no way to enforce attribution.

So, given that FOSS---which a large portion of the HN crowd depends on---cannot work without copyright (at least not in its current form), the recent discussions may be less of a surprise.

cortesoft · 4 years ago
Maybe... although I personally think that the GPL and other 'copy left' licenses aren't the reason open source has prospered, nor do I think enforcing attribution really helps the FOSS world that much.

People write and share code because it is useful to do that, not because licenses require them to.

I think FOSS would do fine with no copyright, and in fact more software might end up open source if we had ZERO copyright... why not make your code open source and get back contributions when your code would end up being shared anyway?

throw0101a · 4 years ago
> When did the HN crowd become so defensive of copyright?

Copyright is good in limited quantities. The current multi-decade time horizon is probably what a lot of people are against, and not the concept in general.

And limited time period seems to be consistent through history. From the paper "Copyrights and Creativity: Evidence from Italian Opera in the Napoleonic Age":

> Comparing changes in the creation of new operas across Italian states with and without copyrights, we show that the adoption of basic copyrights encouraged the creation of new work. Moreover, we find that copyrights changed the quality of creative output by encouraging composers to produce more popular and durable works. These results generalize to a broader set of musical compositions and to librettos, as the literary component to the score of operas. Based on these findings, we conclude that the adoption of basic levels of copyright protection – not exceeding the lifetime of the composer – can help to raise both the quantity and the quality of new creative works.

> Importantly, we find that extensions in the length of copyright beyond the composer’s life did not encourage creativity. Performance data reveal that few operas were played after the first 20 years, which suggests that only the most durable creative goods stand to gain from copyright extensions. […]

* https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2505776

ReactiveJelly · 4 years ago
Both the permissive and copyleft licenses are only enforceable through copyright law.

I don't mind that copyright exists, I just wish it was better.

Also there's a power difference between individuals violating the rights of a big company, and a big company violating the rights of many individuals.

If Copilot isn't reined in, it feels like yet another case of "The laws only apply to poor people".

lupire · 4 years ago
What does it mean to "enforce" a "permissive" license?
PaulKeeble · 4 years ago
Because it's my (and many of our) code they have "learnt" from, stripped the license from, and are intending to sell on. When we listed code under MIT or GPL we meant those licenses; they weren't random. Microsoft just seems to be completely ignoring the reality of reproducing works that are covered by those licenses: they are taking open source code and making it private and paid for. Not OK.
breck · 4 years ago
"The heathen are sunk down in the pit that they made: in the net which they hid is their own foot taken"

Copyright is a horrible system. Microsoft has been one of the biggest proponents of that system. But now they've clearly violated it. They should either join in abolishing it, or face its consequences.

michaelmrose · 4 years ago
Consider people's reaction to someone selling bootleg DVDs vs. torrenting a movie. Although people may consider both morally incorrect, the corrupting profit motive results in the former being seen far more negatively. In the current situation there is also the matter that Microsoft is still perceived (rightly, I think) very negatively and open source authors very positively. Also, in a David v. Goliath situation nobody wants to be seen rooting for the giant.

Personally I would be concerned about [insert corp here] accidentally stealing code from an open source project, then years later going after that project for copyright infringement over the very code they in fact stole from it.

carom · 4 years ago
I guarantee this is not Microsoft's announcement that they are forfeiting their copyrights. This is just them abusing the spirit of ours.
NiceWayToDoIT · 4 years ago
Probably because when poor people give something for free to other people to lift them out of poverty it is called empowerment, but when billionaires take the free work of poor people for their own selfish gain it is called exploitation.
hjek · 4 years ago
Say your AGPL code is Copiloted into someone's new program and they decide to release that under a non-free license; that's the issue. We're defensive of copyleft.
clusterfish · 4 years ago
This submission aside, it seems that most people are just concerned about getting sued on copyright grounds for using copilot.
zarzavat · 4 years ago
It's hypocrisy. People will defend entire books and research papers being shared on libgen/scihub, which is unarguably actual copyright infringement on a massive scale, but training an AI on open source code is somehow the worst thing ever even if there's no case law to say that this constitutes infringement at all.
abrokenpipe · 4 years ago
It's not really that hypocritical when you understand people's perspective on it. In general people (here on HN/OS-community) care about sharing experiences and knowledge. Copyright on open source content does not inhibit people's ability to learn from it. There's also the whole "big corp vs little guy" mentality at play here. If Copilot was open source then I don't think that anyone would have an issue with it; I actually think people would respond well to it if that were the case.
Dylan16807 · 4 years ago
It's not hypocrisy to say that different types of work should have different copyright lengths, and that for some types the answer is zero.

Especially if you remember the phrase "to promote the progress of science and useful arts".

asddubs · 4 years ago
I'm all for Copilot if Microsoft gets treated the same way libgen/scihub are for creating it, or if we abolish copyright. But the fact that they can just decide to do this and it's fine, while scihub gets DNS-blocked, reveals the asymmetry at hand here.
wizzwizz4 · 4 years ago
Wow. And it's entirely client-side, too! Impressive.
er4hn · 4 years ago
Finally, Copilot for Music!
laurent92 · 4 years ago
Strangely, a lookalike of a music hit is nothing like the original, and it’s worth analyzing!

- Music is a vehicle for a common experience. Everyone knows the next notes of some Lady Gaga song. We feel like learning the lyrics will make us able to sing together if we were in a club, and share something with other clubbers. Any AI that reproduced the voice and instruments would still not make you feel like you are sharing a common moment with the rest of the listeners.

- Hits are hits because we hear them a thousand times. It’s been proven that people don’t necessarily like it the first time. It’s the familiarity with the song which makes us like it (or hate it when we’ve heard it too much).

- Even worse: We like some songs even more because we love the author. Be it because they are politically involved, have a cute face, have a nice life story, or seem to hide answers to life in the lyrics of their work. But an AI producing the same exact notes wouldn’t trigger similar affection from us. It’s like hearing our kid singing: Very cute, but we wouldn’t like the same song by another kid. Audiences have a genuine emotional attachment to the authors. It’s especially visible since the MCM revolution: Before MCM, music mattered; now the image matters way more. Bands have a face, a graphic style, a story to tell, and the music could be as crap as possible: if we like the band it can still have success. MCM changed music forever, proving that AI can’t replace that feeling.

Can it?

imwillofficial · 4 years ago
I hear a lot of stated assumptions on how certain things trigger emotional investment and others don’t.

If you knew how manufactured the music industry was, and how nothing of what you see of celebrities is true, it might as well be AI plucking our heart strings, because it isn’t “real” in the sense that I think you mean, authentic human connection over shared experience.

nonbirithm · 4 years ago
So when are neural nets trained on images or text going to be confronted with the same copyright concerns? At the point that GitHub has forced the issue into the spotlight with Copilot I feel that it's only a matter of time before this reaches the courts. Nobody seemed to care about copyright at the time people were having fun creating AI dream collages or nonexistent anime girls from a model trained on the Danbooru imageset. In the latter case it's not clear that 100% of the original Pixiv and Twitter creators gave their consent to have their work rehosted on a different site in the first place, much less be involved in ML experiments. That data was from 2018.

I'm almost tempted to believe that the people at GitHub knew this was going to blow up as much as it did as some kind of a challenge to the status quo of copyright and licensing, if only so that everyone would start talking about the issue. Why did the GitHub representative plainly state that Copilot was trained on all of GitHub's codebase without seeming to care about the pushback on Twitter and HN that was bound to happen as a result?

jiminymcmoogley · 4 years ago
by the time the dinosaurs that dictate our laws begin to care about it, copyright will no longer exist
ChristianGeek · 4 years ago
Great way for the owner of the site to build up a library of free music!