Is this viable?
no
for many reasons
Would we use the same word if two different humans wrote code that solved two different problems, but one part of each problem was somewhat analogous to a different aspect of a third human's problem, and the third human took inspiration from those parts of both solutions to create code that solved a third problem?
What if it were ten different humans writing ten different-but-related pieces of code, and an eleventh human piecing them together? What if it were 1,000 different humans?
I think "plagiarism", "inspiration", and just "learning from" fall on some continuous spectrum. There are clear differences when you zoom out, but they are in degree, and it's hard to set a hard boundary. The key is just to make sure we have laws and norms that provide sufficient incentive for new ideas to continue to be created.
What if it was just a single person? I take it you didn't read any of the code in the OCaml vibe PR that was posted a bit ago? The one where Claude copied not just implementation specifics, but even the copyright headers from a named, specific person.
It's clear that you can have no idea if the magic black box is copying from a single source, or from many.
So your comment boils down to: plagiarism is fine as long as I don't have to think about it. Are you really arguing that's ok?
1. A patch is self-contained and applies to a codebase you have just as much access to as the author. A paper, on the other hand, is just the tip of the iceberg of research work, especially if there is some experiment or data collection involved. The reviewer does not have access to, say, videos of how the data was collected (and even if they did, they don't have the time to review all of that material).
2. The software is also self-contained. That's "production". But a scientific paper does not necessarily aim to represent scientific consensus, but a finding by a particular team of researchers. If a paper's conclusions are wrong, it's expected that it will be refuted by another paper.
Given the repeatability crisis I keep reading about, maybe something should change?
> 2. The software is also self-contained. That's "production". But a scientific paper does not necessarily aim to represent scientific consensus, but a finding by a particular team of researchers. If a paper's conclusions are wrong, it's expected that it will be refuted by another paper.
This is a much, MUCH stronger point. I would have led with this, because the contrast between this assertion and my comparison to prod is night and day. The rules for prod are different from the rules of scientific consensus. I regret losing sight of that.
As a PR reviewer I frequently pull down the code and run it. Especially if I'm suggesting changes because I want to make sure my suggestion is correct.
Do other PR reviewers not do this?
Some do. Many (like peer reviewers) are unable to consider the consequences of their negligence.
But it's always a welcome reminder that some people care about doing good work. That's easy to forget browsing HN, so I appreciate the reminder :)
Breaking the analogy beyond the point where it is useful by introducing non-generalising specifics is not a useful argument. Otherwise I can counter your more specific non-generalising analogy by introducing little green aliens sabotaging your imaginary CI with the same ease and effect.
But I agree, because I'd rather discuss the pragmatics and not bicker over the semantics about an analogy.
Introducing a token error is different from plagiarism, no? Someone writing code that can't compile is different from someone "stealing" proprietary code from some company and contributing it to some FOSS repo?
In order to assume good faith, you also need to assume the author is the origin. But that's clearly not the case. The origin is from somewhere else, and the author that put their name on the paper didn't verify it, and didn't credit it.
Sufficiently advanced incompetence is indistinguishable from actual malice... and thus should be treated the same.
Yes, in theory you can go through every semicolon to check if it's not actually a Greek question mark; but one assumes good faith and baseline competence such that you as the reviewer would generally not be expected to perform such pedantic checks.
So if you think you might have reasonably missed Greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.
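For anyone who hasn't run into this: the Greek question mark is U+037E, which most fonts draw identically to the ASCII semicolon U+003B. A rough sketch of the "pedantic check" in Python, just to show how mechanical it is once a machine does it; the function name and the sample string are made up for illustration:

```python
GREEK_QUESTION_MARK = "\u037e"  # looks like ';' (U+003B) in most fonts


def find_greek_question_marks(source: str) -> list[tuple[int, int]]:
    """Return 1-based (line, column) pairs for every U+037E in source."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch == GREEK_QUESTION_MARK:
                hits.append((line_no, col_no))
    return hits


# Hypothetical snippet: the second "semicolon" is really U+037E.
sample = "int x = 1;\nint y = 2\u037e\n"
print(find_greek_question_marks(sample))  # [(2, 10)]
```

A human eyeballing a diff won't see the difference; a scan like this, or simply the compiler, will.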
No it's not. I think you're trying to make a different point, because you're using an example of a specific deliberate malicious way to hide a token error that prevents compilation, but is visually similar.
> and you as a code reviewer are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.
What weird world are you living in where you don't have CI? Also, it's pretty common that I'll test code locally when reviewing something more complex or more important, if I don't have CI.
> Yes, in theory you can go through every semicolon to check if it's not actually a Greek question mark; but one assumes good faith and baseline competence such that you as the reviewer would generally not be expected to perform such pedantic checks.
I don't, because it won't compile. Not because I assume good faith. References and citations are similar to introducing dependencies. We're talking about completely fabricated deps, e.g. an engineer went on npm and grabbed the first package that said left-pad, but it's actually a crypto miner. We're not talking about a citation missing a page number or publication year. We're talking about something that's completely incorrect being represented as relevant.
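To make the dependency comparison concrete, here's a minimal sketch in Python of the kind of check I mean: flag anything declared that isn't on a list someone has actually vetted. Everything here is hypothetical (the function name, the package names, the idea that you keep such a list at all), and it only catches names nobody vetted, not a vetted name that turns out to be malicious:

```python
def unknown_dependencies(declared: list[str], vetted: list[str]) -> list[str]:
    """Return declared dependency names that are not on the vetted list."""
    vetted_set = set(vetted)
    return sorted({name for name in declared if name not in vetted_set})


# Hypothetical example: one dependency nobody has ever looked at.
declared = ["left-pad", "left-pad-fast"]
vetted = ["left-pad"]
print(unknown_dependencies(declared, vetted))  # ['left-pad-fast']
```

The citation equivalent would be checking that every reference resolves to a real publication, which is a lookup, not a judgment call.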
> So if you think you might have reasonably missed Greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.
I would never miss this, because the important thing is code needs to compile. If it doesn't compile, it doesn't reach the master branch. Peer review of a paper doesn't have CI, I'm aware, but it's also not vulnerable to syntax errors like that. A paper with a fake semicolon isn't meaningfully different, so this analogy doesn't map to the fraud I'm commenting on.
No.
Modern peer review is “how can I do minimum possible work so I can write ‘ICLR Reviewer 2025’ on my personal website”
I don't know, I still think this describes most of the reviews I've seen
I just hope most devs that do this know better than to admit to it.
This is less voice dictation software, and much more a shim to [popular LLM provider]