Having just completed a reviewer workshop with the top journal in my field, and having reviewed for multiple conferences, I have several points.
1. Don't underestimate how bad human reviewers can be. I've seen really bad reviews before. But the worst were for conferences, not journals.
2. The job of an associate editor is to field the reviews and make decision recommendations to the senior editor. A good associate editor will take care of this stuff, but may let a bad review through for the sake of the process. They might emphasize a particular review to help the author understand what the editor actually thinks is important, as opposed to letting the author think that all reviews are equal. That being said, it's up to the author to respond. If a reviewer is unequivocally wrong about something, the author can explain why they didn't follow the reviewer's recommendations. What the senior editor (and to some extent the associate editor) thinks is what matters, not what the reviewer thinks.
3. If the associate editor is not doing their job of fielding and reviewing the reviews, I question whether the journal is actually a top journal. My impression thus far is that top journals take their editing seriously. So far in grad school, I've met multiple editors from multiple journals and gone to multiple journal workshops. The amount of work these people pour into journal work, many of them for free, is staggering. Accordingly, the burnout rate is significant, but the ones who stay keep it up because they want to be serious custodians of their discipline's research authority. It's a massive amount of work; I'm not sure I'd want to do it myself. I can't imagine such people brushing aside bad reviews without realizing how bad they are. This is partly why good journals also run workshops that teach how to review. It's not easy to become an associate editor either: you need to be respected enough in the community to get nominated and then voted in by the existing editors, and that bar comes down to how much those editors respect your research.
Now... it's possible that my discipline (information systems) is unique in this regard. Is it possible that the top journals in computer science, physics, or other fields don't take this as seriously? I doubt it.
Keep "AI" out of it. As described, the (suspected-AI) review seemed to only be based on the Abstract (didn't bother reading the rest of the submitted paper), and mentions several papers from irrelevant fields. Politely suggest to the editor that that reviewer was obviously struggling to review a paper well outside his area of expertise, and might best be replaced with a reviewer who is a better fit for the subject matter of your article.
In a lot of contexts, whether someone leaned on an LLM--lightly or heavily--is sort of irrelevant. The output is either good/reasonable or it's not. (Or some gradation between the two.) Whatever tools they used are beside the point.
For me, it casts a shadow of doubt over the reliability of a peer-reviewed journal if one of the "peers" is an LLM during the "it doesn't even know it's lying" stage of AI we are currently in.
I read a review of The Singularity is Near by Ray Kurzweil in which the book was compared to a table full of what appears to be very delicious food, until it is revealed that some amount of dog feces is definitely mixed into some of the dishes. You can't tell which dishes are safe and which are carefully laced with dog feces.
An LLM currently has no place in a peer-reviewed journal, unless it is part of an experiment where it is trained on the journal's body of work and then tested for accuracy against future articles. As the tech progresses it may find a place, but if it takes twice as long to fact-check the LLM's output, it's saving nobody time, and it may be hallucinating in ways that are hard to catch.
I understand the point you're making, philosophically, but the pragmatist in me says that this practice needs to be discouraged (though an outright ban is probably unenforceable).
If you give busy reviewers an easy "out", where they can just run the paper through an LLM, do a bit of editing then send off the review, people are going to do exactly that.
And the resulting review, with the right editing, might seem perfectly plausible and human-like. But that review isn't going to be able to offer suggestions with insight from recently published papers. It isn't going to be able to point out issues with the data, or with the statistical analysis, or with the paper's logical conclusions.
Maybe someday AI will be capable enough to replace the role of human reviewers. But right now, encouraging this practice is just going to let a lot of bad science slip through to publication without genuine peer review. (even more than the large amount that already does, let's be honest ...)
The purpose of a reviewer is to provide the reviewer's feedback on a paper. If the editor wants to get feedback from an LLM, they are perfectly capable of doing so themselves. There is an attribution chain here that may not be terribly relevant in the short term but in the long term is a big deal.
Historically, we speak of "plagiarism" as being something you do against human text, because human text is all there was. But I would suggest that most of the issues with plagiarism are actually around misattribution, which means that it is perfectly sensible to speak of "plagiarizing" an AI. The AI may not be victimized, but victimization is not the only issue with plagiarism and most or all of the rest of them apply here. It matters over time where the text comes from. Even if the text of the review is high quality, in order to tune the editor's own tracking of reputation they need to know if it is from a human reviewer, GPT-1, GPT-7.5, or NotGPTAtAllSciAI-2026.
This is especially true in this case, because the entire point of a reviewer's review is that they are doing something the editor is not supposed to be doing! If the editor has to do deep due diligence on every review, the reviewers are failing to provide any value; the editor might as well review the paper directly. So reputation is not something we can just wave away with "well, if it was a good review it doesn't matter"; trust is a huge deal here. Editors need reviews to be properly attributed. Even if they are fine with AI reviews, they need to know which ones are from AIs and, as I said, which AIs.
>In a lot of contexts, whether someone leaned on an LLM--lightly or heavily--is sort of irrelevant
This isn't one of those contexts. It's called peer review for a reason. You don't get to outsource your duty to a machine or to some random person. It is explicitly you in whom others have vested their trust.
>The output is either good/reasonable or it's not.
In the world of human beings, this isn't the only thing that matters. It reminds me of Zizek, who pointed out that the end result of the "AI revolution" isn't going to be machines acting like humans but the reverse: humans LARPing as machines. Humans as obtuse as robots, rather than the other way around.
Journals and academia are starting to reward reviewing papers (you can mention your reviews on your resume), so I don't think it's irrelevant. The supposed AI reviewer here is probably polluting journals with dozens of poor quality reviews. This wouldn't be possible without AI help, so that makes it a big problem!
The review obviously being LLM-generated is a useful data point in itself, because it shows that the OP isn't simply arguing against the substance of the review.
It's also good for the editor to know about. LLMs represent a new, acute threat to review quality that editors may currently be underestimating. I've literally heard people brag about using ChatGPT instead of doing reviews themselves. People who aren't LLM experts don't necessarily understand the models' limitations, or that using them this way should be unacceptable. The editors should know, so they can communicate their review expectations more clearly.
That's probably a good keeping-your-head-low strategy, but I can't help feeling it doesn't treat the problem as being as severe as it is. I don't want academia to overreact about LLMs (it has done that enough already, with the huge number of academic cheating accusations), but AI output that is entirely unchecked doesn't belong in the scientific peer-review process.
Those using AI tools in such situations should be expected to remove anything from the LLM's output that they can't verify with their own expertise. Reviewing outside your expertise doesn't inevitably lead to mistakes, but unchecked AI output will.
It's about informing the only person who can get extra information, decide whether it's severe, and do something about it: the editor.
The author can't (and shouldn't) do anything directly about the anonymous reviewer; all the responsibility, authority, and duty lies with the editor, who at least knows who that person is.
Just out of curiosity, are you aware that some lawyers have started submitting LLM-generated content as legal drafts in (US) courts of law?
Last I heard, one of them had been censured by the court, but courts generally have no power over law licenses. We might have to wait a while to find out whether there will be any more serious repercussions.
I think that in many professional settings, we may soon discover that some large fraction of people have been "faking it until they make it", without ever reaching the "making it" part.
The fun part is when congressional staffers use this for gigantic 10,000-page bills too large for anyone to catch the errors before the vote. It might already be happening.
Agreed that this is a potentially system-damaging problem that will only get worse if not dealt with directly. In this case, though, I think the advice in the OP is good: address the feedback from the good review, resubmit, and once the paper is accepted, contact the editor with your concerns; at that point it's clear you aren't objecting to needing to revise.
In practice, my advice is for the academic, who is trying to get an article published in "one of the well-reputable journals". That is a weak hand to be playing. The journal's editor, by contrast, is in a far stronger position to hit back hard at whoever seems to be farming out their review job to a cut-rate bot.
One factor to consider is AI-augmented content. I'm not an academic reviewer, but I will certainly do some sort of analysis, write up some key bullet points, then ask ChatGPT to synthesize some prose for a report. I then make a few tweaks and edits and send it off. The core content is coming from my analysis, not generation, but if I'm being lazy the result ends up having the "default ChatGPT style." I could imagine this becoming common, especially for non-native English speakers.
I've noticed this style of usage greatly reduces the quality of work my colleagues produce. The computer is great at writing stuff that looks right, but is not.
Also, at least some of the "increased productivity" boils down to "I spent less time thinking about the underlying problem while I was composing text and copy-editing".
This reminds me of the time when I was 16 (almost 20 years ago now). Having an interest in communication theory, I somehow ended up on an IEEE journal review list for those topics. I received a paper to review from someone in China, and I bullshitted my way through that review, thinking it was the start of my academic career.
It was the start of a multi-million dollar lifestyle business and the proximate cause of the reproducibility issues impeding scientific progress right now.
This is only irrelevant if you place no value on the time of yourself and others.
Or for that matter if it had been dictated - and then unchecked?
That's what they are there for.