There is a general problem with rewarding people for the volume of stuff they create, rather than the quality.
If you incentivize researchers to publish papers, individuals will find ways to game the system, meeting the minimum quality bar, while taking the least effort to create the most papers and thereby receive the greatest reward.
Similarly, if you reward content creators based on views, you will get view maximization behaviors. If you reward ad placement based on impressions, you will see gaming for impressions.
Bad metrics or bad rewards cause bad behavior.
We see this over and over because the reward issuers are designing systems to optimize for their upstream metrics.
Put differently, the online world is optimized for algorithms, not humans.
LLMs are tools that make it easier to hack incentives, but you still need a person to decide that they'll use an LLM to do so.
Blaming LLMs is unproductive. They are not going anywhere (especially since open-source LLMs are so good).
If we want to achieve real change, we need to accept that they exist, understand how that changes the scientific landscape, and work out our options from there.
What would be the point of blaming LLMs? What would that accomplish? What does it even mean to blame LLMs?
LLMs are not submitting these papers on their own, people are. As far as I'm concerned, whatever blame exists rests on those people and the system that rewards them.
> There is a general problem with rewarding people for the volume of stuff they create, rather than the quality.
If you incentivize researchers to publish papers, individuals will find ways to game the system,
I heard someone say something similar about the “homeless industrial complex” on a podcast recently. I think it was San Francisco that pays NGOs funds for homeless aid based on how many homeless people they serve. So the incentive is to keep as many homeless around as possible, for as long as possible.
It's a metric attribution problem. The real metric should be, for example, a reduction in homelessness (though even that can be gamed by bussing people out, etc. -- tactics that unfortunately other cities have adopted). But attributing that to a single NGO is tough.
Ditto for views, etc. Really what you care about as, e.g., YouTube is conversions for the products that are advertised, not impressions. But there's an attribution problem there too.
> rewarding people for the volume ... rather than the quality.
I suspect this is a major part of the appeal of LLMs themselves. They produce lines very fast, so it appears as if work is being done fast. But that's very hard to verify, because line count is essentially zero signal for code quality, and so is commit count. It's already a bit insane that we use lines and commits as measures in the first place; they're trivial to hack. You end up rewarding that annoying dude who keeps changing the whole file so the diff is the entire file and not the 3 lines they actually edited...
I've been thinking we're living in "Goodhart's Hell", where metric hacking has become the intent: we've decided metrics are all that matter and are perfectly aligned with our goals.
But hey, who am I to critique? I'm just a math nerd. I don't run a multi-trillion-dollar business that lays off tons of workers because the current ones are so productive due to AI that they created one of the largest outages in the history of their platform (and you don't even know which of the two I'm referencing!). Maybe when I run a multi-trillion-dollar business I'll have the right to an opinion about data.
I think you will discover that few organizations use the size or number of edits as a metric of effort. Instead, you might be judged by some measure of productivity (such as resolving issues). Fortunately, language agents are actually useful at coding, when applied judiciously.
> What would a system that rewards people for quality rather than volume look like?
Hiring and tenure review based on a candidate’s selected 5 best papers.
Already standard practice at a few enlightened places, I think. (of course this also probably increases the review workload for top venues)
To a lesser extent, bean-counting metrics like citations and h-index are an attempt to quantify non-volume-based metrics. (for non-academics, h-index is the largest N such that your N-th most cited paper has >= N citations)
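To make that concrete, here is a minimal sketch of computing the h-index from a list of citation counts (plain Python; the citation numbers are made-up examples):

    # h-index: the largest N such that the N-th most cited paper has >= N citations
    def h_index(citations):
        ranked = sorted(citations, reverse=True)   # most cited first
        h = 0
        for rank, cites in enumerate(ranked, start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    print(h_index([10, 8, 5, 4, 3]))  # -> 4: the 4th most cited paper has 4 citations
    print(h_index([25, 8, 5, 3, 3]))  # -> 3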
Note that most approaches like this have evolved to counter “salami-slicing”, where you divide your work into “minimum publishable units”. LLMs are a different threat - from my selfish point of view, one of the biggest risks is that it takes less time to write a bogus paper with an LLM than it does for a single reviewer to review it. That threatens to upend the entire peer reviewing process.
Everybody "creates content" (like me when I take a picture of beautiful sunset).
There is no such thing as "quality". There is quality for me and quality for you. That is part of the problem, we can't just relate to some external, predefined scale. We (the sum of people) are the approximate, chaotic, inefficient scale.
Be my guest to propose a "perfect system", but - just in case there is no such system - we should make sure each of us "rewards" what we find to be of quality (be it people or content creators), and hope it will prevail. Seems to have worked so far.
Crazily, I think the easiest way is to remove any and all incentives, awards, finite funding, and allegedly merit-based positions. Allow anyone who wants to research to research. Natural recognition of peers seems to be the only way to my thinking. Of course this relies on a post-scarcity society so short of actually achieving communism we'll likely never see it happen.
That might be the "prize" but the "bar" is most certainly in publish or perisch to work your way up the early academic carreer ladder. Every conference or workshop attendance needs a paper, regardless of wether you had any breakthrough. And early metrics are most often quantity based (at least 4 accepted journal articles), not citation based.
I think many with this opinion actually misunderstand. Slop will not save your scientific career. Really it is not about papers but securing grant funding by writing compelling proposals, and delivering on the research outlined in these proposals.
Ideally that is true. I do see the volume-over-quality phenomenon with some early career folks who are trying to expand their CVs. It varies by subfield though. While grant metrics tend to dominate career progression, paper metrics still exist. Plus, it’s super common in those proposals to want to have a bunch of your own papers to cite to argue that you are an expert in the area. That can also drive excess paper production.
So what they no longer accept is preprints (or rejects…). It's of course a pretty big deal, given that arXiv is all about preprints. And an accepted journal paper presumably cannot be submitted to arXiv anyway, unless it's an open-access journal.
For position (opinion) or review papers (summarizing the state of the art, and often laden with opinions on categories and future directions). LLMs would be happy to generate both of these, because they require zero technical contributions, working code, validated results, etc.
So what? People are experimenting with novel tools for review and publication. These restrictions are dumb; people can just ignore review and position papers if they start proving to be less useful, and the good ones will eventually spread through word of mouth, just like arXiv has always worked.
> Technically, no! If you take a look at arXiv’s policies for specific content types you’ll notice that review articles and position papers are not (and have never been) listed as part of the accepted content types.
I suspect that any editorial changes that happened as part of the journal's acceptance process - unless they materially changed the content - would also have to be kept back as they would be part of the presentation of the paper (protected by copyright) rather than the facts of the research.
People have started to use arXiv as a resume-driven blog with white-paper decorations. And people have started citing these in research papers. Maybe this is a good change.
So we need to create a new website that actually accepts preprints, like arXiv's original goal from 30 years ago.
I think every project more or less deviates from its original goal given enough time. There are few exceptions in CS like GNU coreutils. cd, ls, pwd, ... they do one thing and do it well very likely for another 50 years.
I don't think closed vs. open is the problem, because most open-access journals ask authors for thousands of dollars in publication fees, which are paid out of public funding. The open-access model is now actually lucrative for the publishers, and they still don't pay authors or reviewers.
Google internally started working on "indexing" patent applications, materials science publications, and new computer science applications more than 10 years ago. You, the consumer / casual user, are only starting to see the services now, in a rush to consumer product placement. You must know very well that major militaries around the world are racing to "index" comms intel and field data; major finance players are racing to "index" transactions and build deeper profiles of many kinds. You as an Internet user are being profiled by a dozen new smaller players. arXiv is one small part of a very large sea change right now.
Maybe it's time for a reputation system. E.g. every author publishes a public PGP key along with their work. Not sure about the details but this is about CS, so I'm sure they will figure something out.
I had been kinda hoping for a web-of-trust system to replace peer review. Anyone can endorse an article. You can decide which endorsers you trust, and do some network math to find what you think is worth reading. With hashes and signatures and all that rot.
Not as gate-keepy as journals and not as anarchic as purely open publishing. Should be cheap, too.
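Roughly, the "network math" could be as simple as the sketch below: a hypothetical trust-propagation pass over an endorsement graph. All names, the decay factor, and the data layout are made up for illustration, and the hash/signature verification the scheme would need is omitted.

    # Hypothetical web-of-trust scoring: seed a few endorsers you trust,
    # propagate trust along "X vouches for Y" edges with decay,
    # and score an article by the summed trust of everyone who endorsed it.
    from collections import defaultdict

    DECAY = 0.5  # arbitrary: each hop away from you counts half as much

    endorses = {             # reviewer -> reviewers whose judgement they vouch for
        "alice": ["bob"],
        "bob": ["carol"],
        "carol": [],
    }
    endorsed_articles = {    # reviewer -> article ids they endorsed
        "alice": ["paper-1"],
        "bob": ["paper-1", "paper-2"],
        "carol": ["paper-3"],
    }

    def trust_scores(seeds):
        trust = {name: 1.0 for name in seeds}
        frontier = list(seeds)
        while frontier:
            nxt = []
            for person in frontier:
                for other in endorses.get(person, []):
                    propagated = trust[person] * DECAY
                    if propagated > trust.get(other, 0.0):
                        trust[other] = propagated
                        nxt.append(other)
            frontier = nxt
        return trust

    def article_scores(seeds):
        trust = trust_scores(seeds)
        scores = defaultdict(float)
        for person, weight in trust.items():
            for article in endorsed_articles.get(person, []):
                scores[article] += weight
        return dict(scores)

    print(article_scores(["alice"]))
    # {'paper-1': 1.5, 'paper-2': 0.5, 'paper-3': 0.25}

The point is only that the ranking is personal: change the seed set and the same endorsement data yields a different reading list.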
The problem with an endorsement scheme is citation rings, i.e., groups of people who artificially inflate the perceived value of some line of work by citing each other. This is a problem even now, but it is kept in check by the fact that authors do not usually have any control over who reviews their paper. Indeed, in my area, reviews are double-blind, and despite claims that "you can tell who wrote this anyway", research done by several chairs in our SIG suggests that this is very much not the case.
Fundamentally, we want research that offers something new (“what did we learn?”) and presents it in a way that at least plausibly has a chance of becoming generalizable knowledge. You call it gate-keeping, but I call it keeping published science high-quality.
An endorsement system would have to be finer grained than a whole article. Mark specific sections that you agree or disagree with, along with comments.
I think it’s lovely that at the time of my reply, everyone seems to be taking your comment at face value instead of for the meta-commentary on “people upvoting content” you’re making by comparing HN karma to endorsement of papers via PGP signatures.
Ignoring the actual proposal or user, just looking at karma is probably a pretty terrible metric.
High-karma accounts tend to just interact more frequently, for long periods of time, often with less nuanced takes that just play into what is likely to be popular within a thread.
Having a Userscript that just places the karma and comment count next to a username is pretty eye opening.
I would be much happier if you explained your _reasons_ for disagreeing or your _reasons_ for agreeing.
I don't think publishing a PGP key with your work does anything. There's no problem identifying the author of the work. The problem is identifying _untrustworthy_ authors. Especially in the face of many other participants in the system claiming the work is trusted.
As I understand it, the current system (in some fields) is essentially to set up a bunch of sockpuppet accounts to cite the main account and publish (useless) derivative works using the ideas from the main account. Someone attempting to use existing research for its intended purpose has no idea that the whole method is garbage / flawed / not reproducible.
If you can only trust what you, yourself verify, then the publications aren't nearly as useful and it is hard to "stand on the shoulders of giants" to make progress.
Maybe there should be some type of strike rules. Say 3 bad articles from any institution and they get a 10-year ban, whatever their prestige or monetary value. If you let people release bad articles under your name, you are out for a while.
Treat everyone equally. After 10 years of only quality you get a chance to come back. Before that, tough luck.
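A toy sketch of how such a strike rule could be tracked, assuming the 3-strike / 10-year numbers above (the institution name and dates are hypothetical):

    # Hypothetical strike tracker: 3 flagged "bad" articles from an institution
    # triggers a 10-year submission ban, regardless of prestige.
    from datetime import date, timedelta

    STRIKE_LIMIT = 3
    BAN_YEARS = 10

    strikes = {}   # institution -> dates a bad article was flagged
    bans = {}      # institution -> date the ban expires

    def record_bad_article(institution, when):
        strikes.setdefault(institution, []).append(when)
        if len(strikes[institution]) >= STRIKE_LIMIT:
            bans[institution] = when + timedelta(days=365 * BAN_YEARS)

    def can_submit(institution, when):
        return when >= bans.get(institution, date.min)

    record_bad_article("Example University", date(2025, 1, 1))
    record_bad_article("Example University", date(2025, 2, 1))
    record_bad_article("Example University", date(2025, 3, 1))
    print(can_submit("Example University", date(2026, 1, 1)))   # False
    print(can_submit("Example University", date(2035, 3, 2)))   # True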
Maybe arXiv could keep the free preprints but offer a service on top. Humans, experts in the field, would review submissions, and arXiv would curate and publish the high quality ones, and offer access to these via a subscription or fee per paper....
They've also been putting their names on their grad students' work for eternity as well. It's not like the person whose name is at the top actually writes the paper.
It's clearly not sustainable for the main website hosting CS articles to have no reviews or restrictions (except for the initial invite system).
There were 26k submissions in October: https://arxiv.org/stats/monthly_submissions
Asking for a small amount of money would probably help.
The issue with requiring peer-reviewed journals or conferences is the severe lag: it takes a long time, and part of the advantage of arXiv was that you could have the paper available instantly as a preprint.
Also, these conferences and journals are themselves receiving enormous quantities of submissions (29,000 for AAAI), so we are just pushing the problem elsewhere.
A small payment is probably better than what they are doing. But we must eventually solve the LLM issue, probably by punishing the people that use them, instead of the entire public.
I'll add the amount should be enough to cover at least a cursory review. A full review would be better. I just don't want to price out small players.
The papers could also be categorized as unreviewed, quick check, fully reviewed, or fully reproduced. They could pay for this to be done or verified. Then, we have a reputational problem to deal with on the reviewer side.
I don't know about CS, but in mathematics the vast majority of researchers would not have enough funding to pay for a good quality full review of their articles. The peer review system mostly runs on good will.
I like this idea. A small contribution would be a good filter. Looking at the stats, it's quite crazy. Didn't know that we could access this data. Thanks for sharing.
> Before being considered for submission to arXiv’s CS category, review articles and position papers must now be accepted at a journal or a conference and complete successful peer review.
Edit: original title was "arXiv No Longer Accepts Computer Science Position or Review Papers Due to LLMs"
Isn't arXiv where you upload things before they have gone through the entire process? Isn't that the entire value, aside from some publisher cartel busting?
Agree. Additionally, original title, "arXiv No Longer Accepts Computer Science Position or Review Papers Due to LLMs" is ambiguous. “Due to LLMs” is being interpreted as articles written by LLMs, which is not accurate.
No, the post is definitely complaining about articles written by LLMs:
"In the past few years, arXiv has been flooded with papers. Generative AI / large language models have added to this flood by making papers – especially papers not introducing new research results – fast and easy to write."
"Fast forward to present day – submissions to arXiv in general have risen dramatically, and we now receive hundreds of review articles every month. The advent of large language models have made this type of content relatively easy to churn out on demand, and the majority of the review articles we receive are little more than annotated bibliographies, with no substantial discussion of open research issues."
Surely a lot of them are also about LLMs: LLMs are the hot computing topic and where all the money and attention is, and they're also used heavily in the field. So that could at least partially account for why this policy is for CS papers only, but the announcement's rationale is about LLMs as producing the papers, not as their subject.
I don’t know about this. From a pure entertainment standpoint, we may be denying ourselves a world of hilarity. LLMs + “You know Peter, I’m something of a researcher myself” delusions. I’d pay for this so long as people are very serious about the delusion.
I would like to understand what people get, or think they get, out of putting a completely AI-generated survey paper on arXiv.
Even if AI writes the paper for you, it's still kind of a pain in the ass to go through the submission process, get the LaTeX to compile on their servers, etc., there is a small cost to you. Why do this?
Gaming the h-index has been a thing for a long time in circles where people take note of such things. There are academics who attach their name to every paper that goes through their department (even if they contributed nothing), there are those who employ a mountain of grad students to speed run publishing junk papers... and now with LLMs, one can do it even faster!
Published papers are part of the EB-1 visa rubric, so there is huge value in getting your content into these indexes:
"One specific criterion is the ‘authorship of scholarly articles in professional or major trade publications or other major media’. The quality and reputation of the publication outlet (e.g., impact factor of a journal, editorial review process) are important factors in the evaluation”
Great move by arXiv—clear standards for reviews and position papers are crucial in fast-moving areas like multi-agent systems and agentic LLMs. Requiring machine-readable metadata (type=review/position, inclusion criteria, benchmark coverage, code/data links) and consistent cross-listing (cs.AI/cs.MA) would help readers and tools filter claims, especially in distributed/parallel agentic AI where evaluation is fragile. A standardized “Survey”/“Position” tag plus a brief reproducibility checklist would set expectations without stifling early ideas.
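For illustration, a sketch of what such machine-readable metadata could look like, written here as a plain Python dict; every field name below is a hypothetical suggestion, not an existing arXiv field:

    # Hypothetical submission metadata for a survey/position paper.
    # None of these keys are real arXiv metadata fields; they just illustrate the idea.
    submission_metadata = {
        "type": "review",                       # or "position"
        "primary_category": "cs.AI",
        "cross_lists": ["cs.MA", "cs.DC"],
        "inclusion_criteria": "peer-reviewed agentic-LLM papers, 2022-2025",
        "benchmarks_covered": ["example-bench-1", "example-bench-2"],
        "code_and_data": "https://example.org/survey-artifacts",
        "peer_review_status": "accepted at a journal or conference, per the new policy",
        "reproducibility_checklist": {
            "search_strategy_documented": True,
            "selection_criteria_stated": True,
            "claims_linked_to_sources": True,
        },
    }

    # A reader-side tool could then filter on these fields, e.g.:
    if submission_metadata["type"] == "review" and submission_metadata["code_and_data"]:
        print("survey with artifacts:", submission_metadata["primary_category"])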
Blame people, bad actors, systems of incentives, the gods, the devils, but never broach the fault of LLMs and their widespread abuse.
LLMs are not the root of the problem here.
What would an online world that is optimized for humans, not algorithms, look like?
Should content creators get paid?
I don't think so. Youtube was a better place when it was just amateurs posting random shit.
Everybody "creates content" (like me when I take a picture of beautiful sunset).
There is no such thing as "quality". There is quality for me and quality for you. That is part of the problem, we can't just relate to some external, predefined scale. We (the sum of people) are the approximate, chaotic, inefficient scale.
Be my guest to propose a "perfect system", but - just in case there is no such system - we should make sure each of us "rewards" what we find of quality (being people or content creators), and hope it will prevail. Seemed to have worked so far.
Sure, publishing on important papers has its weight, but not as much as getting cited.
You cannot upload the journal’s version, but you can upload the text as accepted (so, the same content minus the formatting).
Why not? I don't know about in CS, but, in math, it's increasingly common for authors to have the option to retain the copyright to their work.
Edit: For clarification I’m agreeing with OP
Her suggestion was simple: kick out all non-Ivy-League and most international researchers. Then you have a working reputation system.
Make of that what you will ...
You might be vastly underestimating the cost of such a feature
ArXiv CS requires peer review for surveys amid flood of AI-written ones
- nothing happened to preprints
- "summarization" articles always required it, they are just pointing at it out loud
"In the past few years, arXiv has been flooded with papers. Generative AI / large language models have added to this flood by making papers – especially papers not introducing new research results – fast and easy to write."
"Fast forward to present day – submissions to arXiv in general have risen dramatically, and we now receive hundreds of review articles every month. The advent of large language models have made this type of content relatively easy to churn out on demand, and the majority of the review articles we receive are little more than annotated bibliographies, with no substantial discussion of open research issues."
Surely a lot of them are also about LLMs: LLMs are the hot computing topic and where all the money and attention is, and they're also used heavily in the field. So that could at least partially account for why this policy is for CS papers only, but the announcement's rationale is about LLMs as producing the papers, not as their subject.
These things will ruin everything good, and that is before we even start talking about audio or video.
It is also turning people into spammers because it makes bluffers feel like experts.
ChatGPT is so revealing about a person's character.
I've never seen arXiv papers counted towards your publications anywhere that the number of your publications is used as a metric. Is USCIS different?