Citation cartels help mathematicians-and their universities-climb the rankings

The situation is out of control. I've also seen that during peer review a reviewer may give a list of publications they 'suggest' for the literature review, sometimes completely unrelated fields. If you choose not to, you will likely see your paper rejected.

The truth is that almost all papers published across almost all fields cannot be replicated, full stop. For those that can, the results rarely match up, indicating that the tests are chosen to boost the performance of the method.

If you want to boost citations legitimately, purposefully leak your own paper to get past the publisher pay wall, make it colourful and accessible (if it's too dense they are unlikely to understand it), give examples on how to replicate your method, the abstract is likely the only part that will be read so make it count.

eigenket · 2 years ago

> I've also seen that during peer review a reviewer may give a list of publications they 'suggest' for the literature review, sometimes completely unrelated fields. If you choose not to, you will likely see your paper rejected.

You are severely misrepresenting how things work here. A reviewer doesn't accept or reject a paper. A reviewer gives their response to an editor and the editor makes a decision. The editor knows who the reviewer is, and has access to the reviews and the author's response to them.

If the editor sees the reviewer has suggested adding bunch of references to their own work, and has suggested rejecting the paper after the authors refuse this then the editor is going to know what is going on. They aren't completely stupid.

> purposefully leak your own paper to get past the publisher pay wall

Many papers in maths, physics and computer science appear on arXiv before being submitted for publication. In the fields I have worked in you can replace the "most" with "pretty much all".

jwsteigerwalt · 2 years ago

I think you’re giving the editors too much credit.

eimrine · 2 years ago

> The truth is that almost all papers published across almost all fields cannot be replicated, full stop.

This is not true for Mathematics.

PaulHoule · 2 years ago

I dunno. When I was in grad school I regularly would find papers in Annals of Mathematical Physics that would go on for 50 pages and have critical errors on page 17 and 45. I did a calculation like that which I know what right at the equation-by-equation level because I made unit tests for everything, but the whole thing was still useless (published in my thesis but not Annals) because the coordinate system we were using converted a periodic orbit into something like a spiral so the we couldn't compute a topological factor because there was no topological invariant...

Verdex · 2 years ago

I don't know about that. IIRC the creation of HoTT was because someone was notified that one of his important papers from earlier in his career had a non-trivial flaw in it.

To be sure, reproduction is a bit easier in mathematics because you normally don't need exotic equipment, just time. However, that doesn't mean that people are necessarily taking the time to double check everything with sufficient rigor.

nickpsecurity · 2 years ago

In the article below, they claimed one study said a third of the papers had a false theorem.

https://news.ycombinator.com/item?id=17430577

A number of papers on formal verification reported that restructuring their specifications for proof revealed problems they missed in the specification. Other works found through empirical testing that the specs didn’t match reality. Many theories in physics changed that way.

So, we should assume they’re all false by default until the specs are reviewed, the proofs are checked, and there’s empirical validation. If theoretical, its component theories must be built on observed truths instead of imaginary constructs. Then, we might trust the paper (i.e. mathematical claims).

nxpnsv · 2 years ago

It is not true for Physics either.

sukruh · 2 years ago

Sure, but most Maths papers are read by like 3 people in total globally, so there's that.

asah · 2 years ago

Is replication a big problem in computer science?

If anything, I'd think replication has gotten easier with the advent of open source and the expectation that top papers will include git repos which make it easy to reproduce. The advent of docker, locked-version-dependencies etc have all contributed as well. 10-20 years ago, we didn't have this - but I also didn't see * widespread * problems even then: if someone published a result saying that technique <x> led to <y> performance gain, I thought it usually did?

(that said, computer science research doesn't have the same money on the line as medicine and the "perish" part of the publish-or-perish equation isn't the same when the fallback for failed academics is still six figure salaries at tech companies)

nradov · 2 years ago

What are the odds that in 2044 we will still be able to run the code published with CS papers today? If it's pure Java SE or ANSI C with no dependencies then it will probably still work, but anything more complex tends to decay pretty rapidly.

vladms · 2 years ago

> The truth is that almost all papers published across almost all fields cannot be replicated, full stop.

Isn't it ironic that we can't replicate your assessment?...

I personally have no idea how much papers can be replicated or not, but replicating an experiment would not guarantee that the idea/conclusion in the paper is sound (ex: paper extracting some data, observing a correlation and concluding it is an implication).

I do agree that it should be made easier to replicate for the fields where this is possible, but there are many other problems.

etrautmann · 2 years ago

this is overstating the case. Throwing out all publications is ridiculous. Yes, many fail to replicate, but not nearly all. A larger concern may be that many fail to generalize (e.g. mouse cancer cures fail to work in humans).

It's true that it's hard to ignore all of the citations a reviewer suggests, but that's part of the role of an editor, and the response to reviewers (while being a frustrating exercise) can make it clear what is and isn't an appropriate citation, which serves as communication to the editor. Yes, this process is frustratingly low bandwidth

stathibus · 2 years ago

How do you know? It's rare that anyone even tries to replicate scientific results outside of clinical medicine

patall · 2 years ago

Exactly. What kind of editors does your field have? In my field, that kind of oversight would have been called out on twitter immediately.

newzisforsukas · 2 years ago

> The truth is that almost all papers published across almost all fields cannot be replicated, full stop.

And yet we still have advances in technology. I wonder how this academic nihilism matches reality?

mcmoor · 2 years ago

This is where I got cognitive dissonance. Can both actually be true at once? If not, which one is wrong?

I wonder what percentage of academic work is based on cargo culting and practices like these. There are more "scientists" than ever and yet we dont really have a quantitative increase of people that are as productive as Euler, Einstein, Newton, Neumann or whoever you want to pick as a luminary

ants_everywhere · 2 years ago

These big names are usually promoted beyond their actual achievements [0] because of the tendency to make stories about great people who changed the world.

The reality is murkier. Huge names like Lorentz and Poincare had worked on relativity before Einstein. Gregory and Barrow had proved the fundamental theorem of calculus before Newton, not to mention Leibniz's work.

If you mythologize the past it's easier to look around and wonder why we don't have immortals like Zeus or incredible warriors like Achilles anymore. But the truth is science always proceeds by steps that look small at the time and it's often only in retrospect that things seem amazing and unprecedented.

Semi-relatedly, I used to attend a seminar with a well-known Russian mathematician who would often chime in with Russian/Soviet priority over historical results mentioned by visiting speakers. The cold war created two mathematical cultures that had limited contact. So famous European and American mathematical results from, say the 40s to the 90s often had Soviet versions worked out independently in journals nobody here had ever heard of and written in a language they can't read.

So this is all just a way of saying that empirically, the big kahuna theory of mathematical development seems more fiction than reality. And it should be treated with skepticism when you hear things framed in those terms.

[0] Not that their achievements aren't great. They are all incredible. I just mean they seem more incredible than they are if we forget all the other incredible achievements. Shoulders of giants etc etc.

jackcosgrove · 2 years ago

It sounds to me like the who of science doesn't matter as much as the where and the when. There are ideas floating around in the ether, and the discovery of these ideas generate more ether for future discoveries. To a large extent being a scientific luminary is being in the right place at the right time.

Which doesn't seem at odds with the grandparent post, given that we shouldn't expect throwing more people at the problem to accelerate discoveries.

lupire · 2 years ago

Where all of today's Lorentzes and Poincares?

epolanski · 2 years ago

Problem is as always greed for fame, power and money.

Labs and researchers are given funds based on "impact".

As we all know, when a metric becomes the goal, the metric gets gamified.

This is very hard to fix.

Gonna give you an example. As soon as you move in any direction in science you're entering a niche. Pretty much everything is its own niche where there's a limited number of people really able to review your paper.

No, there's not thousands of experts dedicating their life to helicoidal peptides interactions with metal layers. It's an extremely small number of people. They all know each other and are gonna review each other's papers regularly.

Solar-powered water splitting to produce hydrogen and oxygen? Again, extremely small club.

Perovskite? Graetzel solar cells? Bigger club, but still, at the end of the day the relevant luminaries are a handful again.

You think it's much different for quaternions or advanced complex numbers analysis?

And those small clubs set the rules and standards, and there is really not much way to have oversight on those small clubs.

Plenty of terrible science is published every day, I can confidently say 90 to 95% of experiments in Physics or Chemistry are impossible to reproduce (numbers people get by taking outlier results or even more often by tweaking the data).

When it comes to softer sciences like psychology it's even worse. It's crap.

It's sad, but I think the world desperately needs a *free* alternative to the biggest publishers out there like Nature or ACM or all these things. That free alternative has to be funded by universities and governments globally. This entity should only allow papers that present clear experiments that are reproduced elsewhere or under supervision.

This would greatly enhance the quality and reliability of papers.

questinthrow · 2 years ago

Yeah its very true, the incentives are screwed. I think you're touching on the key difference here. Someone like Euler would have had to been forced not to do math, it was his passion. But you have all these "scientists" going through the motions doing "science" because its their job. Passionate individuals will always be rare and the worse part is that the current scientific system in place is tailor made to root these passionate individuals out. A good modern example is Grigori Perelman.

SkyBelow · 2 years ago

>You think it's much different for quaternions

It feels like it should be. Maybe the following is just the naivety of a novice looking at experts they don't understand.

Learning numbers, integers, rationals, and real numbers, introduced so much power with each step, to the point that every child is expected to know the basics about real numbers. Maybe not what they are theoretically, but the basic ability to do something like .5 * pi or root(2)^4. They likely never dig any deeper, most don't really get the idea of never ending never repeating decimals, but they are able to work with them on a simple level. For most fields, an introduction only requires this level of math.

When one does move to the complex numbers, it opens up far more possibilities. Fourier transform is everywhere, to the extent many use it without having the math to understand how it does what it does. More complex problems, and I mean outside of the field of math, only have general solutions if you allow for complex numbers to be used. These are difficult, mapping from C to C requires 4D to represent and thus are much harder to visualize, but people still struggle through it.

When I realized that complex numbers weren't the end, but the second step in a tower of infinite height, I wondered what otherwise unsolvable problems needed higher levels from that, just like how many problems needed complex numbers. The difficulty working with them grows, though somewhat given C to C is already 4D, we have already reached the limit of 3D viewing power.

Yet they are rarely used, and higher levels used even less. Maybe there is something fundamental that makes them less useful, far weaker than complex numbers. But from a viewpoint of a novice, I find that surprising.

TotalCrackpot · 2 years ago

I know that at least in AI/ML there are reproducibility challenges, I even had a course during my masters where we had to reproduce a paper. Not perfect, but some disciplines try to address that. AI/ML has a nice feature of publishing almost only in open conferences/journals, there are not that many closed publishing venues in this field.

mjburgess · 2 years ago

Would any of them work in academia today? It seems unlikely.

There are research problems to solve in industry, with more prestige, more pay, and so on.

bsdpufferfish · 2 years ago

Companies hire few of these people too (bad culture fit).

tutfbhuf · 2 years ago

I think that also has to do with the amount of complexity. Henri Poincaré is often said to have been the last man who knew all of math. Today, you need research teams across the world working on a very narrow topic to make tiny progress iteratively in the best case. I think the time when one man could bring us a century forward like Newton or Einstein is gone.

mandmandam · 2 years ago

> I think the time when one man could bring us a century forward like Newton or Einstein is gone.

I'm certain his contemporaries said the same about Pythagoras.

You're not wrong about how much more complex things have gotten, certainly.

But you don't know what you don't know. Another Ramanujan or von Neumann could be right around the corner.

Just look at all the wasted potential in our education system - it's impossible to quantify just how much effort is profoundly wasted.

This article points directly to that wastage. We've tolerated an academic system that's hated by just about everyone except publishers, in an era where publishing is as close to free as it could be.

questinthrow · 2 years ago

Yeah the quantity and complexity of what you need to know now is daunting. But even so I dont think Einstein for example had to know the entirety of physics and math in order to produce a groundbreaking revolution in physics.

User3456335 · 2 years ago

No offence to them of course, but they already took the lower hanging fruit. It's hard to come up with fundamental axioms and laws when they have already been found. And there are definitely people around that have the same potential.

These citation games are at least decades old. The PageRank algorithm came out of the literature on ranking academic papers by citation, and many of the SEO manipulation techniques work in both cases. I'd guess it's being done on an increasing scale given the article.

As a semi-related phenomenon, I also feel like mathematicians are culturally prone to under-citation. I don't just mean that nobody cites older mathematicians like, say, Leibniz or Euler, for techniques they invented. But pretty frequently I'd read papers that really seemed like they were heavily indebted to a paper that came only a few years before but didn't appear in the 5-10 citations. Maybe there it wasn't mandatory to cite those papers, but it made it hard to chain backwards into the literature to get more context on methods and ideas. And sometimes it came off as the author trying to make a tool appear ex nihilo. I'm sure this happens in other fields too.

fluoridation · 2 years ago

My understanding is that citations are used to avoid having to argue for the truth value of a proposition, not to give credit, nor to establish some kind of chain of trust. A sort of "I assume this is true, if you want to argue about it take it up with this guy". If two papers are very similar in structure, but neither uses the conclusions of the other to support the truth of a dependent claim, then I don't see why the older paper should be cited by the newer paper.

michaelt · 2 years ago

Academic papers often include an introduction which summarises the context in which the paper is written. Consider https://arxiv.org/pdf/2201.03545v2.pdf where the paper says "For many decades, this has been the default use of ConvNets, generally on limited object categories such as digits [43], faces [58, 76] and pedestrians [19, 63]. Entering the 2010s, the region-based detectors [23, 24, 27, 57] further elevated ConvNets to the position of being the fundamental building block in a visual recognition system."

This shows the authors are familiar with the field, avoids inadvertent plagiarism, allows them to make it clear precisely how their contribution contributes to knowledge, shows people from funding bodies that this is a cutting edge and important field of research, and points readers in the right direction if they want to see earlier ideas, or ideas in a more applied context.

If among those 92 citations should be a few papers from your boss and your colleagues, nobody will see that as unusual - as long as they're at least marginally relevant.

“My understanding is that citations are used to avoid having to argue for the truth value of a proposition, not to give credit, nor to establish some kind of chain of trust.”

People do this in science, politics, and church. It’s a call to act only (or mostly) on that person’s testimony. An act of faith. It’s both a red flag to watch for and a necessary aid for people.

A red flag because it’s often a sign of fallacy. I default on looking that person up along with folks that might disagree with them. Every success adds credibility. Failures mean I’ll default in a different direction if I hear their name come up. Might even dig into what other claims they have which are spreading.

Those sources who keep getting the job done in specific areas can graduate from red flag to reliable enough to default on. Only in that area for claims not to far from what they’ve been good at. If outside or a big claim, still double check. Also, be able to provide verifiable evidence of why you trust the source.

michaelmior · 2 years ago

It's somewhat field dependent. I would generally expect citations to be used for both purposes. That is, if someone is proposing work substantially similar to existing work, they should be expected to cite that other work and identify differences with the new work since novelty is a key component generally (although not always) required for publication in my area.

canadiantim · 2 years ago

Qualifications of truth are often written into the paper too tho, so e.g. in biology paper people will cite studies, but then explain their mitigating circumstances and context and how they need to be balanced with other studies.

So while I agree that generally citations are a statement of claim and the reference given allows people to see if that statement of claim is actually supported, within the paper itself good authors also explicitly weigh the "truth value of a proposition".

My impression is that math is strange and disconnected. When I was a grad student the (large) physics department had a colloquium series that brought in speakers that were interesting to the whole department despite that department having major splits between astrophysics, condensed matter, accelerator physics, high energy physics, biophysics, etc.

There was a chemistry colloquium that I went to occasionally, partially because I had some friends in the chemistry department who were into the same quantum chaos stuff I was into, but it was clear that a certain faction of the chemistry department showed up there and the rest of them went to other talks.

The math department didn't have any lecture series which was of broad interest to the department.

Physics brought in senior people who had something interesting to say at the colloquium, but more than half of the people the CS department brought in for their colloquium (which I attended a lot later when I had a software dev job at the library) were job talks and often the people did not know what they were talking about and it could get embarrassingly bad. I was lucky to see Geoff Hinton speak before he got famous but there were times I would team up with "the meanest physicist in the world" (for real, not my evil twin) to ask questions at the end that would call out the weakness of the speaker because the audience was just not holding therm accountable.

sfpotter · 2 years ago

All of the math departments I’ve been involved with have had department-wide lecture series. Actually, all of them even had regular lectures meant to apply to the entire college.

_the_inflator · 2 years ago

SEO and CO (citation optimization) maybe rhyme for a reason.

BobaFloutist · 2 years ago

>The PageRank algorithm came out of the literature on ranking academic papers by citation

I never made this connection, but it makes so much sense.

bArray · 2 years ago

maranas · 2 years ago

This has been an issue with university rankings for a while. A lot of top US universities engage in this practice too - anecdotal, but I've heard a lot of professors force students to add/remove some citations, or even add their names to the list of people who worked on a paper to help the numbers for their university.

It would be good to see what the criteria is for deciding if a journal is "to be taken seriously". I imagine for example that Chinese or Arabic language journals wouls be published and citsd in journals of those languages. That doesn't necessarily mean that they arent to be raken seriously in the field, it's just that they aren't Western publications.

chriskanan · 2 years ago

Regarding inflating the number of authors, it is especially bad in medicine, where I've observed a lot of names being added to papers for "political" reasons, despite the "author" playing no role in the paper.

Some journals now require an "Author Contributions" section to at least partially address this issue.

kjkjadksj · 2 years ago

The worst part of this in the biomedical field is the conferences. Thats because sometimes you get toddlers with an advanced degree and a chair position picking the conference presenters, who will unilaterally reject or accept people on the grounds of whether they like them or see them as a competitor, even within the same department, no regards to what the poster or talk might be. At least with journals you have the editor who can sometimes mediate a hotheaded reviewer dispute in a level headed manner.

xqcgrek2 · 2 years ago

I've seen it go in the other direction too. Groups deliberately not citing other competing groups because it might help them. It's like the other groups don't exist.

I've observed this in multiple AI niches. In some cases I've emailed people saying they ignored very similar work and failed to cite it, and in at least some cases, they were apologetic and said they would update the arXiv version. Although of those times, they do that 50% of the time. Kind of tells you that the reviewers at top AI conferences themselves aren't that familiar with the breadth of the literature.

lucioperca · 2 years ago

"All metrics of scientific evaluation are bound to be abused. Goodhart's law [...] states that when a feature of the economy is picked as an indicator of the economy, then it inexorably ceases to function as that indicator because people start to game it."

https://en.wikipedia.org/wiki/Goodhart%27s_law

Dead Comment

mycologos · 2 years ago

Note that the article specifies that these cartels mostly feature participants from bad universities in countries that have really whacked-out academic reward mechanisms that arose because some powerful central government force that doesn't understand research productivity came up with a dumb way to incentivize it.

Prestige is not a failsafe guarantor of quality, but it kind of is, in the sense that somebody with a reputation to protect is going to be more careful about coming across as "unserious", and doing stuff like this is a good way to get a bad reputation quickly.

"Prestigious" people just do it in a more refined way, managing perception instead of raw numbers.

alephnerd · 2 years ago

We can safely assume MIT has a much higher quality output than King Abdulaziz University back in Jeddah.

In general, State Flagships and Ivy Leagues+Ivy Tier Privates are largely comparable research output wise.

The GP is right about prestige minimizing bad practices. Look at how diluted Northeastern's brand has become by trying to artificially inflating their prestige [1]

[1] - https://www.bostonmagazine.com/news/2014/08/26/how-northeast...

ineedasername · 2 years ago

>powerful central government force that doesn't understand research productivity

[1]*

I'm sure they understand it. But their goal in doing this isn't really research productivity (though it would be a happy byproduct) it's to obtain prestige, or its illusion. And/or be able to tell the people of your country how great you're doing. "Our country is a rising powerhouse in field X, in the top 3 in the world and still rising!"

It's internal propaganda. And externally targeted towards populations of other countries unfamiliar with the academic publishing system, which is most people. So for example the vast majority of people in the US that hear these claims by China, which are then echoed in click-bait headlines or simply taken at face value after a journalist checks something like the various HCR lists.

It can even be propaganda, not from China, but from other countries. One of the studies was funded by the US State Department to an Australian group, thus obscuring a primary influence on the results of the study.

Then even a single newspaper article can spark a dozen others citing it, propagating the message. All of these directly or indirectly rely on citations to support their headlines [2]

https://www.science.org/content/article/china-rises-first-pl...

https://www.wsj.com/articles/american-universities-continue-...

https://money.usnews.com/investing/news/articles/2023-03-02/...

https://www.reuters.com/article/idUSTRE72R6FQ/

[1] The quality of this comment will be measured by metrics that I made up myself.

[2] https://www.reuters.com/technology/china-leads-us-global-com...

*This comment was funded by a grant from its author.

maxglute · 2 years ago

>isn't really research productivity (though it would be a happy byproduct)

Or the goal is research productivity and citation gaming is just the unhappy byproduct. Yes some motivated actors to playup PRC capabilities (ASPI/US think tanks) exist, but unless you think every western index on science and innovation, controlled for quality of citations (like Nature) are incentivized/coordinated to carry water for PRC's massive increase in research productivity in the last few years, then the parsimonious answer is PRC research productivity has gotten really good and world leading in some domains.

Which should not be surprising because what OP fails to understand incentivizing citation gaming is a smart way to incentivize output, FAST, and at PRC scale output quantity has quality of it's own. If system spams research, and start off with only 2/10 good research, but leader is 3/5 good research, then PRC has nearly caught up. Refine to 3/10 and there's parity. Refine system to 4/10 and PRC leads. Emphasis being fast because PRC started focusing on seriously improving tertiary and R&D/S&T ~10 years ago and the goal was to develop capabilites fast by customary overproduction. You don't get that by slow careful growth, you do that by incentivizing easy KPIs everyone in sector can coordinate around and the product of said KPI is ENOUGH good research after byproduct of citation gaiming. Having a bunch of chaff leftover to get wheat is the point.

jampekka · 2 years ago

It happens everywhere to some extent.

throwaway12314 · 2 years ago

I worked as a research assistant for a bit over a year after the graduation and I had an experience that put me off. We submitted a paper and got some review feedback and one of them contained something like this

  ... this and that paper also worked on this research area and contain this and that stuff you can citate ...

The prof I was working with told me that this anonymous reviewer was probably the writer of those papers and asking for citation.

enriquto · 2 years ago

Why did that put you off? This is a very common thing for reviewers to do. It is useful for you, because it makes you aware of other people who work in the same problem as you, leading to future job prospects, etc. But more importantly, it is useful for the readers of the article!

The alternative is several independent communities of researchers that work separately on the same problem, but do not acknowledge the existence of each other. Now, that would put you off!

It depends on context. Often reviewers are chosen because they work specifically in that area, and of course they know their work best. I've often done this as a reviewer in appropriate situations, where I point out closely related work that I've or others have done.

Of course, I've also seen EDITORS email me saying they would like me to cite some papers from their journal after I submitted my paper. That was definitely a turn off.

> The prof I was working with told me that this anonymous reviewer was probably the writer of those papers and asking for citation.

I think this is mostly just a kind of fun way of complaining about reviewers. Having been a reviewer for many computer science conference, I've often had the names of other reviewers visible to me. Most requests for citations are not for the reviewer's own work, and truly irrelevant citation requests seem to be pretty rare.

I do perceive journal/lab-centric fields to be worse about this, though.

fock · 2 years ago

well, if it's that transparent... I still wonder what to do with a handful of slightly off, but relatively good papers suggested to us. 2/5 would augment existing citations, 3/5 are related but not really relevant... No author is shared between them.

_kb · 2 years ago

So this essentially appears to be the SEO blogspam playbook oozing into academia? I wonder if any (likely well considered) techniques developed to combat this context may be able to be backported to help reclaim the usefulness of general internet search.

> I wonder if any (likely well considered) techniques developed to combat this context may be able to be backported to help reclaim the usefulness of general internet search.

For a level of an individual researcher this is easy, you just learn what is true and what is false. For a level of the whole society the battle is already lost, your (highly valuable and competent) solution to make a gigachad search just will be either lost or gamed or sued/prohibited.

SEO spam is worse than ever. The only thing Google figured out is to raise the cost of website above 0 such as by requiring ssl and well written English.

amadeuspagel · 2 years ago

Google is already way more advanced then this. If anything the question is more the reverse: Can Google Scholar publish a university ranking?

Google is advancing only in the field of showing ads.