The crisis in science can only be fixed by addressing the slew of bad incentives built into the system. We can't predicate the job security, promotion, and prestige of every early-career scientist on publishing as many papers as possible and on obtaining grants (which requires publishing as many papers as possible), and then expect high-quality science. We can't starve universities of public funding and expect them not to selectively hire scientists whose main skill is publishing hundreds of "exciting" papers, and not to overproduce low-quality future "scientists" trained in the dark arts of academic survival. Reform is more urgent than ever; AI has essentially made obsolete the mental model that equates the count of published papers with productivity and quality.
I can't say this enough: independent reproduction must be part of the process or we'll continue seeing this issue. As you say, it's the incentives. One solution that seems reasonably possible for 95+% of research would be to set aside 30% or so of the research funds, to be given to another team, ideally at another university, that gets access only to the original team's publication and has the goal of reproducing the study. The vast majority of papers released don't contain enough information to actually repeat the work.
And since we are talking about science reform, let's start with the much easier and cheaper preregistration [1], which helps massively with publication bias.
[1] https://en.wikipedia.org/wiki/Preregistration_(science)
Recently I read some lectures by Jacob Bronowski. If you've never heard of him, he was sort of a predecessor of personalities like Bill Nye or Neil Tyson: he wrote books popularizing science, gave simplified introductions to philosophical and scientific topics, etc.
He advocated (very naively, as it appears today) for science as a human endeavor in which no one has a reason to falsify. His justification was that scientists have nothing to lose from being proved wrong; as an example, he cited a university dean who published work that was shown, over the course of a few decades, to be completely wrong, but who still retained his position (because his approach was sound and he never attempted to manipulate the truth; he had simply made an honest error).
But the more I think about how we got here, the more it seems that in many human activities, those who undertook them relied on their own wealth and were not incentivized to commercialize their discoveries. It was aristocrats, monks, or people in some other occupation that made their lives affordable, and boring enough, that they looked for challenge in art or science. Once science became professional, it started to be incentivized the same way any other vocation is: make more of it, be paid more; make more immediately useful things, be paid more.
I don't know if we should return to the lords and monks system :) But I'm also doubtful that we can make good progress by pulling the levers on financial incentives of commercializing science.
Incentives like these exist in basically all areas of work. Perform well and you get "job security, promotion and prestige". Yet somehow there is no decades-long ongoing crisis in industry of corporations lying about their products. When these cases happen (and obviously they do), corporations and individuals get punished.
How would you reform the system? More funding definitely is not the answer.
You mean like how tobacco company CEOs went to Congress and told them there was no cancer risk? Oil companies pretending they didn't know about global warming? Shrinkflation? Corporations lie all the time, and the people running them are never punished.
Because we can't usually measure our goals directly. We want outcomes like relativity and the two-slit experiment. Those results take a lot of time to uncover and have a meaningful chance of failure. If you look at an early-career scientist who hasn't produced (m)any papers, chances are they're fully qualified _and_ doing all the right things with respect to our society-level goals. However, that's hard to distinguish from outright fraud and freeloading from the outside, so we've imposed a crappy proxy measure, used for career advancement.
That's different from many jobs, where it's easy to measure incremental progress and where the results are more certain. You can directly weed out poor performers because you can watch them perform poorly.
> no decade long ongoing crisis of corporations lying about their products
Really? Flame retardants in our "food-grade" spatulas, lead leaching out from ceramic bowls into your soup and cereal, products "sold" as physical devices with a backdoor to start requiring a subscription years later, the pattern of building a brand on quality and then gutting the bill of materials to ramp up profits while deceiving customers into thinking it's the same thing, WalMart explicitly requiring manufacturers to not have any change in product numbers for the sub-par products sold there, .... Fraud is rampant, enough so that for most products I find it quite hard to actually make a sound purchasing decision, and those corporations seem to be wildly profitable.
> individuals get punished
That's true to an extent, but how many doc jockeys exist in some unimportant department in FAANG? You can have a very comfortable career skating by on minimal productive output when cause and effect for the business operate on sufficiently long timescales and with nonlocal, diffuse connections.
How about punishment for terrible behavior? If you design bad experiments, why are you a researcher? Fired. If you commit fraud, fined and fired. Weed out these fuckers.
The median sample size of the studies subjected to replication was n = 5 specimens (https://osf.io/atkd7), probably because only protocols with an estimated cost of less than BRL 5,000 (around USD 1,300 at the time) per replication were included. So it's not surprising that only ~60% of the original biomedical assays' point estimates were in the replicates' 95% prediction interval. The mouse maze anxiety test (~10%) seems to be dragging down the average. n = 5 just doesn't give reliable estimates, especially in rodent psychology.
This should be the top comment on HN, where most users claim to have some grasp of statistics. N = 5 implies a statistical uncertainty of about 45%, so they measured what one would expect, which is essentially nothing. Also, this is specifically about Brazilian biomedical studies, and it contains no evidence to support people's various personal vendettas against other fields in other countries. At least read the article, people.
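To put a number on that, here is a minimal sketch of the 1/sqrt(n) rule of thumb behind the ~45% figure. The mean of 1.0 and SD of 1.0 are hypothetical values chosen only to show the scale of the noise, not numbers from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rule of thumb behind the "~45%" figure: with n i.i.d. samples, the relative
# standard error of the mean scales like 1/sqrt(n) (taking a coefficient of
# variation around 1, which is an illustrative assumption, not from the study).
for n in (5, 20, 100):
    print(f"n = {n:3d}  relative SE ~ {1 / np.sqrt(n):.2f}")

# Spread of an n = 5 sample mean for a hypothetical measurement with true
# mean 1.0 and SD 1.0: the central 95% of estimates runs roughly from 0.1 to 1.9.
means = rng.normal(1.0, 1.0, size=(100_000, 5)).mean(axis=1)
print(f"central 95% of the n = 5 mean: "
      f"{np.percentile(means, 2.5):.2f} to {np.percentile(means, 97.5):.2f}")
```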
It would be interesting for reproducibility efforts to assess “consequentiality” of failed replications, meaning: how much does it matter that a particular study wasn’t reproducible? Was it a niche study that nobody cited anyway, or was it a pivotal result that many other publications depended on, or anything in between those two extremes?
I would like to think that the truly important papers receive some sort of additional validation before people start to build lives and livelihoods on them, but I’ve also seen some pretty awful citation chains where an initial weak result gets overegged by downstream papers which drop mention of its limitations.
It is an ongoing crisis how much Alzheimer’s research was built on faked amyloid beta data. Potentially billions of dollars from public and private research which might have been spent elsewhere had a competing theory not been overshadowed by the initial fictitious results.
The amyloid hypothesis is still the top candidate for at least a form of Alzheimer's. But yes, the problems with one of the early studies have caused significant issues.
I say "a form of Alzheimer's" because it is likely we are labelling a few different diseases as Alzheimer's.
The issue is null results on these kinds of studies don’t actually mean much.
Here sample sizes were tiny, which introduced a vast amount of random noise. The fact that so many studies were replicated despite that suggests the vast majority of the underlying studies were valid, not just the ones they could reproduce.
For the central limit theorem to hold, the random variables must be independent and identically distributed (i.i.d.).
How do we know our samples are i.i.d.? We can only show when they are not.
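A quick way to see why this matters: if the specimens share a hidden batch effect (so they are not i.i.d.), the sample mean is far noisier than the sigma/sqrt(n) that CLT-based formulas assume. A minimal simulation sketch, with a made-up within-batch correlation of 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, rho = 5, 50_000, 0.5  # rho: made-up within-batch correlation

# i.i.d. case: the standard deviation of the sample mean matches sigma/sqrt(n).
iid = rng.normal(0.0, 1.0, size=(reps, n))
print(f"i.i.d.     sd of mean = {iid.mean(axis=1).std():.3f}  (CLT says {1 / np.sqrt(n):.3f})")

# Correlated case (e.g. specimens sharing a litter or batch): each observation
# mixes a common batch term with an individual term, giving pairwise correlation rho,
# so the usual sigma/sqrt(n) understates the true spread of the mean.
batch = rng.normal(0.0, 1.0, size=(reps, 1))
corr = np.sqrt(rho) * batch + np.sqrt(1 - rho) * rng.normal(0.0, 1.0, size=(reps, n))
print(f"correlated sd of mean = {corr.mean(axis=1).std():.3f}  "
      f"(exact value {np.sqrt((1 + (n - 1) * rho) / n):.3f})")
```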
In my field, trying to reproduce results or conclusions from papers happens on a regular basis, especially when the outcome matters for projects in the lab. However, whatever the outcome, it can't be published, because either it confirms the previous results and so isn't new, or it doesn't and no journal wants to publish negative results. The reproducibility attempts are generally discussed at conferences, in the corridors between sessions or at the bar in the evening. This is part of how a scientific consensus is formed in a community.
This doesn't really surprise me at all. It's an unrelated field, but part of the reason I got completely disillusioned with research, to the point that I switched out of a program with a thesis, was that I started noticing reproducibility problems in published work. My field is CS/CE; generally papers reference publicly available datasets and can be easily replicated… except I kept finding papers with results I couldn't recreate. It's possible I made mistakes (what does a college student know, after all), but usually there were other systemic problems on top of reproducibility. A secondary trait I would often notice is a complete exclusion of [easily intuited] counter-facts because they cut into the paper's claim.
To my mind there is a nasty pressure in some professions/careers where publishing becomes essential. Because it's essential, standards are relaxed and barriers lowered, leading to lower-quality work being published. Publishing isn't done in response to genuine discovery or innovation; it's done because boxes need to be checked. Publishers won't change because they benefit from this system, and authors won't change because they're bound to it.
All it takes is about 14 grad students studying the same thing at the usual 5% significance level for there to be a better-than-even chance that one of them stumbles onto a spurious "significant" result. Factor in publication bias and you get a bunch of junk data.
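For what it's worth, the arithmetic behind that, assuming the studies are independent and each is run at the usual 5% threshold:

```python
# Chance that at least one of k independent null studies reports p < 0.05.
for k in (1, 5, 14, 20):
    print(f"k = {k:2d}  P(at least one false positive) = {1 - 0.95 ** k:.2f}")
```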
I think I heard this idea on Freakonomics, but a fix is to propose research to a journal before conducting it and to commit to publication regardless of outcome.
https://en.m.wikipedia.org/wiki/Preregistration_(science)
Not familiar with this idea, but a version of it is commonly applied to grant applications: only apply for a grant once you've already finished the thing you promise to work on. Then use the grant money to prototype the next five ideas (of which maybe one works), because science is about exploration.
Most pharma / medicine studies are pre-registered now. Sometimes the endpoints change based on what the scientists are seeing, but if they're worth their salt, they still report the original scoped findings as well.
>but a fix is to propose research to a journal before conducting it and being committed to publication regardless of outcome.
Does not fix the underlying issue. Having a "this does not work" paper on your resume will do little for your career. So the incentives to make data fit a positive hypothesis are still there.
The state of CS papers is truly awful, as they're uniquely positioned to be 100% reproducible. And yet my experience aligns with yours in that they very rarely are.
Even more ridiculous is the number of papers that do not include code. Sure, maybe Google cannot offer an environment to replicate the underlying 1PB dataset, but for mortals, this is rarely a concern.
Even better is when the paper says code will be released after publication, but they cannot be bothered to post it anywhere.
I can second this; even availability of the code is still a problem. However, I would not say CS results are rarely reproducible, at least from the few experiences I have had so far, though I have heard of problematic cases from others. I guess it also differs between fields.
I want to note there is hope. Contrary to what the root comment says, some publishers do try to endorse reproducible results. See for example the ACM reproducibility initiative [1]. I have participated in this before and believe it is a really good initiative. Reproducing results can be very labor intensive, though, which further loads a review system already struggling under massive floods of papers. And it is not perfect: most of the time it only ensures that the author-supplied code produces the presented results. But I still think more such initiatives are healthy. When you really want to ensure the rigor of a presented method, you have to replicate it, e.g., using a different programming language, which is really its own research endeavor. And there is already a place to publish such results in CS [2]! (although I haven't tried this one). I imagine this may be especially interesting for PhD students just starting out in a new field, as it gives them the opportunity to learn while satisfying the expectation of producing papers.
[1] https://www.acm.org/publications/policies/artifact-review-an... [2] https://rescience.github.io
This same post appears at the top of every single HN story on reproducibility. “I was a student in [totally unrelated field] and found reproducibility to be difficult. I didn’t investigate it deeply and ultimately I left the field, not because I was unsuccessful, of course, but because I understood deeply despite my own extremely limited experience in the area that all of the science was deeply flawed if not false.”
Imagine the guy who got a FAANG job and made it nine weeks in before washing out, informing you how the entire industry doesn’t know how to write code. Maybe they’re right and the industry doesn’t know how to write code! But I want to hear it from the person who actually made a career, not the intern who made it through part of a summer.
This seems like a straw man. The stories are much more complex than this (in my experience/opinion), usually directly reporting immoral acts by peers, lack of support, unfair/unequal treatment, hypocrisy, and so on. The event of the failed reproduction is at best an intermezzo.
Not to mention that we know a lot of overhyped results did fail replication, and then powerful figures in academia did their best to pretend their thrones were not sitting on top of sandcastles.
The problem is the negative feedback cycle: someone who has spent decades in academia and is highly published, almost by definition alone, has not experienced the pains of industry practitioners.
Their findings are often irrelevant to industry at best and contradictory at worst.
Of course I'm talking almost solely about SE.
A lot of people have pointed out a reproducibility crisis in social sciences, but I think it's interesting to point out this happens in CompSci as well when verifying results is hard.
Reproducing ML robotics papers requires the exact robot/environment/objects/etc. -> people fudge their numbers and have strawman implementations of benchmarks.
LLMs are so expensive to train + the datasets are non-public -> Meta trained on the test set for Llama4 (and we wouldn't have known if not for some forum leak).
In some way it's no different than startups or salesmen overpromising - it's just lying for personal gain. The truth usually wins in the end though.
You want to get a PhD? You have to publish something... anything.
You want money for experiments? You need publications, even if you do the rest of the theoretical work on your own.
You want funds for some new research, or to continue existing research? You need publications.
I'm not defending those who publish all sorts of crap as research, but the whole system is rigged.
Everyone asks for as many publications and citations as possible before they'll even lend you a lab for one day to test something.
Excuse my language, but what the f are you expecting?
Add to that https://en.m.wikipedia.org/wiki/Why_Most_Published_Research_...
We've got to do better or science will stagnate
It's lack of industry experience. I complained about this in a recent comment here: https://news.ycombinator.com/item?id=43769856
Basically, in SE anyway, the largest number of publications are authored by new graduates.
Think about how clueless the new MSc or PhD graduate is when they join your team: these are the majority of authors.
The system is set up to incentivise the wrong thing.