I will never forget the day a postdoc in my lab told me not to continue wasting time trying (and failing) to reproduce [Top Institution]’s “Best Paper Award” results from the year prior. He had been there when the work was done and said they manipulated the dataset until they got the numbers they wanted. The primary author is now a hot shot professor.
My whole perception of academia and peer review changed that day.
Edit to elaborate: like many of our institutions, peer review is an effective system in many ways but was designed assuming good faith. Reviewers accept the authors’ results on faith and largely just check that no obvious angles were forgotten and that the work is important enough to flag for the whole community to read. Since there’s no actual verification of results, it’s vulnerable to attack by dishonesty.
When I was in school pre-university, this type of "crap, we can't get what we wanted to happen, so let's just fiddle around with it until it seems about right" was very common. I was convinced this was how children learned, so that as adults they wouldn't have to do things that way.
When I got into university and started alternating studying and work, I realised just how incredibly clueless even adults are. The "let's just try something and hope nothing bad happens" attitude permeates everything.
It's really a miracle that civilisation works as well as it does.
The upshot is that if something seems stupid, it probably is and can be improved.
In the lyceum where I studied, there was one physics lab where the book that accompanied the lab was deliberately wrong. We were told to perform an experiment that "should" support a certain conclusion, but neither the "correct" conclusion nor its opposite could actually be drawn, because the flawed setup measured something slightly different. A lot of students (in some groups, all students) fell into this trap and submitted paperwork with the "correct" conclusion according to the book.
I've come to think things work as well as they do largely because a whole lot of what people do has no effect either way. I see so much stupidity where the only saving grace is that it is directed into pointless efforts that won't be allowed to do any real damage.
A bio professor of mine said something that stuck with me: “life doesn’t work perfectly, it just works.”
It has to work well enough to… work… and reproduce. That’s it. It’s not “survival of the fittest.” It’s “survival of a randomized subset of the fit.”
There’s even a set of thermodynamic arguments to the effect that systems are unlikely to exceed the minimum requirements for crossing a given threshold. For example, if we are visited by interstellar travelers, they are likely to be the absolute dumbest and most dysfunctional possible examples of beings capable of interstellar travel, since anything more is a less likely thermodynamic state.
So much for Star Trek toga wearing utopian aliens.
The problem is the incentives. To do well, you must publish. To publish, you must have a good story, and ‘we tried this and it didn’t work’ is not one.
So after a certain time spent, you are left with a choice of ‘massaging’ the data to get some results, or not and getting left behind those that do or were luckier in their research.
Distracting from the main point, "let's just try something and hope nothing bad happens" (trial and error) is precisely the reason civilization made it this far :)
I've just finished a 3-hour reading session of an evolutionary psychology book by one of the leading scientists in the field. The book is extremely competently written, and is awash with statistics on almost every page, "70% of men this; 30% of women that ... on and on". And much to my solace, the scientist was super-careful to distinguish studies that were replicable, and those that were not.
Still, reading your comment makes me despair. It plants a nagging doubt in my mind: "how many of these zillion studies cited are actually replicable?" This doubt remains despite knowing that the scientist is one of the leading experts in the field, and very down-to-earth.
What are the solutions here? A big incentive shift to reward replication more? Public shaming of misleading studies? Influential conferences giving more air time to talks about "studies that did not replicate"? I know some of these happen at a smaller scale [1], but I wonder about the "scaling" aspect (to use a very HN-esque term).
PS: Since I read Behave by Sapolsky — where he says "your prefrontal cortex [which plays a critical role in cognition, emotional regulation, and control of impulsive behavior] doesn't come online until you are 24" — I tend to take all studies done on university campuses with students younger than 24 with a good spoonful of salt. ;-)
[1] https://replicationindex.com/about/
Evo psych is questionable to me for more basic reasons. It seems full of untestable just-so stories to explain apparent biases that are themselves hard to pin down, or to prove are a result of nature rather than nurture.
It’s probably not all bullshit but I would bet a double digit percentage of it is.
> “how many of these zillion studies cited are actually replicable?” This doubt remains despite knowing that the scientist is one of the leading experts in the field, and very down-to-earth.
I think the problem is much bigger than a simple binary of replicable or not. It’s extremely easy to find papers by “leading experts” that have valid data with replicable results where the conclusions have been generalized beyond the experiments. The media does this more or less by default when reporting on scientific results, but researchers do it themselves to a huge degree, using very specific conditions and results to jump to a wider conclusion that is not actually supported by the results.
A high-profile example of this is the “Dunning Kruger” effect; the data in the paper did not show what the flowery narrative in the paper claimed to show, but there’s no reason to think they falsified the results. Some researchers have reproduced the results, as long as the conditions were very similar. Other researchers have tried to reproduce the results under different conditions that should have worked according to the paper’s narrative and conclusions, but found that they could not, because there were specific factors in the original experiment that were not discussed in the original paper’s conclusions -- in other words, Dunning and Kruger overstated what they measured such that the conclusion was not true. They both enjoyed successful academic careers and some degree of academic fame as a result of this paper that is technically reproducible but not generally true.
To make matters worse, the public has generally misinterpreted and misunderstood even the incorrect conclusions the authors stated, and turned it into something else. Almost never in discussions where the DK effect is invoked do people talk about the context or methodology of the experiments, or the people who participated in them.
This human tendency to tell a story and lose the context and details and specificity of the original evidence, the tendency to declare that one piece of evidence means there is a general truth, that is scarier to me than whether papers are replicable or not, because it casts doubt on all the replicable papers too.
> The book is extremely competently written, and is awash with statistics on almost every page, "70% of men this; 30% of women that ... on and on". And much to my solace, the scientist was super-careful to distinguish studies that were replicable, and those that were not.
One approach that can be adopted on a personal level is simply changing the way one thinks. For example, switch from a binary (true/false) method of epistemology to trinary (true/false/unknown), defaulting to unknown, and consciously insist on a high level of certainty to reclassify an idea.
There's obviously more complexity than this, but I believe that if even a relatively small percentage of the population started thinking like this (particularly, influential people) it could make a very big difference.
Unfortunately, this seems to be extremely counter to human nature and desires: people seem compelled to form conclusions, even when it is not necessary ("Do people have ideas, or do ideas have people?").
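To make that trinary, default-to-unknown idea concrete, here is a minimal sketch in Python (my own toy framing; the evidence weights and the 0.95 threshold are illustrative assumptions, not anything proposed above):

```python
from enum import Enum

class Belief(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"

def classify(evidence_for: float, evidence_against: float,
             threshold: float = 0.95) -> Belief:
    """Default to UNKNOWN; reclassify only past a high bar of certainty."""
    total = evidence_for + evidence_against
    if total == 0:
        return Belief.UNKNOWN
    # Confidence is the share of evidence on the stronger side.
    confidence = max(evidence_for, evidence_against) / total
    if confidence < threshold:
        return Belief.UNKNOWN
    return Belief.TRUE if evidence_for > evidence_against else Belief.FALSE
```

The point is only that UNKNOWN is the resting state, and an explicit, high bar has to be cleared before anything is promoted to TRUE or FALSE.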
Yes, but you have to convince your readers that you did a more careful and meticulous job than 'Top Institution's Best Paper Award' did. After all, a failure to replicate only means that one of you is wrong, but it doesn't give any hint as to who.
> a postdoc in my lab told me not to continue wasting time trying (and failing) to reproduce [Top Institution]’s “Best Paper Award” results from the year prior. He had been there when the work was done and said they manipulated the dataset until they got the numbers they wanted.
Isn't that the moment where you try even harder to falsify the claims in that paper? You already know you'll succeed, so the effort wouldn't be wasted.
The problem with experimental results is that they are difficult to replicate. In software you can "git clone x.git && cd x && make" and replicate the correct or incorrect results. In hardware, it's more difficult.
The main problem is that even if you reproduce their experiment, they can claim that you did some step wrong: perhaps you mixed it too fast or too slow, or the temperature was not correctly controlled, or one of your reagents had a contamination that destroyed the effect, or they magically realize that some property of their own reagent is what was important.
It's very difficult to publish papers with negative results, so there is a high chance it will not count toward your total number of publications. Also, expect a low number of citations, so it's not useful for other metrics like citation count or h-index.
For the same reason, you will not see publications of exact replications. A good paper X will be followed by almost-replications by other teams, like "we changed this and got X with a 10% improvement" or "we mixed the methods of X and Y and unsurprisingly^W got X+Y". This is somewhat good, because it shows that the initial result is robust enough to survive small modifications.
Peer review cannot protect against fraud, even in principle, and it was never intended to. And this is OK. Usually, if a result is very important and forged, other groups try to replicate it and fail; after some time the original dataset (which needs to be kept for 10 years, I think) will be requested, and then things go downhill from there.
Assuming bad faith in peer review would make academia more interesting; the only way would probably be for the peer reviewer to go to the lab and be shown live measurements. Then check the equipment...
I wonder if it's a better system to just hire smart professors and give them tenure immediately. The lazy ones in it just for the status won't do any work, but the good ones will. Sure, there will be dead weight that gets salaries for life, but I feel like that's a lesser problem than incentivizing bad research.
The problem isn't just the scientists, it goes all the way up. Let's say we implement your system. Who decides how many 'smart professors' the Type Theory group gets to hire? What if the Type Theory and Machine Learning departments both want to hire a new 'smart professor' but the Computer Science department only has money to hire one more person?
One reasonable approach might be to look at which group has produced the 'best' research over the past few years. But how do you judge that in a way that seems fair? Once you have a criterion to judge by, people will start to game that criterion.
Or, taking a step up: the university needs to save money. How do you judge whether the Chemistry department or the Computer Science department should have its funding cut?
No matter how you slice it at some point you're going to need a way for someone to judge which of two departments is producing the 'best' research and thus deserves more money, and that will incentivize people to game that metric.
Being smart isn't the biggest criterion for success as a professor. The PhD degree is a good filter because it trains and tests research aptitude, work ethic, ability to collaborate, ability to focus on a single problem for a long period of time, and more.
One problem is that PhD degrees are too costly for those who don't get academic or industrial success from them. But as long as talented people are willing to try to become professors, I don't see the system changing.
In the past people who did science could do so with less personally on the line. In the early days you had men of letters like Cavendish who didn't really need to care if you liked what he wrote, he'd be fine without any grants. That obviously doesn't work for everyone, but then the tenure system developed for a similar reason: you have to be able to follow an unproductive path sometimes without starving. And that can mean unproductive in that you don't find anything or in that your peers don't rate your work. There'd be a gap between being a young researcher and tenured, sure.
Nowadays there's an army of precariously employed phds and postdocs. Publish or perish is a trope. People get really quite old while still being juniors in some sense, and during that time everyone is thinking "I have to not jeopardise my career".
When you have a system where all the agents are under huge pressure, they adapt in certain ways: take safer bets, write more papers from each experiment, cooperate with others for mutual gain, congregate around previous winners, generally more risk reducing behaviour.
Perhaps the thing to do is make a hard barrier: everyone who wants to be a researcher needs to get tenure after undergrad, or not at all. (Or after masters or whatever, I wouldn't know.) Those people then get a grant for life. It will be hard to get one of these, but it will be clear if you have to give up. Lab assistants and other untenured staff know what they are negotiating for. Tenured young people can start a family and not have the rug pulled out when they write something interesting.
The solution is to publish data, not "papers", first, and assign it a replication score: how many times it was verified by independent research.
The paper can follow with the explanation, but citations will no longer be important; what will matter is the contribution to the replication score (which will also work as an incentive to confirm others' results).
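A rough sketch of how such a score might be tracked (purely hypothetical names and structure, not an existing system):

```python
from dataclasses import dataclass, field

@dataclass
class PublishedDataset:
    """A dataset published first; credibility accrues via independent replications."""
    title: str
    replicating_groups: set[str] = field(default_factory=set)

    def record_replication(self, group_id: str) -> None:
        # Each independent group counts at most once toward the score.
        self.replicating_groups.add(group_id)

    @property
    def replication_score(self) -> int:
        return len(self.replicating_groups)

ds = PublishedDataset("reaction yields, batch 2021-03")
ds.record_replication("lab-A")
ds.record_replication("lab-B")
ds.record_replication("lab-A")  # duplicate submission, ignored
print(ds.replication_score)     # 2
```

The hard design questions (what counts as an "independent" verification, how to weight partial replications) are exactly where such a scheme would live or die.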
I'm not sure prosecuting academics is a particularly obvious solution: you'd need to prove malicious intent (rather than ignorance), which is always difficult.
For me a better solution would be to properly incentivise replication work and solid scientific principles. If repeating an experiment and getting a contradictory result carried the same kudos as running the original experiment, I think we'd be in a healthier place. The same goes for the 'scientific grind work' of tracking down the mistakes in experimental practice that can affect results and, ultimately, our understanding of the universe around us.
I think an analogy with software development works pretty well: often the incentives point towards adding new features above all else. Rarely is sitting down and grinding through the litany of small bugs prioritised, but as any dev will tell you, that grind work is just as important; otherwise you'll run into a wall of technical debt and the whole thing will come tumbling down.
One perspective is that, “knowledge generation wise,” the current system really does work from a long term perspective. Evolutionary pressure keeps the good work alive while bad work dies. Like that [Top Institution] paper: if nobody else could reproduce it, then the ideas within it die because nobody can extend the work.
But that comes at the heavy short term cost of good researchers getting duped into wasting time and bad researchers seeing incentives in lying. Which will make academia less attractive to the kind of people that ought to be there, dragging down the whole community.
For career and other reasons, there is a publish-or-perish crisis today.
Maybe we can do better by accepting that not everyone can publish groundbreaking results, and that's okay.
There are lots of incompetent people in academia who later move up to senior positions and decide your promotion by citation counts and how many papers you published. I have no realistic ideas for how to counter this.
We need to create a new social institution of Anti-Science, which would run on the opposite incentives, rewarded according to the number of articles refuted. No tenure, no long-term contracts: if an anti-scientist wished to have an income, they would need to refute scientific articles.
Create a platform for holding scientific debates between scientists and anti-scientists, so that a scientist has the ability to defend his/her research.
No need to do anything special to prosecute, because science is very competitive, and the available refutations would inevitably be used to stop the career progression of the authors of refuted articles.
Data and code archives, along with better methods training.
Data manipulation generally doesn't happen by changing values in a data frame. It's done by running and rerunning similar models with slightly different specifications until the p-value comes in under .05, or by applying various "manipulations" to variables or the models themselves for the same effect. It's much easier to identify this when you have the code that was used to produce whatever was eventually published.
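As an illustration of why the code matters, here is a toy simulation (my own hypothetical setup, not from any study discussed here): both groups are pure noise, yet keeping the best of twenty "specifications" inflates the false-positive rate well past the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def best_p_after_respecification(n_specs: int = 20, n: int = 30) -> float:
    """Rerun 'slightly different specifications' (here: random
    outlier-dropping rules) on pure-noise data; keep only the best p-value."""
    x = rng.normal(size=n)  # "treatment" group: pure noise
    y = rng.normal(size=n)  # "control" group: pure noise
    best_p = 1.0
    for _ in range(n_specs):
        # Each specification drops a random ~20% of observations,
        # standing in for tweaked outlier rules, covariates, etc.
        p = stats.ttest_ind(x[rng.random(n) > 0.2],
                            y[rng.random(n) > 0.2]).pvalue
        best_p = min(best_p, p)
    return best_p

hits = sum(best_p_after_respecification() < 0.05 for _ in range(200))
print(f"{hits}/200 pure-noise datasets reached p < 0.05 after respecification")
```

With the analysis code archived, a reviewer can see all twenty specifications; with only the final table, they see a single "significant" result.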
I don't think prosecution is the right tool, but if we were going down that road, only material misrepresentations would fit the anti-fraud standard applied to companies. Just drawing dumb, unpopular, or 'biased' conclusions shouldn't be a crime, but data tampering would fall into scope. Not a great idea, as it would add a chilling effect, lawyer friction, and expenses, and still be hard to enforce for little direct gain.
I personally favor requirements that call for bundling raw datasets with the "papers". Data storage and transmission are very cheap now, so there isn't a need to restrict ourselves to just texts. We should be able to check all of the "outliers" thrown out of the datasets. The aim should be to make the tricks for massaging data nonviable. Even if you found your first dataset was full of embarrassing screw-ups from doing it hungover and mixing up the step order, it could be helpful to have a collection of "known errors" to analyze. Optimistically, it could also uncover phenomena scientists assumed were their own mistakes, like, say, the cosmic background radiation being taken as just noise and not really there.
Paper reviewing is already a problem but adding some transparency should help.
We could use public funding to do the work OP tried to do.
Something like a well funded ten year campaign to do peer review, retrying experiments and publishing papers on why results are wrong.
I have a co-worker who had a job that involved publishing research papers. Based on his horror stories, it seems the most effective course of action is to attack the credibility of those who fudge results.
The single biggest impediment to "fixing this" is that you haven't identified what "this" is or in what manner it is broken.
There will always be cases of fraud if someone digs deeply enough into large institutions. That doesn't by itself indicate that there is a problem.
Launching into changing complex systems like the research community based on a couple of anecdotes and just-so stories is a great way to not actually achieve anything meaningful. There needs to be a very thorough, emotionally and technically correct enumeration of what the actual problem(s) are.
By waiting until scientists address this? Note that the 'replication crisis' is something that originated inside science itself, so, despite there being problems, science has not lost its self-correcting abilities. Scientists themselves can do something by insisting on reliable and correct methods and pointing it out wherever such methods are not in use. It is also not as if there are no gains in doing this: Brian Nosek became rather famous.
Scientists with a proven track record should have lifelong funding of their laboratory, no questions asked, so they can act as they want without fear of social repercussions. Of course some money will be wasted, and the question of determining whether a track record is proven remains open, but I think that's the only way for things to work (except when the scientist has enough money to fund his own work).
My personal opinion is this problem fixes itself over time.
When I was in graduate school, papers from one lab at Harvard were known to be "best case scenario". Other labs had a rock-solid reputation: if they said you could do X with their procedure, you could bet on it.
So basically we treated every claim as potential BS unless it came from a reputable lab or we or others had replicated it.
One approach is to include a replication package with the paper, including the dataset. This should be regarded as standard practice today, as sharing something has never been easier. However, adding a replication package is still done by only a minority of researchers...
I can understand why journals don’t publish studies which don’t find anything. But they really should publish studies that are unable to replicate previous findings. If the original finding was a big deal, its potential nullification should be equally noteworthy.
While I would have agreed with that when I was younger, I've learned there are a lot of reasons why PhD students (the people who actually run the studies) fail to replicate things, and I'm talking about fundamentally solid engineering.
This was exactly my experience, and I remember the paper that finally convinced me. It turned out the author had intentionally omitted a key step, making it impossible to reproduce the results; only extremely careful reading and some clever guessing found the right step.
There are several levels of peer review. I've definitely been a reviewer on papers where the reviewers requested everything required and reproduced the experiment. That's extremely rare.
Their username is publicly linked to their real-life identity. Revealing the name and institution has a reasonable chance of provoking a potentially messy dispute in real life. Maybe eob has justice on their side, but picking fights has a lot of downsides, especially if your evidence is secondhand.
From what I have read, peer review was a system that worked when academia and the scientific world were much smaller and much more like "a small town." It seems to me like growth has caused sheer numbers to make that system game-able and no longer reliable in the way it once was.
May I ask what field of knowledge the manipulated paper was from? Your page lists CS/NLP, so the field may also be linguistics or neurology (linguistics would be easier for me to swallow) https://scholar.google.com/citations?user=FMScFbwAAAAJ&hl=en
Some wider questions would be: Are there similar problems in Mathematics/physics versus the life sciences/other social sciences? Are there the same kind of problems across different fields of study?
Also, I wonder if replication issues would be less severe if there were a requirement to publish the software and raw data that any study is based on as open source / open data. A change in this direction might make it more difficult to manipulate results (after all, it's the public who paid for the research, in most cases).
I worked at a prestigious physics lab for the top researcher in a field. It absolutely happens there, and probably everywhere.
The only way to fix replication issues is to give financial and career incentives for doing replication work. Right now there are few carrots and many sticks.
Frankly, sir, it is the reason you wish your anecdote to remain anonymous that such perfidy survives. If these traitors to human reason, and to the public's faith that their interests serve the general welfare (after all, who is the one feeding them?), were made more public, perhaps there would be less fraudulence? But I suppose you have too much to lose? If so, why do you surround yourself in the company of bad men?
The issue is that the authors of bad papers still participate in the peer-review process. If they are the only expert reviewers and you do not pay proper respect to their work, they will squash your submission. To avoid this, papers can propagate mistakes for a long time.
Personally, I'm always very careful to cite and praise work by "competing" researchers even when that work has well-known errors, because I know that those researchers will review my paper and if there aren't other experts on the review committee the paper won't make it. I wish I didn't have to, but my supervisor wants to get tenured and I want to finish grad school, and for that we need to publish papers.
Lots of science is completely inaccessible for non-experts as a result of this sort of politics. There is no guarantee that the work you hear praised/cited in papers is actually any good; it may have been inserted just to appease someone.
I thought that this was something specific to my field, but apparently not. Leaves me very jaded about the scientific community.
What is it that makes you have a nice career in research? Is it a robust pile of publishing or is it a star finding? Can you get far on just pure volume?
I want to answer the question "if I were a researcher and were willing to cheat to get ahead, what should be the objective of my cheating?"
I suppose it depends on how you define nice? If you cheat, at some point people will catch on, even if you don't face any real consequences. So if you want prestige within your community, cheating isn't the way to go.
If you want to look impressive to non-experts and get lots of grant money/opportunities, I'd go for lots of straightforward publications in top-tier venues. Star findings will come under greater scrutiny.
For grants and tenure, 100 tiny increments over 10 years are much better for your research career than 1 major paper in 5 years that is better than all of them put together.
If you want to write a pop book, be on TV, and sell classes, you need one interesting bit of pseudoscience and a dozen follow-up papers using the same bad methodology.
You don't make a nice career in a vacuum. With very few exceptions, you don't get star findings in a social desert. You get star findings by being liked by influential supervisors who are liked by even more influential supervisors.
>Lots of science is completely inaccessible for non-experts as a result of this sort of politics
As a non-expert, this is not the type of inaccessibility that is relevant to my interests.
"Unfortunately, alumni do not have access to our online journal subscriptions and databases because of licensing restrictions. We usually advise alumni to request items through interlibrary loan at their home institution/public library. In addition, under normal circumstances, you would be able to come in to the library and access the article."
This may not be technically completely inaccessible. But it is a significant "chilling effect" for someone who wants to read on a subject.
Some journals allow you to specify reviewers to exclude. True, there is no guarantee that published work is good, but that is likely more about the fact that it takes time to sort out the truth than about nefarious cabals of bad scientists.
I think the inaccessibility is for different reasons, most of which revolve around the use of jargon.
In my experience, the situation is not so bad. It is obvious who the good scientists are, and you can almost always be sure that if they wrote it, it's good.
In many journals it's an abuse of process to exclude reviewers you simply don't like. Much of the time, this mechanism is supposed to be used to declare conflicts of interest based on relationships you have in the field.
Why do people need to publish? The whole point of publishing was content discovery. Now that you can just push it to a preprint or to your blog what’s the point? I’ve written papers that weren’t published but still got cited.
I need money to do research, available grants require achieving specific measurable results during the grant (mostly publications fitting specific criteria e.g. "journal that's rated above 50% of average citation rating in your subfield" or "peer reviewed publication that's indexed in SCOPUS or WebOfScience", definitely not a preprint or blog), and getting one is also conditional on earlier publications like that.
In essence, the evaluators (non-scientific organizations who fund scientific organizations) need some metric to compare and distinguish decent research from weak: one that's (a) comparable across fields of science; (b) verifiable by people outside that field (so you can compare across subfields); (c) not trivially changeable by the funded institutions themselves; (d) describable in an objective manner, so that you can write the exact criteria/metrics into a legal act or contract. There are NO reasonable metrics that fit these criteria; international peer-reviewed publications fitting certain criteria are bad, but perhaps the least bad of the (even worse) alternatives, like direct evaluation by government committees.
At some point, there's not going to be enough budget for both the football coach and the Latin philology professor. We should hire another three layers of housing diversity deans just to be safe.
What’s crazy to me is nothing should stop an intelligent person from submitting papers, doing research, etc. even outside the confines of academia and having a PhD. But in practice you will never get anywhere without such things because of the politics involved and the incestuous relationship between the journals and their monetarily-uncompensated yet prestige-hungry army of researchers enthralled to the existing system.
If you add 'self-funded' to this hypothetical person, then it wouldn't matter whether they play any games. Getting published is really not that hard if your work is good. And if it is good, it will get noticed (hopefully during the hypothetical person's lifetime). Conferences have fewer of these games, in my experience, and would help.
Also, I know of no researchers personally who are enthralled by the existing system.
This has nothing to do with gatekeeping. I agree that the current publication and incentive system is broken, but that's completely unrelated to the question of whether outsiders get published. The reason why you see very little work from outsiders is that research is difficult. It typically requires years of full-time, dedicated work; you can't just do it on the side. Moreover, you first need to study and understand the field to identify the gaps. If you try to identify gaps on your own, you are highly likely to go off in a direction that is completely irrelevant.
BTW, I can tell you that the vast majority of researchers are not "enthralled" by the system, but highly critical of it. They simply don't have a choice but to work within it.
I think this is a bit naive. One thing that stops a smart person doing research without a PhD is that it takes a long time to learn enough to be at the scientific frontier where new research can be done. About a PhD’s length of time, in fact. So, many people without a PhD who try to do scientific research are cranks. I don’t say all.
Some quality journals and conferences have double blind reviews now. So the work is reviewed without knowing who the work belongs to. It's not so much the politics of the system as the skills required to write a research paper being hard to learn outside of a PhD. You need to understand how to identify a line of work in a very narrow field so that you can cite prior work and demonstrate a proper understanding of how your work compares and contrasts to other closely related work. That's an important part of demonstrating your work is novel and it's hard to do (especially for the first time) without expert guidance. Most students trying this for the first time cite far too broadly (getting work that's somewhat related but not related to the core of their ideas) and miss important closely related work.
You're like that Chinese sports boss who was arrested for corruption and complained that it would be impossible to do his job without participating in bribery. Just because you stand to personally gain from your corrupt practices doesn't excuse them. If anything, it makes them morally worse!
I don't tell lies about bad papers; I only give some perfunctory praise so that reviewers don't have ammunition to kill my submission. E.g. if a paper makes a false contribution X and a true contribution Y, I only mention Y. If I were to say "So-and-so claimed X but actually that's false", I would have to prove it, and unless it's a big enough issue to warrant its own paper, I don't want to prove it. Anyway, without the raw data, source code, etc. for the experiments, there is no way for me to prove that X is false (I'm not a mathematician). The reviewers would then ask why I believe X is not true when peer review accepted it. Suddenly all of my contributions are out the window, and all anybody cares about is X.
The situation is even worse when the paper claiming X underwent artifact review, where reviewers actually DID look at the raw data and source code but simply lacked the attention or expertise to recognize errors.
I don’t really buy the comparison entirely. Presumably the sports boss is doing something patently illegal, and obviously there are alternative career paths. OP is working in academia, which is socially acceptable, and feels that this is what is normal in their academic field, necessary for their goals, and isn’t actively harmful.
I wouldn’t necessarily condone the behavior, but what would you do in the situation? To always whistleblow whenever something doesn’t feel right and risk the politics? To quit working in the field if your concerns aren’t heard? To never cite papers that have absolutely any errors? I think it’s a tough situation and not productive to say OP isn’t behaving morally.
More specifically, this paper is focused on the social sciences. That's not to say this isn't present in the basic sciences too.
But one other thing to note here is that these headlines about a "replication crisis" seem to imply that this is a new phenomenon. Let's not forget the history of the electron charge. As Feynman said:
"We have learned a lot from experience about how to handle some of the ways we fool ourselves. One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It's a little bit off because he had the incorrect value for the viscosity of air. It's interesting to look at the history of measurements of the charge of an electron, after Millikan. If you plot them as a function of time, you find that one is a little bit bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher.
Why didn't they discover the new number was higher right away? It's a thing that scientists are ashamed of—this history—because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong—and they would look for and find a reason why something might be wrong. When they got a number close to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that ..."
Something that I think the physical sciences benefit from is the ability to look at a problem from more than one angle. For instance, the stuff that we think is the most important, such as the most general laws, is supported by many different kinds of measurements, plus the parallel investigations of theoreticians. A few scattered experiments could bite the dust, like unplugging one node in a mesh network, and it could either be ignored or repaired.
The social sciences face the problem of not having so many different possible angles, such as quantitative theories or even a clear idea of what is being tested. Much of the research is engaged in the collection of isolated factoids. Hopefully something like a quantitative theory will emerge, that allows these results to be connected together like a mesh network, but no new science gets there right away.
The other thing is, to be fair, social sciences have to deal with noisy data, and with ethics. There were things I could do to atoms in my experiments, such as deprive them of air and smash them to bits, that would not pass ethical review if performed on humans. ;-)
Your example of looking at a problem from more than one angle made me think of the problem of finding the Hubble constant that describes the rate of expansion of the universe. There are two recent methods which have different estimates for this rate of expansion.
> More specifically, this paper is focused on the social sciences.
No, it isn't. It looked at a few different fields, and found that the problem was actually worse for general science papers published in Nature/Science, where non-reproducible papers were cited 300 times more often than reproducible ones.
I think you might be mistaken. The study of Nature/Science papers was "Evaluating replicability of social science experiments published in Nature and Science between 2010 and 2015"
Feynman's example is of people being more critical about certain results than others. A better example is the case of "radiation" that could only be seen in a dark room out of the corner of your eye, which turned out to be a human visual artifact and wishful thinking.
It's interesting that according to the Wikipedia article it's not entirely certain whether the radiation is producing actual light or just the sensation of light.
I worked in the academic world for two years. What I saw was that lots of people are under a constant pressure to publish, and quantity is often put above quality.
I've seen papers without any sort of value or reason to exist being brute-forced through review just so that some useless junk data wouldn't go to waste, all to add a line on someone's CV.
That's without mentioning that some unis are packed with totally incompetent people who only advanced their careers by always finding a way to piggyback on someone else's paper.
The worst thing I've seen is that reviewing papers is also often offloaded to newly graduated fellows, who are often instructed to be lenient when reviewing papers coming from "friendly universities".
The level of most papers I have had the disgrace to read is so bad it made me want to quit that world as soon as I could.
I came to the conclusion that the whole system is basically a complex game of politics and strategy, fed by a loop in which bad research gets published in mediocre outlets, which get a financial return by publishing it. This bad published research is then used to justify further money being spent on low-quality rubbish work, and the cycle continues.
Sometimes you get to review papers that are so comically bad and low effort they almost feel insulting on a personal level.
For instance, I had to reject multiple papers not only due to their complete lack of content, but also because their English was so horrendous they were basically unintelligible.
Quality definitely counts for more than quantity in academia, in almost all disciplines. The issue is that citation count is used as a proxy for quality, and it's a poor one in many respects.
Until this is fixed, people need to stop saying "listen to The Science", in an attempt to convince others of a given viewpoint. Skeptical people already distrust our modern scientific institutions; not completely obviously, but definitely when they're cited as a cudgel. Continued articles like this should make everyone stop and wonder just how firm the supposed facts are, behind their own favoured opinions. We need a little more humility about which scientific facts are truly beyond reproach.
We also need to listen to the science on things that are clearly established. The replication crisis is not something that affects almost anything in public debate. Evolution is well established science. Large parts of Climate Change are well established science. Etc.
Yea, evolution and climate change are pretty solid. But the projection models do climate science no favors. Those are bogus: they've mostly been wrong in the past and will mostly be wrong again in the future. There are way too many variables and unknowns, and those accumulate the further out the model reaches.
And your evidence for that is what exactly? Since this replication problem is already known to appear in multiple disciplines, it's quite likely that the same misconduct is happening in other areas too. I think you're being a little too quick to hope that it doesn't affect those areas where you have a vested interest.
"Believe science" is incredibly destructive to the entire field. It is quite literally turning science into a religion. Replacing the scientific method with "just believe what we say, how dare you question the orthodoxy". We're back to church and priests in all but name.
In the main, people don't literally mean it that way; they're expressing belief in the validity of the scientific method. But the more they have to explain and justify the scientific method, the more time-consuming it is. When dealing with stupid people or obdurate trolls, the sincere adherent of the scientific method can be tricked into wasting a great deal of time by being peppered with questions that are foolish or posed in bad faith.
You're free to treat it as unquestionable, but the fact remains there is ample evidence of our scientific process being deeply broken with a really bad incentive structure if nothing else.
If you think there is no incorrect science at all in regards to evolution and climate change you're no better than the zealots of any religion.
Perhaps the government should have a team of people who randomly try to replicate science papers that are funded by the government.
The government can then reduce funding to institutions that have too high a percentage of research that failed to be replicated.
From that point the situation should resolve itself as institutions wouldn’t want to lose funding - so they’d either have an internal group replicate before publishing or coordinate with other institutions pre-publish.
This sounds like doubling down on the approach that was causing the problems.
The desire to control researchers and incentivize them to compete against each other in order to justify their salaries is understandable, but it looks like it has been blown so out of proportion lately that it's doing active harm. Most researchers start their careers pretty self-motivated to do good research.
Installing another system to double-check every contribution will just increase the pressure to game the system in addition to doing research. And replicating a paper may sometimes cost as much as the original research, and it's not clear when to stop trying. How much collaboration with the original authors are you supposed to do, if you fail to replicate? If you are making decisions about their career, you will need some system to ensure it's not arbitrary, etc.
While I agree that "most" researchers start out with good intentions, I'm afraid I've directly and indirectly witnessed so many examples of fraud, data manipulation, wilful misrepresentation and outright incompetence, that I think we need some proper checks and balances put in place.
When people deliberately fake lab data to further their career, and that fake data is used to perform clinical trials on actual people, that's not just fraudulent, it's morally destitute. Yet this has happened.
People deliberately use improper statistics all the time to make their data "significant". It's outright fraud.
I've seen people doing sloppy work in the lab, and when questioning them, was told "no one cares so long as it's publishable". Coming from industry, where quality, accuracy and precision are paramount, I found the attitude shocking and repugnant. People should take pride and care in their work. If they can't do that, they shouldn't be working in the field.
PIs don't care so long as things are publishable. They live in wilful ignorance. Unless they are forced to investigate, it's easiest not to ask questions that might bring unpleasant answers back. Many of them would be shocked if they saw the quality of the work done by their underlings, but they live in an office and rarely get directly involved.
I've since gone back to industry. Academia is fundamentally broken.
When you say "double-checking" won't solve anything, I'd like to propose a different way of thinking about this:
* lab notebooks are supposed to be kept as a permanent record, checked and signed off. This rarely happens. It should be the responsibility of a manager to check and sign off every page, and question any changes or discrepancies.
* lab work needs independent validation, and lab workers should be able to prove their competence to perform tasks accurately and reproducibly; in industry, labs do things like sending samples to reference labs and receiving unknown samples to test, and the results are used to calculate each lab's deviation from the reference value and from other labs in the same industry. Labs get ranked based upon their real-world performance.
* random external audits to check everything, record keeping, facilities, materials, data, working practices, with penalties for noncompliance.
Now, academic research is not the same as industry, but the point I'm making here is that what's largely missing is oversight. By and large, there isn't any. Putting it in place would fix most of the problems, because most of them only exist because they are permitted to flourish in the absence of oversight. That's a failure of management in academia, globally. PIs aren't good managers. PIs see management in terms of academic prestige and expanding their research-group empires, but they are incompetent at it. They have zero training and little desire to do it, and it could be made a separate position in a department. Stop PIs managing, let them focus on science, and have a professional do it. And make compliance with oversight and work quality part of staff performance metrics, above publication quantity.
This is what industry does, though, at least in the less theoretical fields. If you actually want to make something that works, then you need to base your science on provable fact. Produce oil, build a cool structure, generate electricity: all based on amazing and complex science, but it has to work.
The conclusion is that the science that gets done needs to be provable, which in practice means practical. Which is unfortunate, because what about all the science that may be, or one day may be, practical?
The trouble is, industrial researchers usually don't publish negative results or failures to reproduce. So it takes a long time to correct the published scientific record even if privately some people know it's wrong.
This is like the xkcd test for weird science: Is some big boring company making billions with it? If so (quantum physics) then it’s legit. If not (healing crystals, orgone energy, essential oils...) it probably doesn’t work.
Research is non-linear, and criteria-based evaluation lacks perspective; you might throw the baby out with the bathwater. The advancement of science follows a deceptive path. Remember how the inventor of the mRNA method was shunned at her university just a few years ago? Because of things like that, millions might die, and we can't tell beforehand which scientist is a visionary and who's a crackpot. If you close funding to seemingly useless research, you might cut off the next breakthrough.
I don't really see why "being shunned" or "being a visionary" has anything to do with this, to be honest. If you set up a simple rule, "the results have to be reproducible", then surely it shouldn't matter whether the theory is considered "crackpot" or "brilliant"?
I don't really like the idea of 'replication police', I think it would increase pressure on researchers who are doing their job of pushing the boundaries of science.
However, I think there is potential in taking the 'funded by the government' idea in a different direction. Having a publication house that was considered a public service, with scientists (and others) employed by the government and working to review and publish research without commercial pressures could be a way to redirect the incentives in science.
Of course this would be expensive and probably difficult to justify politically, but a country/bloc that succeeded in such long term support for science might end up with a very healthy scientific sector.
- You would need some sort of barrier preventing movement of researchers between these audit teams and the institutions they are supposed to audit otherwise there would be a perverse incentive for a researcher to provide favorable treatment to certain institutions in exchange for a guaranteed position at said institutions later on. You could have an internal audit team audit the audit team, but you quickly run into an infinitely recursive structure and we'd have to question whether there would even be sufficient resources to support anything more than the initial team to begin with.
- From my admittedly limited experience as an economics research assistant in undergrad, I understood replication studies to be considered low-value projects that are barely worth listing on a CV for a tenure-track academic. That in conjunction with the aforementioned movement barrier would make such an auditing researcher position a career dead-end, which would then raise the question of which researchers would be willing to take on this role (though to be fair there would still be someone given the insane ratio of candidates in academia to available positions). The uncomfortable truth is that most researchers would likely jump at other opportunities if they are able to and this position would be a last resort for those who aren't able to land a gig elsewhere. I wouldn't doubt the ability of this pool of candidates to still perform quality work, but if some of them have an axe to grind (e.g. denied tenure, criticized in a peer review) that is another source of bias to be wary of as they are effectively being granted the leverage to cut off the lifeline for their rivals.
- You could implement a sort of academic jury duty to randomly select the members of this team to address the issues in the last point, which might be an interesting structure to consider further. I could still see conflict-of-interest issues being present especially if the panel members are actively involved in the field of research (and from what I've seen of academia, it's a bunch of high-intellect individuals playing by high school social rules lol) but it would at least address the incentive issue of self-selection. Perhaps some sort of election structure like this (https://en.wikipedia.org/wiki/Doge_of_Venice#:~:text=Thirty%....) could be used to filter out conflict of interest, but it would make selecting the panel a much more involved and time-consuming process.
The "Jury Duty" could easily be implemented in the existing grant structure - condition some new research grant on also doing an audit of some previous grant in your field (and fund it as part of the grant).
Depending how big the stick is and how it's implemented, this might push people away from novel exploratory research that has a lower chance of replicating despite best efforts.
Pulling up the actual paper, there is a part the article doesn't mention.
> Prediction markets, in which experts in the field bet on the replication results before the replication studies, showed that experts could predict well which findings would replicate (11).
So the paper itself is saying that this isn't completely innocent: given different incentives, most reviewers can identify a suspicious study, but under the current incentives, letting it through due to its novelty somehow seems warranted.
I think this all the time.
Out of curiosity, what's the title of the book?
There's obviously more complexity than this, but I believe that if even a relatively small percentage of the population started thinking like this (particularly influential people) it could make a very big difference.
Unfortunately, this seems to be extremely counter to human nature and desires - people seem compelled to form conclusions, even when it is not necessary ("Do people have ideas, or do ideas have people?").
https://www.nature.com/news/failed-replications-put-stap-ste...
Isn't that the moment where you try even harder to falsify the claims in that paper? You already know that you'll succeed, so the effort wouldn't be wasted.
The main problem is that even if you reproduce their experiment, they can claim that you did some step wrong: perhaps you mixed it too fast or too slow, or the temperature was not correctly controlled, or one of your reagents had a contamination that destroyed the effect, or they magically realize that the exact brand of their reagent is important.
It's very difficult to publish papers with negative results, so there is a high chance it will not count toward your total number of publications. Also, expect a low number of citations, so it's not useful for other metrics like citation count or h-index either (a sketch of how h is computed follows below).
For the same reason, you will not see publications of exact replications. A good paper X will be followed by almost-replications by other teams, like "we changed this and got X with a 10% improvement" or "we mixed the methods of X and Y and unsurprisingly^W got X+Y". This is somewhat good because it shows that the initial result is robust enough to survive small modifications.
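Since the h-index keeps coming up as the metric that replication work fails to move, here's a minimal sketch of how it's computed (my own illustration, not from the thread): h is the largest number such that you have h papers with at least h citations each, so a rarely-cited negative-result paper barely registers.

    def h_index(citations: list[int]) -> int:
        """h = the largest h such that the author has h papers
        with at least h citations each."""
        h = 0
        for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    # A low-citation replication paper doesn't move the needle:
    print(h_index([50, 30, 20, 6, 5, 1]))     # 5
    print(h_index([50, 30, 20, 6, 5, 1, 2]))  # still 5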
Assuming bad faith in peer review would make academia more interesting; the only way to handle it would probably be for the peer reviewer to go to the lab and be shown live measurements. Then check the equipment...
One reasonable approach might be to look at which group has produced the 'best' research over the past few years. But how do you judge that in a way that seems fair? Once you have a criterion to judge it, people will start to game that criterion.
Or, taking a step up: the university needs to save money. How do you judge whether the Chemistry department or the Computer Science department should have its funding cut?
No matter how you slice it at some point you're going to need a way for someone to judge which of two departments is producing the 'best' research and thus deserves more money, and that will incentivize people to game that metric.
One problem is PhD degrees are too costly to those who don't get academic or industrial success from them. But as long as talented people are willing to try to become a professor I don't see the system changing.
In the past, people who did science could do so with less personally on the line. In the early days you had men of letters like Cavendish, who didn't really need to care whether you liked what he wrote; he'd be fine without any grants. That obviously doesn't work for everyone, but then the tenure system developed for a similar reason: you have to be able to follow an unproductive path sometimes without starving. And that can mean unproductive in that you don't find anything, or in that your peers don't rate your work. There'd be a gap between being a young researcher and tenured, sure.
Nowadays there's an army of precariously employed phds and postdocs. Publish or perish is a trope. People get really quite old while still being juniors in some sense, and during that time everyone is thinking "I have to not jeopardise my career".
When you have a system where all the agents are under huge pressure, they adapt in certain ways: take safer bets, write more papers from each experiment, cooperate with others for mutual gain, congregate around previous winners, generally more risk reducing behaviour.
Perhaps the thing to do is make a hard barrier: everyone who wants to be a researcher needs to get tenure after undergrad, or not at all. (Or after masters or whatever, I wouldn't know.) Those people then get a grant for life. It will be hard to get one of these, but it will be clear if you have to give up. Lab assistants and other untenured staff know what they are negotiating for. Tenured young people can start a family and not have the rug pulled out when they write something interesting.
For me a better solution would be to properly incentivise replication work and solid scientific principles. If repeating an experiment and getting a contradictory result carried the same kudos as running the original experiment, I think we'd be in a healthier place. The same goes for the 'scientific grind work' of rooting out mistakes in experimental practice that can affect results and, ultimately, our understanding of the universe around us.
I think an analogy with software development works pretty well: often the incentives point towards adding new features above all else. Rarely is sitting down and grinding through the litany of small bugs prioritised, but as any dev will tell you, that grind work is just as important; otherwise you'll run into a wall of technical debt and the whole thing will come tumbling down.
One perspective is that, “knowledge generation wise,” the current system really does work from a long term perspective. Evolutionary pressure keeps the good work alive while bad work dies. Like that [Top Institution] paper: if nobody else could reproduce it, then the ideas within it die because nobody can extend the work.
But that comes at the heavy short term cost of good researchers getting duped into wasting time and bad researchers seeing incentives in lying. Which will make academia less attractive to the kind of people that ought to be there, dragging down the whole community.
https://nintil.com/newton-hypothesis
https://news.ycombinator.com/item?id=25787745
Due to career pressures and other reasons, there is a publish-or-perish crisis today.
Maybe we can do better by accepting that not everyone can publish groundbreaking results, and that's okay.
There are lots of incompetent people in academia, who later move up to senior positions and decide your promotions by citation counts and how many papers you published. I have no realistic ideas how to counter this.
We need to create a new social institution of Anti-Science, which would run on the opposite incentive: rewards correlated with the number of articles refuted. No tenure, no long-term contracts. If an anti-scientist wants an income, they need to refute published articles.
Create a platform for holding scientific debates between scientists and anti-scientists, so that a scientist has the ability to defend his/her research.
There would be no need to do anything special to prosecute anyone, because science is very competitive, and available refutations would inevitably be used to stop the career progression of the authors of refuted articles.
Data manipulation generally doesn't happen by changing values in a data frame. It's done by running and rerunning similar models with slightly different specifications until the p-value lands under .05, or by applying various "manipulations" to the variables or the models themselves for the same effect. It's much easier to identify this when you have the code that was used to produce whatever was eventually published.
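To make that concrete, here is a minimal sketch (my own illustration, not code from this thread) of why specification searching works: on pure noise, trying twenty arbitrary analysis choices and keeping the smallest p-value yields "significant" findings far more often than the nominal 5%.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    def searched_p_value(n=30, n_specs=20):
        """Null world: the treatment has zero real effect. The 'researcher'
        tests n_specs arbitrary outcome variables and keeps the smallest p."""
        treated = rng.normal(size=(n, n_specs))  # n subjects, n_specs outcomes
        control = rng.normal(size=(n, n_specs))
        pvals = [stats.ttest_ind(treated[:, j], control[:, j]).pvalue
                 for j in range(n_specs)]
        return min(pvals)

    trials = 1000
    false_positives = sum(searched_p_value() < 0.05 for _ in range(trials))
    # The nominal rate is 5%; with 20 independent looks it lands near
    # 1 - 0.95**20, i.e. roughly 64%.
    print(f"false-positive rate: {false_positives / trials:.0%}")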
I personally favor requirements to bundle the raw datasets with the papers. Data storage and transmission are very cheap now, so there's no need to restrict ourselves to just the text. We should be able to check all of the "outliers" thrown out of the datasets; the aim should be to make the tricks for massaging data nonviable. Even if your first data set was full of embarrassing screw-ups from doing it hungover and mixing up the step order, it could be helpful to have a collection of "known errors" to analyze. Optimistically, it could also uncover phenomena that scientists wrote off as their own mistakes, the way the cosmic background radiation was initially taken for mere noise.
Paper reviewing is already a problem but adding some transparency should help.
Something like a well-funded, ten-year campaign to do peer review, retry experiments, and publish papers on why results are wrong.
I have a co-worker who had a job that involved publishing research papers. Based on his horror stories, it seems like the most effective course of action is to attack the credibility of those who fudge results.
There will always be cases of fraud if someone digs deeply enough into large institutions. That doesn't actually indicate that there is a problem.
Launching in to change complex systems like the research community based on a couple of anecdotes and just-so stories is a great way to not actually achieve anything meaningful. There needs to be a very thorough, emotionally and technically correct enumeration of what the actual problem(s) are.
When I was in graduate school, papers from one lab at Harvard were known to be "best case scenario". Other labs had a rock solid reputation - if they said you could do X with their procedure, you could bet on it.
So basically we treated every claim as potential BS unless it came from a reputable lab or we or others had replicated it.
There are several levels of peer review. I've definitely been a reviewer on papers where the reviewers requested everything required and reproduced the experiment. That's extremely rare.
Some wider questions would be: Are there similar problems in mathematics/physics versus the life sciences/social sciences? Are the same kinds of problems found across different fields of study?
Also, I wonder if replication issues would be less severe if there were a requirement to publish, as open source/open data, the software and raw data that any study is based on. A change in this direction might make it more difficult to manipulate the results (after all, it's the public who paid for the research, in most cases).
The only way to fix replication issues is to give financial and career incentives for doing replication work. Right now there are few carrots and many sticks.
Personally, I'm always very careful to cite and praise work by "competing" researchers even when that work has well-known errors, because I know that those researchers will review my paper and if there aren't other experts on the review committee the paper won't make it. I wish I didn't have to, but my supervisor wants to get tenured and I want to finish grad school, and for that we need to publish papers.
Lots of science is completely inaccessible for non-experts as a result of this sort of politics. There is no guarantee that the work you hear praised/cited in papers is actually any good; it may have been inserted just to appease someone.
I thought that this was something specific to my field, but apparently not. Leaves me very jaded about the scientific community.
I want to answer the question "if I were a researcher and were willing to cheat to get ahead, what should be the objective of my cheating?"
If you want to look impressive to non-experts and get lots of grant money/opportunities, I'd go for lots of straightforward publications in top-tier venues. Star findings will come under greater scrutiny.
If you want to write a pop book, appear on TV, and sell classes, you need one interesting bit of pseudoscience and a dozen followup papers using the same bad methodology.
"Academic politics is the most vicious and bitter form of politics, because the stakes are so low."
https://en.wikipedia.org/wiki/Sayre%27s_law
As a non-expert, this is not the type of inaccessibility that is relevant to my interests.
"Unfortunately, alumni do not have access to our online journal subscriptions and databases because of licensing restrictions. We usually advise alumni to request items through interlibrary loan at their home institution/public library. In addition, under normal circumstances, you would be able to come in to the library and access the article."
This may not be technically completely inaccessible. But it is a significant "chilling effect" for someone who wants to read on a subject.
I think the inaccessibility is for different reasons, most of which revolve around the use of jargon.
In my experience, the situation is not so bad. It is obvious who the good scientists are, and you can almost always be sure that if they wrote it, it's good.
In essence, the evaluators (non-scientific organizations who fund scientific organizations) need some metric to compare and distinguish decent research from weak, one that's (a) comparable across fields of science; (b) verifiable by people outside that field (so you can compare across subfields); (c) not trivially manipulable by the funded institutions themselves; (d) describable in an objective manner, so that you can write the exact criteria/metrics into a legal act or contract. There are NO reasonable metrics that fit all these criteria; international peer-reviewed publications fitting certain criteria are bad, but perhaps the least bad of the (even worse) alternatives, like direct evaluation by government committees.
Also, I know of no researchers personally who are enthralled by the existing system.
BTW, I can tell you that the vast majority of researchers are not "enthralled" by the system, but highly critical of it. They simply don't have a choice but to work within it.
(There's lots of crowding out happening, of course, from the government subsidized science. But that can't be helped at the moment.)
What you've described sounds like something that is not, in any sense, science.
From your perspective, what can be done to return the scientific method to the forefront of these proceedings?
The situation is even worse when the paper claiming X underwent artifact review, where reviewers actually DID look at the raw data and source code but simply lacked the attention or expertise to recognize errors.
I'm not taking bribes, I'm paying a toll.
I wouldn’t necessarily condone the behavior, but what would you do in the situation? To always whistleblow whenever something doesn’t feel right and risk the politics? To quit working in the field if your concerns aren’t heard? To never cite papers that have absolutely any errors? I think it’s a tough situation and not productive to say OP isn’t behaving morally.
But one other thing to note here is that these headlines about a "replication crisis" seems to imply that this is a new phenomenon. Let's not forget the history of the electron charge. As Feynman said:
"We have learned a lot from experience about how to handle some of the ways we fool ourselves. One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It's a little bit off because he had the incorrect value for the viscosity of air. It's interesting to look at the history of measurements of the charge of an electron, after Millikan. If you plot them as a function of time, you find that one is a little bit bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher. Why didn't they discover the new number was higher right away? It's a thing that scientists are ashamed of—this history—because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong—and they would look for and find a reason why something might be wrong. When they got a number close to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that ..."
https://en.wikipedia.org/wiki/Oil_drop_experiment#Millikan.2...
The social sciences face the problem of not having so many different possible angles, such as quantitative theories or even a clear idea of what is being tested. Much of the research is engaged in the collection of isolated factoids. Hopefully something like a quantitative theory will emerge, that allows these results to be connected together like a mesh network, but no new science gets there right away.
The other thing is, to be fair, social sciences have to deal with noisy data, and with ethics. There were things I could do to atoms in my experiments, such as deprive them of air and smash them to bits, that would not pass ethical review if performed on humans. ;-)
PBS Space time has an excellent video on the topic: https://www.youtube.com/watch?v=72cM_E6bsOs
No, it isn't. It looked at a few different fields, and found that the problem was actually worse for general science papers published in Nature/Science, where non-reproducible papers were cited 300 times more often than reproducible ones.
https://en.wikipedia.org/wiki/Cosmic_ray_visual_phenomena
It's interesting that according to the Wikipedia article it's not entirely certain whether the radiation is producing actual light or just the sensation of light.
I've seen papers with no value or reason to exist brute-forced through review just so some useless junk data wouldn't go to waste, all to add a line to someone's CV.
That's without mentioning that some universities are packed with totally incompetent people who only advanced their careers by always finding a way to piggyback on someone else's paper.
The worst thing I've seen is that reviewing papers is also often offloaded to newly graduated fellows, who are often instructed to be lenient when reviewing papers coming from "friendly universities".
The level of most papers I have had the disgrace to read is so bad it made me want to quit that world as soon as I could.
I came to the conclusion that the whole system is basically a complex game of politics and strategy, fed by a loop in which bad research gets published in mediocre outlets, which then get a financial return by publishing it. This bad published research is then used to justify further money being spent on low-quality, rubbish work, and the cycle continues.
Sometimes you get to review papers that are so comically bad and low effort they almost feel insulting on a personal level.
For instance, I had to reject multiple papers not only due to their complete lack of content, but also because their English was so horrendous they were basically unintelligible.
Maybe this is how the GPT "AI" can generate such similar results. Lol.
... but each field is different. For those that are more quantitative, it's harder for your conclusions to deviate from the data.
Bias is not binary, so it's a sliding scale between the hard sciences and the squishy ones.
You have to listen to the science, and also use the common sense that "this is as far as we know" and that knowledge today may change tomorrow.
Two comments below, you use this "argument" to ask for "evidence" for evolution and climate change. Big red flag.
If you think there is no incorrect science at all in regards to evolution and climate change you're no better than the zealots of any religion.
The government can then reduce funding to institutions that have too high a percentage of research that failed to be replicated.
From that point the situation should resolve itself as institutions wouldn’t want to lose funding - so they’d either have an internal group replicate before publishing or coordinate with other institutions pre-publish.
Anything I’m missing?
The desire to control researchers and incentivize them to compete against each other in order to justify their salary is understandable, but it looks like it has been blown so out of proportion lately that it's doing active harm. Most researchers start their career pretty self-motivated to do good research.
Installing another system to double-check every contribution will just increase the pressure to game the system in addition to doing research. And replicating a paper may sometimes cost as much as the original research, and it's not clear when to stop trying. How much collaboration with the original authors are you supposed to do, if you fail to replicate? If you are making decisions about their career, you will need some system to ensure it's not arbitrary, etc.
When people deliberately fake lab data to further their career, and that fake data is used to perform clinical trials on actual people, that's not just fraudulent, it's morally destitute. Yet this has happened.
People deliberately use improper statistics all the time to make their data "significant". It's outright fraud.
I've seen people doing sloppy work in the lab, and when questioning them, was told "no one cares so long as it's publishable". Coming from industry, where quality, accuracy and precision are paramount, I found the attitude shocking and repugnant. People should take pride and care in their work. If they can't do that, they shouldn't be working in the field.
PIs don't care so long as things are publishable. They live in wilful ignorance. Unless they are forced to investigate, it's easiest not to ask any questions and get unpleasant answers back. Many of them would be shocked if they saw the quality of work done by their underlings, but they live in an office and rarely get directly involved.
I've since gone back to industry. Academia is fundamentally broken.
When you say "double-checking" won't solve anything, I'd like to propose a different way of thinking about this:
* lab notebooks are supposed to be kept as a permanent record, checked and signed off. This rarely happens. It should be the responsibility of a manager to check and sign off every page, and question any changes or discrepancies.
* lab work needs independent validation, and lab workers should be able to prove their competence to perform tasks accurately and reproducibly; in industry, labs do things like sending samples to reference labs and receiving blind samples to test, and these are used to calculate each lab's deviation from the true value, both against the reference lab and against others in the same industry. Labs get ranked based upon their real-world performance (a scoring sketch follows after this comment).
* random external audits to check everything, record keeping, facilities, materials, data, working practices, with penalties for noncompliance.
Now, academic research is not the same as industry, but the point I'm making here is that what's largely missing is oversight. By and large, there isn't any. Putting it in place would fix most of these problems, because most of them exist only because they are permitted to flourish in the absence of oversight. That's a failure of management in academia, globally. PIs aren't good managers. PIs see management in terms of academic prestige and expanding their research group empires, but they are incompetent at it: they have zero training and little desire to do it. Management could be made a separate position in a department. Stop PIs managing, let them focus on science, and have a professional do it. And make compliance with oversight and work quality part of staff performance metrics, above publication quantity.
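As a concrete illustration of the reference-lab scheme in the second bullet, here's a minimal sketch (my own, assuming the common proficiency-testing convention of |z| <= 2 satisfactory / |z| >= 3 unsatisfactory; the comment doesn't prescribe a formula): each lab measures the same blind sample, and a z-score against the assigned value flags outliers.

    def z_score(lab_result: float, assigned_value: float, sigma_pt: float) -> float:
        """z = (x - X) / sigma_pt, where sigma_pt is the standard
        deviation chosen for the proficiency assessment."""
        return (lab_result - assigned_value) / sigma_pt

    def grade(z: float) -> str:
        if abs(z) <= 2:
            return "satisfactory"
        if abs(z) < 3:
            return "questionable"
        return "unsatisfactory"

    # Blind sample with an assigned value of 10.0 units and sigma_pt = 0.5:
    for lab, result in {"A": 10.2, "B": 11.3, "C": 8.1}.items():
        z = z_score(result, 10.0, 0.5)
        print(f"lab {lab}: z = {z:+.1f} ({grade(z)})")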
However, I think there is potential in taking the 'funded by the government' idea in a different direction. Having a publication house that was considered a public service, with scientists (and others) employed by the government and working to review and publish research without commercial pressures could be a way to redirect the incentives in science.
Of course this would be expensive and probably difficult to justify politically, but a country/bloc that succeeded in such long term support for science might end up with a very healthy scientific sector.
- You would need some sort of barrier preventing movement of researchers between these audit teams and the institutions they are supposed to audit; otherwise there would be a perverse incentive for a researcher to provide favorable treatment to certain institutions in exchange for a guaranteed position at said institutions later on. You could have an internal audit team audit the audit team, but you quickly run into an infinitely recursive structure, and we'd have to question whether there would even be sufficient resources to support anything more than the initial team to begin with.
- From my admittedly limited experience as an economics research assistant in undergrad, I understood replication studies to be considered low-value projects that are barely worth listing on a CV for a tenure-track academic. That in conjunction with the aforementioned movement barrier would make such an auditing researcher position a career dead-end, which would then raise the question of which researchers would be willing to take on this role (though to be fair there would still be someone given the insane ratio of candidates in academia to available positions). The uncomfortable truth is that most researchers would likely jump at other opportunities if they are able to and this position would be a last resort for those who aren't able to land a gig elsewhere. I wouldn't doubt the ability of this pool of candidates to still perform quality work, but if some of them have an axe to grind (e.g. denied tenure, criticized in a peer review) that is another source of bias to be wary of as they are effectively being granted the leverage to cut off the lifeline for their rivals.
- You could implement a sort of academic jury duty to randomly select the members of this team to address the issues in the last point, which might be an interesting structure to consider further. I could still see conflict-of-interest issues being present especially if the panel members are actively involved in the field of research (and from what I've seen of academia, it's a bunch of high-intellect individuals playing by high school social rules lol) but it would at least address the incentive issue of self-selection. Perhaps some sort of election structure like this (https://en.wikipedia.org/wiki/Doge_of_Venice#:~:text=Thirty%....) could be used to filter out conflict of interest, but it would make selecting the panel a much more involved and time-consuming process.
Thank goodness our newsmedia business doesn't work that way, or we would be poorly-informed in multiple ways.
> Prediction markets, in which experts in the field bet on the replication results before the replication studies, showed that experts could predict well which findings would replicate (11).
So it's even stating that this isn't completely innocent: given different incentives, most reviewers can identify a suspicious study, but under current incentives it seems that letting it through on account of its novelty is somehow warranted.