This article brings up scientific code from 10 years ago, but how about code from... right now? Scientists really need to publish their code artifacts, and we can no longer just say "Well they're scientists or mathematicians" and allow that as an excuse for terrible code with no testing specs. Take this for example:
https://github.com/mrc-ide/covid-sim/blob/e8f7864ad150f40022...
This was used by the Imperial College for COVID-19 predictions. It has race conditions, seeds the model multiple times, and therefore has totally non-deterministic results[0]. Also, this is the cleaned up repo. The original is not available[1].
A lot of my homework from over 10 years ago still runs (some of it requires the right Docker container: https://github.com/sumdog/assignments/). If journals really care about the reproducibility crisis, artifact reviews need to be part of the editorial process. Scientific code needs to have tests, a minimal amount of test coverage, and code/data used really need to be published and run by volunteers/editors in the same way papers are reviewed, even for non-computer science journals.
[0] https://lockdownsceptics.org/code-review-of-fergusons-model/
[1] https://github.com/mrc-ide/covid-sim/issues/179
I am all for open science, but you understand that the links in your post are the exact worry people have when it comes to releasing code: people claiming that their non-software engineering grade code invalidates the results of their study.
I'm an accelerator physicist and I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points. I'm here to turn out science, not production ready code.
As an example, you seem to be complaining that their Monte Carlo code has non-deterministic output when that is the entire point of Monte Carlo methods and doesn't change their result.
By the way, yes I tested my ten year old code and it does still work. What I'm saying is that scientific code doesn't need to handle every special case or be easily usable by non-experts. In fact the time spent making it that way is time that a scientist spends doing software engineering instead of science, which isn't very efficient.
Let's be clear: scientific-grade code is held to a lower standard than production-grade code. But it is still a real standard.
Does scientific-grade code need to handle a large number of users running it at the same time? Probably not a genuine concern, since those users will run their own copies of the code on their own hardware, and it's not necessary or relevant for users to see the same networked results from the same instance of the program running on a central machine.
Does scientific-grade code need to publish telemetry? Eh, usually no. Set up alerting so that on-call engineers can be paged when (not if) it falls over? Nope.
Does scientific-grade code need to handle the authorization and authentication of users? Nope.
Does scientific-grade code need to be reproducible? Yes. Fundamentally yes. The reproducibility of results is core to the scientific method. Yes, that includes Monte Carlo code, since there is no such thing as truly random number generation on contemporary computers, only pseudorandom number generation; what matters for cryptographic purposes is that the seeds for the pseudorandom generation are sufficiently hidden / unknown. For scientific purposes, the seeds should be published on purpose, so that a) the exact results you found, sufficiently random as they are for the purpose of your experiment, can still be independently verified by a peer reviewer, and b) a peer reviewer can deliberately pick a different seed value, which will lead to different results but should still lead to the same conclusion if your decision to reject / fail to reject the null hypothesis was correct.
Monte Carlo can and should be deterministic and repeatable. It's a matter of correctly initializing your random number generators and providing a known, fixed random seed from run to run. If you aren't doing that, you aren't running your Monte Carlo correctly. That's a huge red flag.
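For what it's worth, here is a minimal sketch in Python with NumPy (my own toy, nothing to do with the covid-sim code) of what that looks like in practice: the estimate is a pure function of the sample count and the seed, so the same seed gives identical output run after run.

    import numpy as np

    def estimate_pi(n_samples, seed):
        """Monte Carlo estimate of pi, fully determined by (n_samples, seed)."""
        rng = np.random.default_rng(seed)       # one explicit generator, no hidden global state
        xy = rng.random((n_samples, 2))         # uniform points in the unit square
        inside = (xy ** 2).sum(axis=1) <= 1.0   # which points fall inside the quarter circle
        return 4.0 * inside.mean()

    # Same seed, same answer -- every run.
    assert estimate_pi(1_000_000, seed=42) == estimate_pi(1_000_000, seed=42)
    print(estimate_pi(1_000_000, seed=42))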
Scientists need to get over this fear about their code. They need to produce better code and need to actually start educating their students on how to write and produce code. For too long many in the physics community have trivialized programming and seen it as assumed knowledge.
Having open code will allow you to become better and you’ll produce better results.
Side note: 25 years ago I worked in accelerator science too.
Doesn't it concern you that it would be possible for critics to look at your scientific software and find mistakes (some of which the OP mentioned are not "minor") so easily?
Given that such software forms the very foundation of the results of such papers, why shouldn't it fall under scrutiny, even for "minor" points? If you are unable to produce good technical content, why are you qualified to declare what is or isn't minor? Isn't the whole point that scrutiny is best left to technical experts (and not subject experts)?
> exact worry people have when it comes to releasing code: people claiming that their non-software engineering grade code invalidates the results of their study.
If code is what is substantiating a scientific claim, then code needs to stand up to scientific scrutiny. This is how science is done.
I came from physics, but systems and computer engineering was always an interest of mine, even before physics. I always thought it was kooky-dooks that CS people can release papers without code; fine if the paper contains all the proofs, but otherwise it shouldn't even be looked at. PoS (proof-of-science) or GTFO.
We are at the point in human and scientific civilization where knowledge needs to prove itself correct. Papers should be self-contained execution environments that generate PDFs and resulting datasets. The code doesn't need to be pretty, or robust, but it needs to be sealed inside a container so that it can be re-run, re-validated, and someone else can confirm the result X years from now. And it isn't about trusting or not trusting the researcher; we need to fundamentally trust the results.
> I'm an accelerator physicist and I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points. I'm here to turn out science, not production ready code.
Specifically, to that point, I want to cite the saying:
"The dogs bark, but the caravan passes."
(There is a more colorful German variant which, translated, goes: "What does it matter to the mighty old oak tree if a dog takes a piss on it...").
Of course, if you publish your code, you expose it to critics. Some of the criticism will be unqualified. And as we have seen in the case of, e.g., climate scientists, some might even be nasty. But who cares? What matters is open discussion, which is a core value of science.
That's not how the game is played. If you cannot release the code because the code is too ugly or untested or has bugs, how do you expect anyone with the right expertise to assess your findings?
It reminds me of Kerckhoffs's principle in cryptography, which states: A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.
Nit: implementations of Monte Carlo methods are not necessarily nondeterministic. Whenever I implement one, I always aim for a deterministic function of (input data, RNG seed, parallelism, workspace size).
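One way to get that property, sketched here with NumPy's SeedSequence machinery (not necessarily how the parent commenter does it): give each worker its own child stream derived from the master seed, so the result does not depend on thread or process scheduling. Worker count and chunk sizes below are arbitrary.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def count_hits(args):
        """One worker's share of the samples, driven by its own child seed."""
        child_seed, n = args
        rng = np.random.default_rng(child_seed)
        xy = rng.random((n, 2))
        return int(((xy ** 2).sum(axis=1) <= 1.0).sum())

    def estimate_pi(n_total, seed, n_workers=4):
        # One independent, reproducible stream per worker, all derived from the master seed.
        children = np.random.SeedSequence(seed).spawn(n_workers)
        n_per_worker = n_total // n_workers
        with ProcessPoolExecutor(n_workers) as pool:
            hits = list(pool.map(count_hits, [(c, n_per_worker) for c in children]))
        return 4.0 * sum(hits) / (n_per_worker * n_workers)

    if __name__ == "__main__":
        # Same (input, seed, worker count) -> same estimate, regardless of scheduling.
        print(estimate_pi(4_000_000, seed=2020))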
I have done research on Evolutionary Algorithms and numerical optimization. It was nigh impossible to reproduce poorly described algorithms from state-of-the-art research at the time, and researchers would very often not bother to reply to inquiries for their code. Even if you did get the code, it would be some arcane C only compatible with a GCC from 1996.
Code belongs with the paper. Otherwise we can just continue to make up numbers and pretend we found something significant.
Our first job as scientists is to make sure we're not fooling ourselves. I wouldn't just use any old scale to take a measurement. I want a calibrated scale, adjusted to meet a specific standard of accuracy. Such standards and calibrations ensure we can all get "the same" result doing "the same" thing, even if we use different equipment from different vendors. The concerns about code are exactly the same. It's even scarier to me because I realize that unlike a scale, most scientists have no idea how to calibrate their code to ensure accurate, reproducible results. Of course with the scales, the calibration is done by a specialized professional who's been trained to calibrate scales. Not sure how we solve this issue with the code.
I'm very puzzled by this attitude. As an accelerator physicist, would you want your accelerator to be held together by duct tape and producing inconsistent results? Would you complain that you're not a professional machinist when somebody pointed it out? Why is software any different from hardware in this respect?
> I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points. I'm here to turn out science, not production ready code.
In what way do idiots making idiotic comments about your correct code invalidate your scientific production? You can still turn out science and let people read and comment freely on it.
> As an example, you seem to be complaining that their Monte Carlo code has non-deterministic output when that is the entire point of Monte Carlo methods and doesn't change their result.
I guess you would not need to engage personally with the idiots at "acceleratorskeptics.com", but most of their critique would likely be shut down by a simple sentence such as this one. Since most of your readers would not be idiots, they could scrutinize your code and even provide that reply on your behalf. This is called the scientific method.
I agree that you produce science, not merely code. Yet, the code is part of the science and you are not really publishing anything if you hide that part. Criticizing scientific code because it is bad software engineering is like criticizing it because it uses bad typography. You should not feel attacked by that.
Race conditions and certain forms of non-determinism could invalidate the results of a given study. Code is essentially a better-specified methods section, it just says what they did. Scientists are expected to include a methods section for exactly this reason, and any scientist worried about including a methods section in their paper would be rightly rejected.
However, a methods section is always under-specified. Code provides the unique opportunity to actually see the full methods on display and properly review their work. It should be mandated by all reputable journals and worked into the peer review process.
While you're running experiments, it doesn't matter; but code behind any sort of published result, or code reused as part of other publishable code, IS production code, and you should treat it as such.
> people claiming that their non-software engineering grade code invalidates the results of their study.
But that's exactly the problem.
Are you familiar with that bug in early Civ games where an overflow was making Gandhi nuke the crap out of everyone? What if your code has a similar issue?
What if you have a random value right smack in the middle of your calculations and you just happened to be lucky when you run your code?
I'm not that familiar with Monte Carlo; my understanding is that it is just a way to sample the data. And I won't be testing your data sampling, but I will expect that, given the same data to the calculation part (e.g., after the sampling happens), I get exactly the same results every time I run the code and on any computer. And if there are differences, I expect you to be able to explain why they don't matter, which will show you were aware of the differences in the first place and were not just lucky.
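One cheap way to encode exactly that expectation is a regression test pinned to a stored reference output. Everything below (the file names, the toy analysis() function, the tolerances) is made up for illustration, not anyone's actual pipeline:

    import numpy as np

    def analysis(samples):
        """Stand-in for the deterministic post-sampling part of the pipeline."""
        return {"mean": samples.mean(), "p95": np.percentile(samples, 95)}

    def test_analysis_matches_published_reference():
        samples = np.load("fixed_input_samples.npy")    # frozen input shipped with the code
        expected = np.load("reference_output.npz")      # numbers recorded when the paper was written
        result = analysis(samples)
        # Explicit tolerances make the cross-platform floating-point story
        # part of the method instead of a matter of luck.
        np.testing.assert_allclose(result["mean"], expected["mean"], rtol=1e-12)
        np.testing.assert_allclose(result["p95"], expected["p95"], rtol=1e-12)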
And then there is the matter of the magic values plastered all over research code.
Researchers should understand that the rules for "software engineering grade code" are not there just because we want to complicate things, but because we want to make sure the code is correct and does what we expect it to do.
/edit: The real problem is not getting good results from faulty code; it is ignoring good solutions because of faulty code.
> What I'm saying is that scientific code doesn't need to handle every special case or be easily usable by non-experts. In fact the time spent making it that way is time that a scientist spends doing software engineering instead of science, which isn't very efficient.
If the proof on which the paper is based is in the code that produced the evidence, you absolutely need to let an average user run it without specific knowledge, to abide by the reproducibility principle. Asking a reviewer to fiddle about like an IT professional to get something working is bound to promote lazy reviewing, and will result either in dismissal of the result or in approval without real review.
And by the way, one could argue that producing the paper isn't really science either, but if you are working with MSFT Office, you know a fair number of non-science work hours have been put into that as well.
> As an example, you seem to be complaining that their Monte Carlo code has non-deterministic output when that is the entire point of Monte Carlo methods and doesn't change their result.
Not so fast. Monte Carlo code turns arbitrary RNG seeds into outputs. That process can, and arguably should be, deterministic.
To do your study, you feed your Monte Carlo code 'random enough' seeds. Coming up with the seeds does not need to be deterministic. But once the seeds are fixed, the rest can be deterministic. Your paper should probably also publish the seeds used, so that people can reproduce everything. (And so they can check whether your seeds are carefully chosen, or really produce typical outcomes.)
> I'm an accelerator physicist and I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points. I'm here to turn out science, not production ready code.
Sure, and that rationale works OK when your code operates in a limited, specialized domain.
But if you're modeling climate change or infectious diseases, and you expect your work to affect millions of human lives and trillions of dollars in spending, then you owe us a full accounting of it.
>when that is the entire point of Monte Carlo methods and doesn't change their result.
Two nitpicks: a) it shouldn't change the conclusions, but MC calculations will get different results depending on the seed, and b) it is considered good practice in reproducible science to fix the seed so that subsequent runs give exactly the same results.
Ultimately, I think there is a balance: really poor code can lead to incorrect conclusions... but you don't need production ready code for scientific exploration.
Sorry to be pedantic, but although Monte Carlo simulations are based on pseudo-randomness, I still think it is good practice that they have deterministic results (i.e., use a given seed) so that the exact results can be replicated. If the precise numbers can be reproduced then a) it helps me as a reviewer see that everything is kosher with their code and b) it means that if I tweak the code to try something out my results will be fully compatible with theirs.
Why is "doing software engineering" not "doing science"?
Anybody who has conducted experimental research will say they spent 80% of the time using a hammer or a spanner. Repairing faulty lasers or power supplies. This process of reliable and repeatable experimentation is the basis of science itself.
Computational experiments must be held to the same standards as physical experiments. They must be reproducible and they should be publicly available (if publicly funded).
What are the frameworks used in scientific endeavours? Given that scaling is not an issue, something like Rails for science seems like it could potentially return many $(B/M)illions of dollars for humanity.
edit: please read the grandchild comment before going off on the idea that some random programmer on the Internet dares to criticize scientific code he does not understand. What is crucial in the argument here is indeed the distinction between methods employing pseudo-randomness, like Monte Carlo simulation, and non-determinism caused by undefined behavior.
> I'm an accelerator physicist and I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points.
The person who wrote the linked blog post claims to be a software engineer at Google. Unfortunately, that claim is not falsifiable, as the person decided to remain anonymous.
> As an example, you seem to be complaining that their Monte Carlo code has non-deterministic output when that is the entire point of Monte Carlo methods and doesn't change their result.
The claim is that even with the same seed for the random generator, the program produces different results, and this is explained by the allegation that it runs non-deterministically (in the sense of undefined behavior) across multiple threads. It also claims that the program produces significantly different results depending on which output file format is chosen.
If this is true, the code would have race conditions, and as being impacted by race conditions is a form of undefined behavior, this would make any result of the program questionable, as the program would not be well-defined.
Personally, I am very doubtful whether this is true; it would be incredibly sloppy of the Imperial College scientists. Some more careful analysis by a recognized programmer might be warranted.
However, it underlines the importance of the main topic: scientific code should be open to analysis.
> What I'm saying is that scientific code doesn't need to handle every special case or be easily usable by non-experts.
Fully agree with this. But it should try to document its limitations.
I want science to be held to a very high standard. Maybe even higher than "software engineering grade". Especially if it's being used as a justification for public policy.
At the risk of just mirroring points which have already been made:
> you understand that the links in your post are the exact worry people have when it comes to releasing code: people claiming that their non-software engineering grade code invalidates the results of their study.
It's profoundly unscientific to suggest that researchers should be given the choice to withhold details of their experiments that they fear will not withstand peer review. That's much of the point of scientific publication.
Researchers who are too ashamed of their code to submit it for publication, should be denied the opportunity to publish. If that's the state of their code, their results aren't publishable. Unpublishable garbage in, unpublishable garbage out. Simple enough. Journals just shouldn't permit that kind of sloppiness. Neither should scientists be permitted to take steps to artificially make it difficult to reproduce (in some weak sense) an experiment. (Independently re-running code whose correctness is suspect, obviously isn't as good as comparing against a fully independent reimplementation, but it still counts for something.)
If a mathematician tried to publish the conclusion of a proof but refused to show the derivation, they'd be laughed out of the room. Why should we hold software-based experiments to such a pitifully low standard by comparison?
It's not as if this is a minor problem. Software bugs really can result in incorrect figures being published. In the case of C and C++ code in particular, a seemingly minor issue can result in undefined behaviour, meaning the output of the program is entirely unconstrained, with no assurance that the output will resemble what the programmer expects. This isn't just theoretical. Bizarre behaviour really can happen on modern systems, when undefined behaviour is present.
A computer scientist once told me a story of some students he was supervising. The students had built some kind of physics simulation engine. They seemed pretty confident in its correctness, but in truth it hadn't been given any kind of proper testing, it merely looked about right to them. The supervisor had a suggestion: Rotate the simulated world by 19 degrees about the Y axis, run the simulation again, and compare the results. They did so. Their program showed totally different results. Oh dear.
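That rotation trick is a metamorphic test, and it is cheap to write. Here is a sketch against a toy integrator of my own (not the students' engine): gravity points along -Y, so rotating the world about Y and then simulating must agree with simulating first and rotating the result.

    import numpy as np

    def rot_y(deg):
        """Rotation matrix about the Y axis."""
        t = np.radians(deg)
        c, s = np.cos(t), np.sin(t)
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

    def simulate(pos, vel, steps=1000, dt=1e-3, g=np.array([0.0, -9.81, 0.0])):
        """Toy integrator: a point mass under uniform gravity along -Y."""
        for _ in range(steps):
            vel = vel + dt * g
            pos = pos + dt * vel
        return pos

    R = rot_y(19.0)
    p0, v0 = np.array([1.0, 2.0, 3.0]), np.array([0.5, 0.0, -0.2])
    # Rotate-then-simulate must equal simulate-then-rotate
    # (gravity is Y-aligned, so a rotation about Y leaves it unchanged).
    np.testing.assert_allclose(simulate(R @ p0, R @ v0), R @ simulate(p0, v0), rtol=1e-10)
    print("rotation symmetry holds")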
Needless to say, not all scientific code can so easily be shown to be incorrect. All the more reason to subject it to peer review.
> I'm an accelerator physicist and I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points.
Why would you care? Science is about advancing the frontier of knowledge, not about avoiding invalid criticism from online communities of unqualified fools.
I sincerely hope vaccine researchers don't make publication decisions based on this sort of fear.
> people claiming that their non-software engineering grade code invalidates the results of their study.
How exactly is this a bad thing?
> I'm an accelerator physicist and I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points. I'm here to turn out science, not production ready code.
But it should be noted that what you didn't say is that you're here to turn out accurate science.
This is the software version of statistics. Imagine if someone took a random sampling of people at a Trump rally and then claimed that "98% of Americans are voting for Trump". And now imagine someone else points out that the sample is biased and therefore the conclusion is flawed, and the response was "Hey, I'm just here to do statistics".
---
Do you see the problem now? The poster above you pointed out that the conclusions of the software can't be trusted, not that the coding style was ugly. Most developers would be more than willing to say "the code is ugly, but it's accurate". What we don't want is to hear "the conclusions can't be trusted and 100 people have spent 10+ years working from those unreliable conclusions".
As a theoretical physicist doing computer simulations, I am trying to publish all my code whenever possible. However all my coauthors are against that. They say things like "Someone will take this code and use it without citing us", "Someone will break the code, obtain wrong results and blame us", "Someone will demand support and we do not have time for that", "No one is giving away their tools which make their competitive advantage". This is of course all nonsense, but my arguments are ignored.
If you want to help me (and others who agree with me), please sign this petition: https://publiccode.eu. It demands that all publicly funded code must be public.
>"Someone will demand support and we do not have time for that",
Well ... that part isn't nonsense, though I agree it shouldn't be a dealbreaker. And it means we should work towards making such support demands minimal or non-existent via easy containerization.
I note with frustration that even the Docker people, whose entire job is containerization, can get this part wrong. I remember when we containerized our startup's app c. 2015, to the point that you should have been able to run it locally just by installing Docker and running `docker-compose up`, and it still stopped working within a few weeks (which we discovered when onboarding new employees) and required a knowledgeable person to debug and rewrite it.
(They changed the spec for docker-compose so that the new version you'd get when downloading Docker would interpret the yaml to mean something else.)
As a theoretical physicist your results should be reproducible based on the content of your papers, where you should detail/state the methods you use. I would make the argument that releasing code in your position has the potential to be scientifically damaging; if another researcher interested in reproducing your results reads your code, then it is possible their reproduction will not be independent. However they will likely still publish it as such.
> "No one is giving away their tools which make their competitive advantage"
This hits close to home. Back in college, I developed software, for a lab, for a project-based class. I put the code up on GitHub under the GPL license (some code I used was licensed under GPL as well), and when the people from the lab found out, they lost their minds. A while later, they submitted a paper and the journal ended up demanding the code they used for analysis. Their solution? They copied and pasted pieces of my project they used for that paper and submitted it as their own work. Of course, they also completely ignored the license.
> Scientists really need to publish their code artifacts, and we can no longer just say "Well they're scientists or mathematicians" and allow that as an excuse for terrible code with no testing specs.
You are blaming scientists, but speaking from my personal experience as a computational scientist, this exists because there are few structures in place that incentivize strong programming practices.
* Funding agencies do not provide support for verification and validation of scientific software (typically)
* Few journals assess code reproducibility, and few require public code (few require even public data)
* There are few funded studies to reproduce major existing studies
Until these structural challenges are addressed, scientists will not have sufficient incentive to change their behavior.
> Scientific code needs to have tests, a minimal amount of test coverage, and code/data used really need to be published and run by volunteers/editors in the same way papers are reviewed, even for non-computer science journals.
Second this. Research code is already hard, and with misaligned incentives from the funding agencies and grad school pipelines, it's an uphill battle. Not to mention that professors with an outdated mindset might discourage graduate students from committing too much time to work on scientific code. "We are scientists, not programmers. Coding doesn't advance your career" is often an excuse for that.
In my opinion, enforcing standards without addressing this root cause is not gonna fix the problem. Worse, students and early-career researchers will bear the brunt of increased workload and code compliance requirements from journals. Big, well-funded labs that can afford a research engineer position are gonna have an edge over small labs that cannot.
After a paper has been accepted, authors can submit a repository containing a script which automatically replicates results shown in the paper. After a reviewer confirms that the results were indeed replicable, the paper gets a small badge next to its title.
While there could certainly be improvements, I think it's a step in the right direction.
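For anyone wondering what such a submission might contain, it can be as dumb as a single entry-point script that rebuilds every artifact in the paper. This is a made-up sketch (the directory layout, script names, and seed are placeholders), not any particular badge program's required format:

    # reproduce.py -- one command that regenerates every figure and table in the paper.
    import subprocess
    import sys

    STEPS = [
        ["python", "analysis/clean_data.py", "--in", "data/raw.csv", "--out", "data/clean.csv"],
        ["python", "analysis/fit_model.py", "--in", "data/clean.csv", "--seed", "12345",
         "--out", "tables/table2.csv"],
        ["python", "analysis/make_figures.py", "--in", "tables/table2.csv", "--out", "figures/"],
    ]

    for step in STEPS:
        print(">>", " ".join(step))
        if subprocess.run(step).returncode != 0:
            sys.exit(f"step failed: {' '.join(step)}")
    print("Done; diff figures/ and tables/ against the published versions.")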
> If journals really care about the reproducibility crisis
All is well and good then, because journals absolutely don't care about science. They care about money and prestige. From personal experience, I'd say this intersects with the interests of most high-ranking academics. So the only unhappy people are idealistic youngsters and science "users".
I am in 100% agreement and would like to point out that many papers based on code don't even come with code bases, and if they do those code bases are not going to contain or be accompanied by any documentation whatsoever. This is frequently by design as many labs consider code to be IP and they don't want to share it because it gives them a leg up on producing more papers and the shared code won't yield an authorship.
There are some efforts in this vein within academia, but they are very weak in the United States. The U.S. Research Software Engineer Association (https://us-rse.org/) represents one such attempt at increasing awareness about the need for dedicated software engineers in scientific research and advocates for a formal recognition that software engineers are essential to the scientific process.
Realistically though even if the necessity of research software engineering were acknowledged at the institutional level at the bulk of universities, there would still be the problem of universities paying way below market rate for software engineering talent...
To some degree, universities alone cannot effect the change needed to establish a professional class of software engineers that collaborate with researchers. Funding agencies such as the NIH and NSF are also responsible, and need to lead in this regard.
No one expects them to be software engineers, but we do expect them to be _scientists_ - to publish results that are reproducible and verifiable. And that has to hold for code as well.
John Carmack, who did some small amount of work on the code, had a short rebuttal of the "Lockdown Skeptics" attack on the Imperial College code that probably mirrors the feelings of some of us here.
Can you describe a bit more about what is going on in the project? The file you linked is over 2.5k lines of C++ code, and that is just the "setup" file. As you say, this is supposed to be a statistical model; I expected it to be in R, Python, or one of the standard statistical packages.
It is essentially a detailed simulation of viral spread, not just a programmed distribution or anything. It's all in C++ because it's pretty performance-critical.
Because much of this code was written in the 80's, I suspect. In general, there's a bunch of really old scientific codebases in particular disciplines because people have been working on these problems for a looooonnngg time.
In computer science a lot of researchers already publish their code (at least in the domain of software engineering), but my biggest problem is not the absence of tests, it is the absence of any documentation on how to run it. In the best case you can open it in an IDE and it will figure out how to run it, but I rarely see any indication of what the dependencies are. So once you figure out how to run the code, you run it until you hit the first import exception, install that dependency, run until the next import exception, and so on. I have spent way too much time on that instead of doing real research.
The criticisms of the code from Imperial College are strange to me. Non-deterministic code is the least of your problems when it comes to modeling the spread of a brand new disease. Whatever error is introduced by race conditions or multiple seeds is completely dwarfed by the error in the input parameters. Like, it's hard to overstate how irrelevant that is to the practical conclusions drawn from the results.
Skeptics could have a field day tearing apart the estimates for the large number of input parameters to models like that, but they choose not to? I don't get it.
I do research for a private company, and open-source as much of my work as I can. It's always a fight. So I'll take their side for the moment.
Many years ago, a paper on the PageRank algorithm was written, and the code behind that paper was monetized to unprecedented levels. Should computer science journals also require working proof of concept code, even if that discourages companies from sharing their results; even if it prevents students from monetizing the fruits of their research?
For a seasoned software developer, encountering scientific code can be a jarring experience. So many code smells. Yet most of those code smells are really only code smells in application development. Most scientific code only ever runs once, so most of the axioms of software engineering are inapplicable or a distraction from the business at hand.
Scientists, not programmers, should be the ones spear-heading the development of standards and rules of thumb.
Still, there are real problematic practices that an emphasis on sharing scientific code would discourage. One classic one is the use of a single script that you edit each time you want to re-parameterize a model. Unless you copy the script into the output, you lose the informational channel between your code and its output. This can have real consequences. Several years ago I started up a project with a collaborator to follow up on their unpublished results from a year prior. Our first task was to take that data and reproduce the results they obtained before, because the person no longer had access to the exact copy of the script that they ran. We eventually determined that the original result was due to a software error (which we eventually identified). My colleague took it well, but the motivation to continue the project was much diminished.
You can blame all the scientists, but shouldn't we blame the CS folks for not coming up with suitable languages and software engineering methods that will prevent software from rotting in the first place?
Why isn't there a common language that all other languages compile to, and that will be supported on all possible platforms, for the rest of time?
(Perhaps WASM could be such a language, but the point is that this would be merely coincidental and not a planned effort to preserve software)
And why aren't package managers structured such that packages will live forever (e.g. in IPFS) regardless of whether the package management system is online? Why is Github still a single point of failure in many cases?
It's hard for me to publish my code in healthcare services research because most of it is under lock and key due to HIPAA concerns. I can't release the data, and so 90% of the work of munging and validating the data is un-releasable. So, should I release my last 10% of code where I do basic descriptive stats, make tables, make visualizations, or do some regression modeling? Certainly, I can make that available in de-identified ways, but without data, how can anyone ever verify its usefulness? And does anyone want to see how I calculated the mean, median, SD, IQR? It's base R or tidyverse; that's not exactly revolutionary code.
One of the things I come across is scientists who believe they're capable of learning to code quickly because they're capable in another field.
After they embark on solving problems, it becomes an eye-opening experience, and one that soon becomes about just keeping things running.
For those who have a STEM discipline in addition to a software development background of more than five years: would you agree with the above?
I would have thought the scientists among us would approach someone with software development expertise (something abstract and requiring a different set of muscles).
One positive development is the variety of low/no-code tooling that can replace a lot of this hornets'-nest coding.
It's generally not plausible to "approach someone with software development expertise" for organizational and budget reasons. Employing dedicated software developers is simply not a thing that happens; research labs overwhelmingly have the coding done by researchers and involved students without any dedicated positions for software development.
In any case you'd need to teach them the problem domain, and it's considered cheaper (and simpler from organizational perspective) to get some phd students or postdocs from your domain to spend half a year getting up to speed on coding (and they likely had a few courses in programming and statistics anyway) than to hire an experienced software developer and have them learn the basics of your domain (which may well take a third or half of the appropriate undergraduate bachelor's program).
> I would have thought the scientists among us would approach someone with software development expertise.
Is there a pool of skilled software architects willing to provide consultations at well-below market wages? Or a Q&A forum full of people interested in giving this kind of advice? (StackOverflow isn't useful for this; the allowed question scope is too narrow.) I guess one incentive to publish one's code is to get it criticized on places like Hacker News. The best way to get the right answer on the internet is to post the wrong answer, after all.
My work position was created because scientists are not engineers. I had to explain -to my disappointment- why non-deterministic algorithms are bad, how to write tests, and how to write SQL queries, more than once.
However, when working as equals, scientists and engineers can create truly transformative projects. The algorithm accounts for 10% of the solution. The code, infrastructure, and system design account for 20% of the final result. The remaining 70% of the value comes directly from its impact. A project that nobody uses is a failure. Something that perfectly solves a problem that nobody cares about is useless.
> This was used by the Imperial College for COVID-19 predictions. It has race conditions, seeds the model multiple times, and therefore has totally non-deterministic results[0].
This does not look like a good example at all, as it appears the blog author there is just trying to discredit the program because he does not like the results. He also writes that all epidemiological research should be defunded.
There is a fundamental reason not to publish scientific code.
If someone is trying to reproduce someone else's results, the data and methods are the only ingredients they need. If you add code into this mix, all you do is introduce new sources of bias.
This is an easy argument to make because it was already made for you in the popular press months ago.
Show me the grant announcements that identify reproducible long term code as a key deliverable, and I’ll show you 19 out of 20 scientists who start worrying about it.
Short answer: Yes, my 30 year old Fortran code runs (with a few minor edits between f77 and modern fortran), as did my ancient Perl codes.
Watching the density functional theory based molecular dynamics zip along at ~2 seconds per time step on my 2-year-old laptop, versus roughly 6k seconds per time step on an old Sun machine back in 1991, is quite something. I remember the same code getting down to 60 seconds per time step on my desktop R8k machine in the late 90s.
What's been really awesome about that is that I wrote some binary data files on big-endian machines in the early 90s and re-read them on the laptop (little-endian) by adding a single compiler switch.
Perl code that worked with big XML file input in the mid 2000s continues to work, though I've largely abandoned using XML for data interchange.
C code I wrote in the mid 90s compiled, albeit with errors that needed to be corrected. C++ code was less forgiving.
Over the past 4 months, I had to forward port a code from Boost 1.41 to Boost 1.65. Enough changes over 9 years (code was from 2011) that it presented a problem. So I had to follow the changes in the API and fix it.
I am quite thankful I've avoided the various fads in platforms and languages over the years. Keep inputs in simple textual format that can be trivially parsed.
> What's been really awesome about that is that I wrote some binary data files on big-endian machines in the early 90s and re-read them on the laptop (little-endian) by adding a single compiler switch.
I want to second the idea of just dumping your floating point data as binary. It's basically the CSV of HPC data. It doesn't require any libraries, which could break or change, and even if the endianness changes you can still read it decades later. I've been writing a computational fluid dynamics code recently and decided to only write binary output for those reasons. I'm not convinced of the long-term stability of other formats. I've seen colleagues struggle to read data in proprietary formats even a few years after creating it. Binary is just simple and avoids all of that. Anybody can read it if needed.
Counter-argument: Binary dumps are horrible because usually the documentation that allows you to read the data is missing. Using a self-documenting format such as HDF5 is far superior. It will tell you whether the bits are floating point numbers in single or double precision, what the endianness is, and what the layout of the 3D array was. (No surprise that HDF was invented for the Voyager mission, where they had to ensure readability of the data for half a century.)
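For what it's worth, the self-describing route is only a few more lines, e.g. with h5py (the dataset names, attributes, and file name below are just illustrative):

    import numpy as np
    import h5py

    field = np.random.default_rng(0).random((64, 64, 64))   # some 3D result

    with h5py.File("run_042.h5", "w") as f:
        dset = f.create_dataset("velocity_x", data=field, dtype="f8")
        dset.attrs["units"] = "m/s"
        dset.attrs["grid_spacing_m"] = 0.01
        f.attrs["code_version"] = "v1.3.2"
        f.attrs["rng_seed"] = 0

    # Years later: shape, precision, byte order and the metadata all come back with the data.
    with h5py.File("run_042.h5", "r") as f:
        d = f["velocity_x"]
        print(d.shape, d.dtype, d.attrs["units"], f.attrs["code_version"])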
Yes, I know a couple of Fortran 77 apps and libraries which were developed more than 25 years ago and which are still in use today.
My C++ Qt GUI application for NMR spectrum analysis (https://github.com/rochus-keller/CARA) has been running for 20 years now, with continuing high download and citation rates.
So obviously C++/Qt and Fortran 77 are very well suited to standing the test of time.
As someone who has worked with bits of scientific code: "does the code you write right now work on another machine" might be the more appropriate challenge. I've seen a lot of hardcoded paths, unmentioned dependencies, and monkey-patched libraries downloaded from somewhere; just getting new code to work is hard enough. And let's not even begin to talk about versioning or magic numbers.
Similar to other comments I don't mean to fault scientists for that - their job is not coding and some of the dependencies come from earlier papers or proprietary cluster setups and are therefore hard to avoid - but the situation is not good.
To me, that's like a theoretical physicist saying "My job is not to do mathematics" when asked for a derivation of a formula he put in the paper.
Or an experimental physicist saying "My job is not mechanical engineering" when asked for details of their lab equipment (almost all of which is typically custom built for the experiment).
On one hand, yes. But on the other hand, reusable code, dependency management, linting, portability, etc. are not easy problems, and they are something junior developers tend to struggle with (and it's not like the problem never pops up for seniors, either). I really can't fault non-compsci scientists for not handling that problem well. Of course, part of it (like publishing the relevant code) is far easier and should be done, but some aspects are really hard.
IMO the incentive problem in science (basically number of papers and new results is what counts) also plays into this, as investing tons of time in your code gives you hardly any reward.
The point is that as a scientist your code is a tool to get the job done and not the product. I can't spend 48 hours writing unit tests for my library (even though I want to) if it's not going to give me results. It's literally not my job and is not an efficient use of my time
>Yeah, we built it with duct tape and there's hot glue holding the important bits that kept falling off. Don't put anything metal in that; we use it as a tea heater, but there's 1000A running through it, so it shoots spoons out when we turn the main machine on.
Lots of people saying, it is the scientist's job to produce reproducible code. It is, and the benefits of reproducible code are many. I have been a big proponent of it in my own work.
But not with the current mess of software frameworks. If I am to produce reproducible scientific code, I need an idiot-proof method of doing it. Yes, I can put in the 50-100 hours to learn how to do it [1], but guess what, in about 3-5 years a lot of that knowledge will be outdated. People compare it with math, but the math proofs I produce will still be readable and understandable a century from now.
Regularly used scientific computing frameworks like MATLAB, R, the Python ecosystem, and Mathematica need a dumb, guided method of producing releasable and reproducible code. I want to click through a bunch of "next" buttons that help me fix the problems you indicate, and finally release a final version that has all the information necessary for someone else to reproduce the results.
[1] I have. I would put myself in the 90th percentile of physicists familiar with best practices for coding. I speak for the 50th percentile.
(1) Use a package manager which stores hash sums in a lock file.
(2) Install your dependencies from the lock file as the spec.
(3) Do not trust version numbers. Trust hash sums. Do not believe in "But I set the version number!".
(4) Do not rely on downloads. Again, trust hash sums, not URLs.
(5) Hash sums!!!
(6) Wherever there is randomness, as in random number generators, use a seed. If the interface does not allow specifying the seed, throw the trash away and use another generator. Be careful when concurrency is involved; it might destroy reproducibility. For example, this was the case with TensorFlow. Not sure whether it still is.
(7) Use a version control system.
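A minimal illustration of points (4) and (5) in Python: check every external input against a recorded SHA-256 before running anything. The file name and hash below are placeholders; a real lock file would list every input.

    import hashlib
    import sys

    # path -> SHA-256 recorded when the published results were produced (placeholder values).
    PINNED_INPUTS = {
        "inputs/survey_2019.csv":
            "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
    }

    def sha256(path, chunk_size=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk_size):
                h.update(block)
        return h.hexdigest()

    for path, expected in PINNED_INPUTS.items():
        actual = sha256(path)
        if actual != expected:
            sys.exit(f"{path}: hash mismatch ({actual}); refusing to run")
    print("all inputs verified")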
Definitely makes you question it more. Does the paper not explain the contents of the MATLAB code? That's all that is usually needed for reproducibility. You should be able to get the same results no matter who writes the code to do what is explained in their methods.
Of course, I have no idea about the paper you're talking about and just want to say that reproducibility isn't dependent on releasing code. There could even be a case where it's better if someone reproduces a result without having been biased by someone else's code.
I think the idea that scientific code should be judged by the same standards as production code is a bit unfair. The point when the code works the first time is when an industry programmer starts to refactor it -- because he expects to use and work on it in the future. The point when the code works the first time is when a scientist abandons it -- because it has fulfilled its purpose. This is why the quality is lower: lots of scientific code is the first iteration that never got a second.
(Of course, not all scientific code is discardable; large quantities of reusable code are reused every day. We have many frameworks, and the code quality of those is completely different.)
But it often is. For most non-CS papers (mostly biosciences) I've read, there are specific authors whose contribution was largely "coding".
The gold standard for a scientific finding is not whether a particular experiment can be repeated; it is whether a different experiment can confirm the finding.
The idea is that you have learned something about how the universe works. Which means that the details of your experiment should not change what you find... assuming it's a true finding.
Concerns about software quality in science are primarily about avoiding experimental error at the time of publication, not the durability of the results. If you did the experiment correctly, it doesn't matter if your code can run 10 years later. Someone else can run their own experiment, write their own code, and find the same thing you did.
And if you did the experiment incorrectly, it also doesn't matter if you can run your code 10 years later; running wrong code a decade later does not tell you what the right answer is. Again--conducting new research to explore the same phenomenon would be better.
When it comes to hardware, we get this. Could you pick up a PCR machine that's been sitting in a basement for 10 years and get it running to confirm a finding from a decade ago? The real question is, why would you bother? There are plenty of new PCR machines available today, that work even better.
And it's the same for custom hardware. We use all sorts of different telescopes to look at Jupiter. Unless the telescope is broken, it looks the same in all of them. Software is also a tool for scientific observation and experimentation. Like a telescope, the thing that really matters is whether it gives a clear view of nature at the time we look through it.
Reproducibility is about understanding the result. It is the modern version of "showing your work".
One of the unsung and wonderful properties of reproducible workflows is the fact that it can allow science to be salvaged from an analysis that contains an error. If I had made an error in my thesis data analysis (and I did, pre-graduation), the error can be corrected and the analysis re-run. This works even if the authors are dead (which I am not :) ).
Reproducibility abstracts the analysis from data in a rigorous (and hopefully in the future, sustainable) fashion.
>Reproducibility is about understanding the result. It is the modern version of "showing your work".
That is something no one outside of high school cares about. The idea that you can show your work in general is ridiculous. Do I need to write a few hundred pages of set theory to start using addition in a physics paper? No. The work you need to show is the work a specialist in the field would find new, which is completely different from what a layman would find new.
Every large lab, the ones that can actually reproduce results, has decades of specialist code that does not interface with anything outside the lab. Providing the source code is then as useful as giving a binary print out of an executable for an OS you've never seen before.
> running wrong code a decade later does not tell you what the right answer is.
It can tell, however, exactly where the error lies (if the error is in software at all). Like a math teacher that can circle where the student made a mistake in an exam.
Yes, this argument, along with the practices of cross checking within one project, is what saves science from the total doom its software practices would otherwise deliver.
However, reproducibility is a precondition to automation, and automation is a real nice thing to have.
Yes. 110% attributed to learning about unit-tests and gems/CPAN in grad school.
IMO there is a big fallacy in the "just get it to work" approach. Most serious scientific code, i.e. code supporting months to years of research, is used and modified a lot. It's also not really one-off; it's a core part of a dissertation or research program, and if it fails, you do too. I'd argue (and I found) that using unit tests, a deployment strategy, etc. ultimately allowed me to do more and better science, because in the long run I didn't spend as much time figuring out why my code didn't run when I tweaked stuff. This is really liberating stuff. I suspect this is all obvious to those who have gone down that path.
Frankly, every reasonably tricky problem benefits from unit tests for another reason as well. Don't know how to code it, but know the answer? Assert lots of stuff, not just one thing at a time red-green style. Then code, and see what happens. So powerful for scientific approaches.
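A sketch of that style on a toy problem (my own example, not the parent's code): the integrator is the thing being developed, and the asserts are answers known analytically before a line of the integrator exists.

    import numpy as np

    def integrate_sho(x0, v0, omega, t_end, dt=1e-4):
        """Leapfrog integration of x'' = -omega^2 x -- the code under development."""
        x = x0
        v = v0 + 0.5 * dt * (-omega**2 * x0)     # half-step kick to stagger velocity
        for _ in range(round(t_end / dt)):
            x += dt * v
            v += dt * (-omega**2 * x)
        return x

    # Assert lots of known answers up front, then make the code satisfy them.
    w = 2 * np.pi                                                      # period of exactly 1
    assert abs(integrate_sho(1.0, 0.0, w, t_end=1.0) - 1.0) < 1e-3     # back to start after one period
    assert abs(integrate_sho(1.0, 0.0, w, t_end=0.5) + 1.0) < 1e-3     # mirrored after half a period
    assert abs(integrate_sho(0.0, 0.0, w, t_end=3.7)) < 1e-12          # rest stays at rest
    print("all sanity checks pass")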
The longest-running code I wrote as a scientist was a sandwich ordering system. I worked for a computer graphics group at UCSF while taking a year off from grad school while my simulations ran on a supercomputer, and we had a weekly group meeting where everybody ordered sandwiches from a local deli.
It was 2000, so I wrote a cgi-bin in Python (2?) with a MySQL backend. The menu was stored in MySQL, as were the orders. I occasionally check back to see if it's still running, and it is: a few code changes to port to Python 3, a data update since they changed vendors, and a MySQL update or two as well.
An interesting concern is that there often is no single piece of code that has produced the results of a given paper. Often it is a mixture of different (and evolving) versions of different scripts and programs, with manual steps in between. Often one starts the calculation with one version of the code, identifies edge cases where it is slow or inaccurate, develops it further while the calculations are running, does the next step (or re-does a previous one) with the new version, possibly modifying intermediate results manually to fit the structure of the new code, and so on -- the process is interactive, and not trivially repeatable.
So the set of code one has at the end is not the code the results were obtained with: it is just the code with the latest edge case fixed. Is it able to reproduce the parts of the results that were obtained before it was written? One hopes so, but given that advanced research may take months of computer time and machines with high memory/disk/CPU/GPU/network speed requirements only available in a given lab -- it is not at all easy to verify.
>the process is interactive, and not trivially repeatable.
The kind of interaction you're describing should be frowned upon. It requires the audience to trust the manual data edits are no different than rerunning the analysis. But the researcher should just rerun the analysis.
Also, mixing old and new results is a common problem in manually updated papers. It can be avoided by using reproducible research tools like R Markdown.
If it can't be trivially repeated, then you should publish what you have with an explanation of how you got it. Saying that "the researcher should just rerun the analysis" is not taking into account the fact that this could be very expensive and that you can learn a lot from observations that come from messy systems. Science is about more than just perfect experiments.
No, you should publish this research and be clear with how it all worked out and someone will reproduce it in their own way.
Reproducibility isn't usually about having a button to press that magically gives you the researchers' results. It's also not always a set of perfect instructions. More often it is documentation of what happened and what was observed, as the researchers believe is important to the understanding of the research questions. Sometimes we don't know what's important to document, so we try to document as much as possible. This isn't always practical, and sometimes it is obviously unnecessary.
Back in the 80s/90s I was heavily into TeX/LaTeX—I was responsible for a major FTP archive that predated CTAN, wrote ports for some of the utilities to VM/CMS and VAX/VMS and taught classes in LaTeX for the TeX Users Group. I wrote most of a book on LaTeX based on those classes that a few years back I thought I'd resurrect. Even something as stable as LaTeX has evolved enough that just getting the book to recompile with a contemporary TeX distribution was a challenge. (On the other hand, I've also found that a lot of what I knew from 20+ years ago is still valid and I'm able to still be helpful on the TeX stack exchange site).
I'm an accelerator physicist and I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points. I'm here to turn out science, not production ready code.
As an example, you seem to be complaining that their Monte Carlo code has non-deterministic output when that is the entire point of Monte Carlo methods and doesn't change their result.
By the way, yes I tested my ten year old code and it does still work. What I'm saying is that scientific code doesn't need to handle every special case or be easily usable by non-experts. In fact the time spent making it that way is time that a scientist spends doing software engineering instead of science, which isn't very efficient.
Does scientific-grade code need to handle a large number of users running it at the same time? Probably not a genuine concern, since those users will run their own copies of the code on their own hardware, and it's not necessary or relevant for users to see the same networked results from the same instance of the program running on a central machine.
Does scientific-grade code need to publish telemetry? Eh, usually no. Set up alerting so that on-call engineers can be paged when (not if) it falls over? Nope.
Does scientific-grade code need to handle the authorization and authentication of users? Nope.
Does scientific-grade code need to be reproducible? Yes. Fundamentally yes. The reproducibility of results is core to the scientific method. Yes, that includes Monte Carlo code, when there is no such thing as truly random number generation on contemporary computers, only pseudorandom number generation, and what matters for cryptographic purposes is that the seed numbers for the pseudorandom generation are sufficiently hidden / unknown. For scientific purposes, the seed numbers should be published on purpose, so that a) the exact results you found, sufficiently random as they are for the purpose of your experiment, can still be independently verified by a peer reviewer, b) a peer reviewer can intentionally decide to pick a different seed value, which will lead to different results but should still lead to the same conclusion if your decision to reject / refuse to reject the null hypothesis was correct.
Scientists need to get over this fear about their code. They need to produce better code and need to actually start educating their students on how to write and produce code. For too long many in the physics community have trivialized programming and seen it as assumed knowledge.
Having open code will allow you to become better and you’ll produce better results.
Side note: 25 years ago I worked in accelerator science too.
Given that such software forms the very foundation of the results of such papers, why shouldn't it fall under scrutiny, even for "minor" points? If you are unable to produce good technical content, why are you qualified to declare what is or isn't minor? Isn't the whole point that scrutiny is best left to technical experts (and not subject experts)?
If code is what is substantiating a scientific claim, then code needs to stand up to scientific scrutiny. This is how science is done.
I came from physics, but systems and computer engineering was always an interest of mine, even before physics. I always thought it was kooky that CS people can release papers without code; fine if the paper contains all the proofs, but otherwise it shouldn't even be looked at. PoS (proof-of-science) or GTFO.
We are at the point in human and scientific civilization where knowledge needs to prove itself correct. Papers should be self-contained execution environments that generate PDFs and resulting datasets. The code doesn't need to be pretty, or robust, but it needs to be sealed inside a container so that it can be re-run, re-validated, and confirmed by someone else X years from now. And it isn't about trusting or not trusting the researcher; we need to fundamentally trust the results.
Specifically, to that point, I want to cite the saying:
"The dogs bark, but the caravan passes."
(There is a more colorful German variant which is, translated: "What does it bother the mighty old oak tree if a dog takes a piss...").
Of course, if you publish your code, you expose it to critics. Some of that criticism will be unqualified, and as we have seen in the case of, e.g., climate scientists, some might even be nasty. But who cares? What matters is open discussion, which is a core value of science.
It reminds me of Kerckhoffs's principle in cryptography, which states: A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.
I do not think "non-experts" should be able to use your code, but I do think an expert who was not involved in writing it should be.
Code belongs with the paper. Otherwise we can just continue to make up numbers and pretend we found something significant.
In what way do idiots making idiotic comments about your correct code invalidate your scientific production? You can still turn out science and let people read and comment freely on it.
> As an example, you seem to be complaining that their Monte Carlo code has non-deterministic output when that is the entire point of Monte Carlo methods and doesn't change their result.
I guess you would not need to engage personally with the idiots at "acceleratorskeptics.com", but most of their critique would likely be shut down by a simple sentence such as this one. Since most of your readers would not be idiots, they could scrutinize your code and even provide that reply on your behalf. This is called the scientific method.
I agree that you produce science, not merely code. Yet, the code is part of the science and you are not really publishing anything if you hide that part. Criticizing scientific code because it is bad software engineering is like criticizing it because it uses bad typography. You should not feel attacked by that.
However, a methods section is always under-specified. Code provides the unique opportunity to actually see the full methods on display and properly review their work. It should be mandated by all reputable journals and worked into the peer review process.
But that's exactly the problem.
Are you familiar with that bug in early Civ games where an overflow made Gandhi nuke the crap out of everyone? What if your code has a similar issue?
What if you have a random value right smack in the middle of your calculations and you just happened to get lucky when you ran your code?
I'm not that familiar with Monte Carlo; my understanding is that it is just a way to sample the data. And I won't be testing your data sampling, but I will expect that, given the same data to your calculation part (e.g., after the sampling happens), I get exactly the same results every time I run the code and on any computer. And if there are differences, I expect you to be able to explain why they don't matter, which will show you were aware of the differences in the first place and were not just lucky.
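To make that expectation concrete, here is a minimal sketch of the kind of determinism check I mean, with a hypothetical run_model(input_data, seed) standing in for the calculation stage (not any particular paper's code):

```python
import numpy as np

def test_same_input_same_output(run_model, input_data):
    # run_model is hypothetical: the deterministic "calculation" stage under test.
    first = run_model(input_data, seed=12345)
    second = run_model(input_data, seed=12345)
    # On one machine this should be bit-identical; across machines or compilers,
    # compare within a documented tolerance and explain where it comes from.
    assert np.array_equal(first, second)
```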
And then there is the matter of magic values that plaster research code.
Researchers should understand that the rules for "software engineering grade code" are not there just because we want to complicate things, but because we want to make sure the code is correct and does what we expect it to do.
/edit: The real problem is not getting good results with faulty code; it is ignoring good solutions because of faulty code.
If the proof on which the paper is based is in the code that produced the evidence, you absolutely need to let an average user run it without specific knowledge, to abide by the reproducibility principle. Asking a reviewer to fiddle about like an IT professional to get something working is bound to promote lazy reviewing, and will result in either dismissal of the result or approval without real review.
And by the way, it could be argued that producing a paper isn't really science either; but if you are working with MSFT Office, you know there is a fair amount of non-science work hours that has been put into that as well.
Not so fast. Monte Carlo code turns arbitrary RNG seeds into outputs. That process can be, and arguably should be, deterministic.
To do your study, you feed your Monte Carlo code 'random enough' seeds. Coming up with the seeds does not need to be deterministic. But once the seeds are fixed, the rest can be deterministic. Your paper should probably also publish the seeds used, so that people can reproduce everything. (And so they can check whether your seeds are carefully chosen, or really produce typical outcomes.)
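A toy sketch of that workflow (my own example, not from any paper): the seed is an explicit, publishable input, so the run is repeatable, and a different seed should change the digits but not the conclusion.

```python
import numpy as np

def estimate_pi(n_samples, seed):
    # The seed is part of the method: report it alongside the result.
    rng = np.random.default_rng(seed)
    x, y = rng.random(n_samples), rng.random(n_samples)
    return 4.0 * np.mean(x * x + y * y <= 1.0)

print(estimate_pi(1_000_000, seed=20200517))  # rerunning with this seed reproduces the exact number
print(estimate_pi(1_000_000, seed=31337))     # different seed: different digits, same conclusion
```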
Sure, and that rationale works OK when your code operates in a limited, specialized domain.
But if you're modeling climate change or infectious diseases, and you expect your work to affect millions of human lives and trillions of dollars in spending, then you owe us a full accounting of it.
Two nitpicks: a) it shouldn't change the conclusions, but MC calculations will get different results depending on the seed; and b) it is considered good practice in reproducible science to fix the seed so that subsequent runs give exactly the same results.
Ultimately, I think there is a balance: really poor code can lead to incorrect conclusions... but you don't need production ready code for scientific exploration.
Anybody who has conducted experimental research will say they spent 80% of the time using a hammer or a spanner. Repairing faulty lasers or power supplies. This process of reliable and repeatable experimentation is the basis of science itself.
Computational experiments must be held to the same standards as physical experiments. They must be reproducible and they should be publicly available (if publicly funded).
Sounds like I should just become a scientist then.
Do you guys write unit tests or is that beneath you too?
> I'm an accelerator physicist and I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points.
The person who wrote the linked blog post claims to have been a software engineer at Google. Unfortunately, that claim is not falsifiable, as the person decided to remain anonymous.
> As an example, you seem to be complaining that their Monte Carlo code has non-deterministic output when that is the entire point of Monte Carlo methods and doesn't change their result.
The claim is that even with the same seed for the random generator, the program produces different results, and this is explained by the allegation that it runs non-deterministically (in the sense of undefined behavior) across multiple threads. It also claims that the program produces significantly different results depending on which output file format is chosen.
If this is true, the code would have race conditions, and since a data race is a form of undefined behavior, this would make any result of the program questionable, as its behavior would not be well defined.
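For what it's worth, even without undefined behavior, an unordered parallel reduction alone can make runs with the same seed differ, because floating-point addition is not associative. A tiny, generic Python illustration of that narrower point (nothing to do with the covid-sim code itself):

```python
import numpy as np

# Floating-point addition is not associative:
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))            # False

# So identical draws (same seed) summed in different orders, as threads finishing
# in varying order would produce, can give totals that differ in the last bits.
rng = np.random.default_rng(1234)
draws = rng.standard_normal(1_000_000)
print(np.sum(draws) - np.sum(draws[::-1]))   # typically tiny, but nonzero
```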
Personally, I am very doubtful whether this is true; it would be incredibly sloppy of the Imperial College scientists. Some more careful analysis by a recognized programmer might be warranted.
However, it underlines the importance of the main topic: scientific code should be open to analysis.
> What I'm saying is that scientific code doesn't need to handle every special case or be easily usable by non-experts.
Fully agree with this. But it should try to document its limitations.
GPT-3 FTW!
> you understand that the links in your post are the exact worry people have when it comes to releasing code: people claiming that their non-software engineering grade code invalidates the results of their study.
It's profoundly unscientific to suggest that researchers should be given the choice to withhold details of their experiments that they fear will not withstand peer review. That's much of the point of scientific publication.
Researchers who are too ashamed of their code to submit it for publication should be denied the opportunity to publish. If that's the state of their code, their results aren't publishable. Unpublishable garbage in, unpublishable garbage out. Simple enough. Journals just shouldn't permit that kind of sloppiness. Neither should scientists be permitted to take steps that artificially make it difficult to reproduce (in some weak sense) an experiment. (Independently re-running code whose correctness is suspect obviously isn't as good as comparing against a fully independent reimplementation, but it still counts for something.)
If a mathematician tried to publish the conclusion of a proof but refused to show the derivation, they'd be laughed out of the room. Why should we hold software-based experiments to such a pitifully low standard by comparison?
It's not as if this is a minor problem. Software bugs really can result in incorrect figures being published. In the case of C and C++ code in particular, a seemingly minor issue can result in undefined behaviour, meaning the output of the program is entirely unconstrained, with no assurance that the output will resemble what the programmer expects. This isn't just theoretical. Bizarre behaviour really can happen on modern systems, when undefined behaviour is present.
A computer scientist once told me a story of some students he was supervising. The students had built some kind of physics simulation engine. They seemed pretty confident in its correctness, but in truth it hadn't been given any kind of proper testing, it merely looked about right to them. The supervisor had a suggestion: Rotate the simulated world by 19 degrees about the Y axis, run the simulation again, and compare the results. They did so. Their program showed totally different results. Oh dear.
Needless to say, not all scientific code can so easily be shown to be incorrect. All the more reason to subject it to peer review.
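For illustration, a hedged sketch of that kind of symmetry check, assuming a hypothetical simulate(positions, velocities, t_end) that takes and returns (N, 3) arrays:

```python
import numpy as np

def rot_y(deg):
    t = np.radians(deg)
    return np.array([[ np.cos(t), 0.0, np.sin(t)],
                     [ 0.0,       1.0, 0.0      ],
                     [-np.sin(t), 0.0, np.cos(t)]])

def test_rotation_invariance(simulate, pos, vel, t_end, deg=19.0):
    R = rot_y(deg)
    base = simulate(pos, vel, t_end)
    rotated = simulate(pos @ R.T, vel @ R.T, t_end)
    # Rotating the inputs should rotate the outputs by the same matrix,
    # up to floating-point tolerance; a large deviation points to a bug.
    np.testing.assert_allclose(rotated, base @ R.T, rtol=1e-6, atol=1e-9)
```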
> I'm an accelerator physicist and I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points.
Why would you care? Science is about advancing the frontier of knowledge, not about avoiding invalid criticism from online communities of unqualified fools.
I sincerely hope vaccine researchers don't make publication decisions based on this sort of fear.
How exactly is this a bad thing?
> I'm an accelerator physicist and I wouldn't want my code to end up on acceleratorskeptics.com with people that don't understand the material making low effort critiques of minor technical points. I'm here to turn out science, not production ready code.
But it should be noted that what you didn't say is that you're here to turn out accurate science.
This is the software equivalent of bad statistics. Imagine if someone took a random sampling of people at a Trump rally and then claimed that "98% of Americans are voting for Trump". And now imagine someone else points out that the sample is biased and therefore the conclusion is flawed, and the response was "Hey, I'm just here to do statistics".
---
Do you see the problem now? The poster above you pointed out that the conclusions of the software can't be trusted, not that the coding style was ugly. Most developers would be more than willing to say "the code is ugly, but it's accurate". What we don't want is to hear "the conclusions can't be trusted and 100 people have spent 10+ years working from those unreliable conclusions".
If you want to help me (and others who agree with me), please sign this petition: https://publiccode.eu. It demands that all publicly funded code must be public.
P.S. Yes, my 10-year-old code is working.
Well ... that part isn't nonsense, though I agree it shouldn't be a dealbreaker. And it means we should work towards making such support demands minimal or non-existent via easy containerization.
I note with frustration that even the Docker people, whose entire job is containerization, can get this part wrong. I remember when we containerized our startup's app c. 2015, to the point that you could run it locally just by installing Docker and running `docker-compose up`, and it still stopped working within a few weeks (which we discovered when onboarding new employees) and required a knowledgeable person to debug and rewrite.
(They changed the spec for docker-compose so that the new version you'd get when downloading Docker would interpret the yaml to mean something else.)
This hits close to home. Back in college, I developed software, for a lab, for a project-based class. I put the code up on GitHub under the GPL license (some code I used was licensed under GPL as well), and when the people from the lab found out, they lost their minds. A while later, they submitted a paper and the journal ended up demanding the code they used for analysis. Their solution? They copied and pasted pieces of my project they used for that paper and submitted it as their own work. Of course, they also completely ignored the license.
You are blaming scientists, but speaking from my personal experience as a computational scientist, this exists because there are few structures in place that incentivize strong programming practices:
* Funding agencies do not provide support for verification and validation of scientific software (typically)
* Few journals assess code reproducibility, and few require public code (few even require public data)
* There are few funded studies to reproduce major existing studies
Until these structural challenges are addressed, scientists will not have sufficient incentive to change their behavior.
> Scientific code needs to have tests, a minimal amount of test coverage, and code/data used really need to be published and run by volunteers/editors in the same way papers are reviewed, even for non-computer science journals.
I completely agree.
In my opinion, enforcing standards without addressing this root cause is not going to fix the problem. Worse, students and early-career researchers will bear the brunt of increased workload and code-compliance requirements from journals. Big, well-funded labs that can afford a research engineer position will have an edge over small labs that cannot.
After a paper has been accepted, authors can submit a repository containing a script which automatically replicates results shown in the paper. After a reviewer confirms that the results were indeed replicable, the paper gets a small badge next to its title.
While there could certainly be improvements, I think it's a step in the right direction.
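For concreteness, a hedged sketch of the kind of one-command entry point such a process might expect. Every file name and the analysis module below are placeholders, not anything from a real submission:

```python
"""reproduce.py: rerun the full analysis and regenerate every figure and table."""
from pathlib import Path

import analysis  # placeholder for the package holding the paper's actual computations

def main():
    out = Path("results")
    out.mkdir(exist_ok=True)
    data = analysis.load_raw_data(Path("data/raw"))   # archived inputs shipped with the repo
    results = analysis.run_all(data, seed=20200517)   # fixed, published seed
    analysis.write_tables(results, out / "tables")
    analysis.plot_figures(results, out / "figures")   # regenerates the figures in the paper

if __name__ == "__main__":
    main()
```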
All is well and good then, because journals absolutely don't care about science. They care about money and prestige. From personal experience, I'd say this intersects with the interests of most high-ranking academics. So the only unhappy people are idealistic youngsters and science "users".
Let's get back to non-profit journals.
In terms of tangible results, Princeton at least has created a dedicated team of software engineers as part of their research computing unit (https://researchcomputing.princeton.edu/software-engineering).
Realistically, though, even if the necessity of research software engineering were acknowledged at the institutional level at the bulk of universities, there would still be the problem of universities paying way below market rate for software engineering talent...
To some degree, universities alone cannot effect the change needed to establish a professional class of software engineers that collaborate with researchers. Funding agencies such as the NIH and NSF are also responsible, and need to lead in this regard.
https://mobile.twitter.com/id_aa_carmack/status/125819213475...
Why is there so much C++ code?
Skeptics could have a field day tearing apart the estimates for the large number of input parameters to models like that, but they choose not to? I don't get it.
Many years ago, a paper on the PageRank algorithm was written, and the code behind that paper was monetized to unprecedented levels. Should computer science journals also require working proof of concept code, even if that discourages companies from sharing their results; even if it prevents students from monetizing the fruits of their research?
Scientists, not programmers, should be the ones spear-heading the development of standards and rules of thumb.
Still, there are real problematic practices that an emphasis on sharing scientific code would discourage. A classic one is the use of a single script that you edit each time you want to re-parameterize a model. Unless you copy the script into the output, you lose the informational channel between your code and its output. This can have real consequences. Several years ago I started a project with a collaborator to follow up on their unpublished results from a year prior. Our first task was to take that data and reproduce the results they had obtained before, because the person no longer had access to the exact copy of the script that they ran. We eventually determined that the original result was due to a software error, which we did identify. My colleague took it well, but the motivation to continue the project was much diminished.
Why isn't there a common language that all other languages compile to, and that will be supported on all possible platforms, for the rest of time?
(Perhaps WASM could be such a language, but the point is that this would be merely coincidental and not a planned effort to preserve software.)
And why aren't package managers structured such that packages will live forever (e.g. in IPFS) regardless of whether the package management system is online? Why is Github still a single point of failure in many cases?
After they embark on solving problems, it becomes an eye-opening experience, and one that quickly turns into keeping things running.
For those who have a STEM discipline in addition to a software development background of more than five years: would you agree with the above?
I would have thought the scientists among us would approach someone with software development expertise (something abstract and requiring a different set of muscles).
One positive that is emerging is the variety of low/no-code tooling that can replace a lot of this hornet's-nest coding.
In any case you'd need to teach them the problem domain, and it's considered cheaper (and simpler from organizational perspective) to get some phd students or postdocs from your domain to spend half a year getting up to speed on coding (and they likely had a few courses in programming and statistics anyway) than to hire an experienced software developer and have them learn the basics of your domain (which may well take a third or half of the appropriate undergraduate bachelor's program).
Is there a pool of skilled software architects willing to provide consultations at well-below market wages? Or a Q&A forum full of people interested in giving this kind of advice? (StackOverflow isn't useful for this; the allowed question scope is too narrow.) I guess one incentive to publish one's code is to get it criticized on places like Hacker News. The best way to get the right answer on the internet is to post the wrong answer, after all.
However, when working as equals, scientists and engineers can create truly transformative projects. Algorithms account for 10% of the solution. The code, infrastructure, and system design account for 20% of the final result. The remaining 70% of the value comes directly from its impact. A project that nobody uses is a failure. Something that perfectly solves a problem that nobody cares about is useless.
>
> [0] https://lockdownsceptics.org/code-review-of-fergusons-model/
This does not look like a good example at all, as it appears that the blog author there just tries to discredit the program because he does not like the results. He also writes that all epidemiological research should be defunded.
If someone is trying to reproduce someone else's results, the data and methods are the only ingredients they need. If you add code into this mix, all you do is introduce new sources of bias.
(Ideally the results would be blinded too.)
Show me the grant announcements that identify reproducible long term code as a key deliverable, and I’ll show you 19 out of 20 scientists who start worrying about it.
Watching the density functional theory based molecular dynamics zip along at ~2 seconds per time step on my 2 year old laptop, versus the roughly 6k seconds per time step on an old Sun machine back in 1991. I remember the same code getting down to 60 seconds per time step on my desktop R8k machine in the late 90s.
What's been really awesome about that is the fact that I wrote some binary data files on big-endian machines in the early 90s and re-read them on the laptop (little-endian) by adding a single compiler switch.
Perl code that worked with big XML file input in the mid 2000s continues to work, though I've largely abandoned using XML for data interchange.
C code I wrote in the mid 90s compiled, albeit with errors that needed to be corrected. C++ code was less forgiving.
Over the past 4 months, I had to forward port a code from Boost 1.41 to Boost 1.65. Enough changes over 9 years (code was from 2011) that it presented a problem. So I had to follow the changes in the API and fix it.
I am quite thankful I've avoided the various fads in platforms and languages over the years. Keep inputs in simple textual format that can be trivially parsed.
I want to second the idea of just dumping your floating point data as binary. It's basically the CSV of HPC data. It doesn't require any libraries, which could break or change, and even if the endianness changes you can still read it decades later. I've been writing a computational fluid dynamics code recently and decided to only write binary output for those reasons. I'm not convinced of the long-term stability of other formats. I've seen colleagues struggle to read data in proprietary formats even a few years after creating it. Binary is just simple and avoids all of that. Anybody can read it if needed.
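A small sketch of that approach (file names invented): the one thing raw binary cannot tell you is its own dtype, byte order, and shape, so record those next to the data.

```python
import numpy as np

field = np.random.default_rng(0).random((64, 64, 3))   # stand-in for a CFD snapshot
field.astype("<f8").tofile("snapshot_0001.bin")        # force little-endian float64, C order

with open("snapshot_0001.meta", "w") as f:
    f.write("dtype=<f8 order=C shape=64,64,3\n")       # the part the bytes alone can't tell you

# Years later (or on a big-endian machine), the explicit dtype makes it unambiguous:
back = np.fromfile("snapshot_0001.bin", dtype="<f8").reshape(64, 64, 3)
assert np.array_equal(back, field)
```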
My C++ Qt GUI application for NMR spectrum analysis (https://github.com/rochus-keller/CARA) has been running for 20 years now, with continuing high download and citation rates.
So obviously C++/Qt and Fortran 77 are very well suited to standing the test of time.
Similar to other comments I don't mean to fault scientists for that - their job is not coding and some of the dependencies come from earlier papers or proprietary cluster setups and are therefore hard to avoid - but the situation is not good.
To me, that's like a theoretical physicist saying "My job is not to do mathematics" when asked for a derivation of a formula he put in the paper.
Or an experimental physicist saying "My job is not mechanical engineering" when asked for details of their lab equipment (almost all of which is typically custom built for the experiment).
IMO the incentive problem in science (basically number of papers and new results is what counts) also plays into this, as investing tons of time in your code gives you hardly any reward.
Theoretical Physicists (literal conversation I had):
>Yeah, this looked like it simplifies to 1-ish and Smart John said it's probably right.
Experimental physicists (another literal conversation):
>Yeah, we built it with duct tape, and there's hot glue holding the important bits that kept falling off. Don't put anything metal in that; we use it as a tea heater, but there's 1000A running through it, so it shoots spoons out when we turn the main machine on.
But not with the current mess of software frameworks. If I am to produce reproducible scientific code, I need an idiot-proof method of doing it. Yes, I can put in the 50-100 hours to learn how to do it [1], but guess what: in about 3-5 years a lot of that knowledge will be outdated. People keep comparing it with math, but the math proofs I produce will still be readable and understandable a century from now.
Regularly used scientific computing frameworks like MATLAB, R, the Python ecosystem, and Mathematica need a dumb, guided method of producing releasable and reproducible code. I want to go through a bunch of "next" buttons that help me fix the problems you indicate, and finally release a final version that has all the information necessary for someone else to reproduce the results.
[1] I have. I would put myself in the 90th percentile of physicists familiar with best practices for coding. I speak for the 50th percentile.
(1) Use a package manager that stores hash sums in a lock file.
(2) Install your dependencies from the lock file as the spec.
(3) Do not trust version numbers; trust hash sums. Do not believe in "But I set the version number!"
(4) Do not rely on downloads. Again, trust hash sums, not URLs.
(5) Hash sums!!!
(6) Wherever there is randomness, as with random number generators, use a seed. If the interface does not let you specify the seed, throw the thing away and use another generator. Be careful when concurrency is involved; it might destroy reproducibility. For example, this was the case with TensorFlow; not sure whether it still is.
(7) Use a version control system.
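A minimal sketch of the "trust hash sums, not URLs" point in plain Python, with a made-up file name and a truncated placeholder digest:

```python
import hashlib

def sha256(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Pin every downloaded dependency or input file to a recorded digest and refuse to run otherwise.
EXPECTED = {
    "inputs/population_grid.csv": "9f2c0f7f0e3b...",  # placeholder, not a real checksum
}

for path, digest in EXPECTED.items():
    assert sha256(path) == digest, f"{path} does not match its recorded hash"
```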
Of course, I have no idea about the paper you're talking about and just want to say that reproducibility isn't dependent on releasing code. There could even be a case where it's better if someone reproduces a result without having been biased by someone else's code.
(Of course, not all scientific code is discardable; large quantities of reusable code are reused every day. We have many frameworks, and the code quality of those is completely different.)
But it often is. For most non-CS papers (mostly biosciences) I've read, there are specific authors whose contribution to a large degree was mainly "coding".
The idea is that you have learned something about how the universe works. Which means that the details of your experiment should not change what you find... assuming it's a true finding.
Concerns about software quality in science are primarily about avoiding experimental error at the time of publication, not the durability of the results. If you did the experiment correctly, it doesn't matter if your code can run 10 years later. Someone else can run their own experiment, write their own code, and find the same thing you did.
And if you did the experiment incorrectly, it also doesn't matter if you can run your code 10 years later; running wrong code a decade later does not tell you what the right answer is. Again--conducting new research to explore the same phenomenon would be better.
When it comes to hardware, we get this. Could you pick up a PCR machine that's been sitting in a basement for 10 years and get it running to confirm a finding from a decade ago? The real question is, why would you bother? There are plenty of new PCR machines available today, that work even better.
And it's the same for custom hardware. We use all sorts of different telescopes to look at Jupiter. Unless the telescope is broken, it looks the same in all of them. Software is also a tool for scientific observation and experimentation. Like a telescope, the thing that really matters is whether it gives a clear view of nature at the time we look through it.
One of the unsung and wonderful properties of reproducible workflows is the fact that it can allow science to be salvaged from an analysis that contains an error. If I had made an error in my thesis data analysis (and I did, pre-graduation), the error can be corrected and the analysis re-run. This works even if the authors are dead (which I am not :) ).
Reproducibility abstracts the analysis from data in a rigorous (and hopefully in the future, sustainable) fashion.
That is something no one outside of high school cares about. The idea that you can show your work in general is ridiculous. Do I need to write a few hundred pages of set theory to start using addition in a physics paper? No. The work you need to show is the work a specialist in the field would find new, which is completely different from what a layman would find new.
Every large lab, the ones that can actually reproduce results, has decades of specialist code that does not interface with anything outside the lab. Providing the source code is then about as useful as handing over a binary printout of an executable for an OS you've never seen before.
It can tell, however, exactly where the error lies (if the error is in software at all). Like a math teacher that can circle where the student made a mistake in an exam.
However, reproducibility is a precondition to automation, and automation is a real nice thing to have.
IMO there is a big fallacy in the "just get it to work" approach. Most serious scientific code, i.e. code supporting months to years of research, is used and modified a lot. It's also not really one-off; it's a core part of a dissertation or research program, and if it fails, you do. I'd argue (and I found) that using unit tests, a deployment strategy, etc. ultimately allowed me to do more, and better, science, because in the long run I didn't spend as much time figuring out why my code didn't run when I tweaked stuff. This is really liberating stuff. I suspect this is all obvious to those who have gone down that path.
Frankly, every reasonably tricky problem benefits from unit-tests as well for another reason. Don't know how to code it, but know the answer? Assert lots of stuff, not just one at a time red-green style. Then code, and see what happens. So powerful for scientific approaches.
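A hedged sketch of that style, with a hypothetical estimate_msd(n_steps, n_particles, seed) standing in for whatever you are actually computing (here, a toy mean-squared-displacement curve for a diffusion run with D = 1 and unit time step):

```python
import numpy as np

def test_known_properties(estimate_msd):
    # Pin down everything you already know about the answer before writing the clever version.
    msd = estimate_msd(n_steps=1000, n_particles=10_000, seed=7)
    assert msd.shape == (1000,)            # right size
    assert np.all(np.isfinite(msd))        # no NaNs or infs sneaking in
    assert np.all(msd >= 0.0)              # squared displacements cannot be negative
    assert msd[0] < msd[-1]                # spread grows with time
    assert 1000.0 < msd[-1] < 4000.0       # loose band around the analytic 2*D*t = 2000
```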
https://smw.ch/article/doi/smw.2020.20336
It was 2000, so I wrote a cgi-bin in Python (2?) with a MySQL backend. The menu was stored in MySQL, as were the orders. I occasionally check back to see if it's still running, and it is: a few code changes to port to Python 3, a data update since they changed vendors, and a MySQL update or two as well.
It's not much but at least it was honest work.
Often it is a mixture of different (and evolving) versions of different scripts and programs, with manual steps in between. Often one starts the calculation with one version of the code, identifies edge cases where it is slow or inaccurate, develops it further while the calculations are running, does the next step (or re-does a previous one) with the new version, possibly modifying intermediate results manually to fit the structure of the new code, and so on -- the process is interactive, and not trivially repeatable.
So the set of code one has at the end is not the code the results were obtained with: it is just the code with the latest edge case fixed. Is it able to reproduce the parts of the results that were obtained before it was written? One hopes so, but given that advanced research may take months of computer time and machines with high memory/disk/CPU/GPU/network speed requirements only available in a given lab -- it is not at all easy to verify.
The kind of interaction you're describing should be frowned upon. It requires the audience to trust that the manual data edits are no different from rerunning the analysis. But the researcher should just rerun the analysis.
Also, mixing old and new results is a common problem in manually updated papers. It can be avoided by using reproducible research tools like R Markdown.
Reproducibility isn't usually about having a button to press that magically gives you the researchers' results. It's also not always a set of perfect instructions. More often it is documentation of what happened and what was observed, insofar as the researcher believes it is important to the understanding of the research questions. Sometimes we don't know what's important to document, so we try to document as much as possible. This isn't always practical, and sometimes it is obviously unnecessary.