> Available data is 0 for most things.
I would argue that we need an effective alternative to benchmarks entirely, given how hard they are to obtain in scientific disciplines. Classical statistics has gone very far by getting a lot out of limited datasets, and train-test splits are unnecessary there: uncertainty can be quantified analytically or with resampling schemes like leave-one-out cross-validation, so scarce data never has to be sacrificed to a held-out set.
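
To make that concrete, here's a minimal sketch of the kind of data-efficient evaluation I mean: leave-one-out cross-validation on a tiny dataset, with no held-out test split. The synthetic data and ridge model are my own illustrative choices, not anything specific to the thread:

```python
# Sketch: LOO-CV gets an error estimate out of just 20 samples
# without carving off a test set. Data and model are assumptions
# made up purely for illustration.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # tiny n, as in many scientific settings
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)

# Each sample plays the test point exactly once, so all 20
# observations contribute to both fitting and evaluation.
scores = cross_val_score(Ridge(alpha=1.0), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print(f"LOO-CV estimated MSE: {-scores.mean():.4f}")
```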
Even if this is a super high bar, I think more papers in ML for science should strive to be truly interdisciplinary and include an actual scientific advance, not just "we modify X and get some improvement on a benchmark dataset that may or may not be representative of the problems scientists actually encounter." The ultimate goal of "ML for science" is the science, not really improving ML methods, imo.
smaller functions are also usually easier to test :shrug:
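
Tangentially, a toy example of why: a small pure function can be exercised with a couple of bare asserts, no scaffolding needed. Names here are hypothetical, just to illustrate:

```python
# Toy sketch: a small pure function is trivially testable in isolation.
# The function and test are hypothetical examples, not from the thread.
def rescale(x: float, lo: float, hi: float) -> float:
    """Map x from the interval [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)

def test_rescale():
    assert rescale(5.0, 0.0, 10.0) == 0.5
    assert rescale(0.0, 0.0, 10.0) == 0.0
    assert rescale(10.0, 0.0, 10.0) == 1.0

test_rescale()
```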