Setting up a local LLM isn't that hard, although I'd probably air gap anything truly sensitive. I like ollama, but it wouldn't surprise me if it's phoning home.
What a strange thing to assert, especially as a general overarching truth.
The best reports I have ever seen have matched code and output in the same file. There's never a question of what code generated a plot or a table with a notebook.
With .py files and separate outputs there's far more change for unreproducibke science, it's far messier, and for someone who doesn't appear to respect the organizational capabilities of academic labs, you are condemning them to far more poorly organized outputs.
> Having multiple copies of code
That doesn't have anything to do with notebooks. It's as silly as saying that a Python package is a poor idea because you say somebody repeat code across multiple places.
> non-version controlled files
Notebooks are no less version controllable than .py files.
> outputs with timestamps and run information
Jupyter notebooks are perfect for this, far superior to a directory of cryptically named outputs that need to be strung together in some order
> documentation dispersed with questionable organization
Using separate Python files rather than a notebook means that documentation can never be where it needs to be: next to the output. This is one of the ways that Python files are strictly inferior for generating results.
There are roughly two modes for notebooks: exploration with a REPL, and well-documented reports. The best scientific reports I have ever seen are notebooks (or R Markdown output) that are the full report text plus code plus figures.
This is not a great way to make your argument, though you are not the not only one here making a personal judgement without even knowing about my background. These are all issues I have seen first hard. With most academic labs being funding limited, the "organizational capabilities of academic labs" seems irrelevant to me. In our field, no one is getting grants to manage code of any kind .py or .ipynb and I suspect its the same at most university labs. It's effort wasted that ultimately does take time away from the actual research that's fundable and publishable. As someone who has been responsible for wrangling people's notebooks in the past, it's enough of a problem that I would encourage to remove all .ipynb.
> That doesn't have anything to do with notebooks. It's as silly as saying that a Python package is a poor idea because you say somebody repeat code across multiple places.
Human factors make jupyter notebooks lead to the problems I have listed. The issues are most apparent with large groups and over long periods of time. Python and other programming languages already solved most of these problems with git. There isn't a tool that is as elegant and scales from individuals to massive organizations.
> There are roughly two modes for notebooks: exploration with a REPL, and well-documented reports. The best scientific reports I have ever seen are notebooks (or R Markdown output) that are the full report text plus code plus figures.
The REPL functionality is handled by .py cell execution, as I’ve mentioned in other comments. It baffles me how the minimal effort saved by not using separate tools -- one for code, one for documentation -- justifies the issues it introduces.
But since most uses of Jupyter notebooks I've seen don't version control them much at all, it's not as useful in practice often.
Also, while many practices out there is questionable, in alternative scenarios where ipynb doesn’t exist, they might have been using something like matlab for example. Eg, in my field (physics), often time there are experimentalists doing some coding. Ipynb can be very enabling for them.
I think a piece of research should be broken down and worked by multiple people to improve the state of the project. Some scientists might be passing you the initial prototype in the form of a notebook, and some others should be refactoring to something more suitable for deployment and archival purpose. Properly funding these roles is important, and is lacking but improving (eg hiring RSE.)
In my field, the most prominent way when ipynb is shared a lot is for training. It’s a great application as that becomes literate programming. In this sense notebook is highly underused as literate programming still hasn’t got mainstream.
I think the notebooks are a fine learning tool to introduce people to programming initially, but I'm afraid it doesn't allow for growth beyond a certain level. You have a good point about funding for those software roles. Perhaps this may not be as big of a concern if there were more software talent in these labs to handle the issues that arise.
It's great for exploring code and data too, especially situations where I'm really trying to feel my way towards a solution. I get to merrily intermingle rich text narrative and code so I explain how I got to where I got to and can walk people through it (I did that with some experimenting with an SMT solver several months ago, meant that people that had no experience with an SMT solver could understand the model I built).
I'd never use it to share code though. If we get to that stage, it's time to export from jupyter (which it natively supports), and then tidy up the code and productionise it. There's no way jupyter should be the deployed thing.
If the code is the end product, sure, use a python package.
But does your .py with `# %%` in it also store the outputs? If not, why even bring this up? A .py output without the plots tied to the code doesn't meet the basic use case.
If the end product is the plot, I want to see how that plot was generated. And a Jupyter notebook is a much much better artifact than a Python package, unless that Python package hard codes the inputs and execution path like a notebook would.
Over the past 20 years of my career I have run into this divergence of use cases a lot. Software engineers seem to not understand the end goals, how it should be performed, and the learnings of the practitioners that have been generating results for a long time. It's hard to protect data scientists from these inflexible software engineers that see "aha that's code, I know this!" without bothering to understand the actual use case at hand.
There isn't a single install tool that "just works" for this at the moment. If editors came with more robust support for it by default, I think the notebook format wouldn't be needed at that point and people could use regular python and interactive cell based python more interchangeably. I've seen important code get buried under collections of jupyter notebooks across different users so I have a good reason for this. Notebooks simply dont scale beyond a certain complexity.