jhbadger · 2 years ago
I think people may be misusing notebooks. They aren't there to develop software but serve as virtual versions of scientific notebooks (hence the name). They are there to conduct experiments (changing parameters and the like in your code) and to record and plot the results. You don't have to develop the software itself in the notebook.
subjectsigma · 2 years ago
I always hated notebooks personally as I found them clunky and inefficient, but never wanted to yuck someone else’s yum.

Then I started mentoring a junior staff member who worked on another project. The lead of that project was a physicist who wrote primarily in Jupyter notebooks. Like, thousands and thousands of lines of code. This junior staff member spent like 90% of her time confused and copying-and-pasting between notebooks. She had no idea what a virtual environment was and her go-to solution for solving import errors was to nuke and re-download the repo and re-run the setup script, or create a new notebook from scratch containing all the prerequisite user functions.

It was one of the top 10 most horrifying things I’ve ever seen. I advised her immediately to stop using notebooks for development and sent her a few Python tutorials. Luckily, though, she just left the project and got on a better one instead.

ahmadmijot · 2 years ago
This is why spaghetti code is so prevalent in the scientific community. We are not really trained (especially for non-engineering research projects) to become programmers, and we do programming just as a way to do research.
htrp · 2 years ago
Academia is also usually a single person trying to get stuff working so they can publish a paper to get tenure/a job.

Good SWE practices are 100% not taught in school.

madsbuch · 2 years ago
I still rely heavily on principles acquired from courses on Software Architecture and PL studies as part of my CS degree - and I can definitely see a difference in how people with the same tenure but no schooling organise their code.
alan-hn · 2 years ago
I can change parameters in a script. What's the advantage?
abdullahkhalids · 2 years ago
Remember back in university, we used to do course projects (for science/engineering courses) and at the end we would have to write a report. The report would typically include, among other stuff, two things.

(1) The calculational methods we used - could either be a set of mathematical equations or a description of the algorithms. (2) The results of evaluating these equations/algorithms for different parameter values. Usually some graphs, and some discussion of their meaning.

A Jupyter notebook is designed to replicate that process but make it easier, because the figures are produced by code right there. Personally, all my notebooks include a discussion in the markdown cells of what I am doing, and why. That includes discussion of the code. And directly from the code come some graphs or numbers, with a discussion attached.

With the script workflow, I would have two different files. One with the code, and one with the results pasted in. It's annoying when my primary goal is to develop and test the algorithms under discussion. Best thing is, if done right, my work is completely replicable. Just run the notebook again.

Just because some people misuse the tool doesn't mean the tool isn't useful.

pipe2devnull · 2 years ago
Sometimes you want to quickly iterate on a portion of code doing some sort of analysis or tweaking a plot. If your data pipeline takes a while to run, then rerunning the whole script is really awful. Notebooks make it easy to cache, rerun, and tweak chunks of the code.
jhbadger · 2 years ago
You can't embed graphs in a script, and plotting is an important part of systematic research. Also, it is easier to have an obvious sequential set of experiments over months in a notebook rather than a bunch of scripts. It's the same reason scientists use lab notebooks to keep track of things rather than just a bunch of loose papers.
szvsw · 2 years ago
Another advantage is when you have very slow code, you can use cells as caches, essentially without having to worry about serialization to disk. This often makes it much easier to interactively explore/develop downstream methods without needing to re-run earlier upstream dependencies.

This is especially useful with large datasets. Even if serialization is straightforward, if you have enough data (or the data is remotely hosted), loading it might take anywhere from 2s to multiple minutes, and even 2s is enough to get you out of the flow if you are working rapidly and want quick feedback.

whywhywhywhy · 2 years ago
You can rerun something halfway down a script without re-running the whole thing.

If your script requires loading 12+ GB of ML models onto a GPU before running anything at all, this is the difference between a few seconds and a minute to see a change. Also, if the output isn't text, you can see the image or chart result inline with that code.

williamcotton · 2 years ago
I am guessing you don’t do data science or data forensics? Inline styled tabular outputs and graphic plots while exploring data is very handy!

Edit: jinx!

vundercind · 2 years ago
They’re a repl where you can go back and edit and re-run earlier parts much more easily than on a normal repl.
yowlingcat · 2 years ago
I think the workflow improvement happens but it's not because notebooks allow you to do something that you can't do otherwise. They just improve ergonomics.

For example, there are a lot of cases where my team uses notebooks for proofs of concept: we make a large expensive call to load a large chunk of data, slice a small piece of it, iteratively try to reprocess the piece until the reprocessing occurs the desired way, validate that it reprocessed correctly, and then extend the reprocessing to the rest of the data set. That can all be done after making only one expensive call. Furthermore, if the last cell evaluation fails, it just resets you back to the line before and you can retry it.

Can you do this with a script? Absolutely. You can write a script to download the data, and a script to process the data, and sub-scripts for the individual steps. But that's not the path of least resistance; the path of least resistance involves you having to edit a piece and recompile everything and reset the entry point. Avoiding that really makes it easier to brute-force your way to the desired state ASAP.
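
Roughly, the load-once, iterate-on-a-slice workflow looks like the sketch below. Everything in it is illustrative: `expensive_load` and `reprocess` are hypothetical names, and the data is made up so the example runs on its own.

```python
def expensive_load():
    # Hypothetical stand-in for the one costly call (big query, large download).
    return [{"id": i, "value": str(i * 10)} for i in range(10_000)]


def reprocess(record):
    # The step being iterated on; edit and re-run until it behaves as desired.
    return {"id": record["id"], "value": int(record["value"]) // 10}


records = expensive_load()   # run once and keep the result in memory
sample = records[:5]         # small slice for fast iteration

sample_out = [reprocess(r) for r in sample]
# Validate on the slice before touching the full data set.
assert all(r["value"] == r["id"] for r in sample_out)

full_out = [reprocess(r) for r in records]   # extend to the rest of the data
```

In a notebook each of those steps would sit in its own cell, so a failure in `reprocess` only costs you a re-run of that cell, not of `expensive_load`.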

packetlost · 2 years ago
Immediate(ish) visualization and a whole lot of tooling to make presentation palatable for some datasets/types
slt2021 · 2 years ago
you would have to re-run the script from the beginning - this is not productive in scientific experiments, where you need to re-run certain parts of your code and tune/change parameters, try different things.

if your calculation is long running you would not be as productive as could be in notebooks

Cacti · 2 years ago
It’s a REPL, for starters.
sneed_chucker · 2 years ago
Formatted markdown, embedded images, graphs, plots, etc.

Can't really do that in a script unless you're running TempleOS

bdjsiqoocwk · 2 years ago
It's just a name, you're overthinking it.
WalterSear · 2 years ago
Some people, such as Jeremy Howard, think otherwise:

https://nbdev.fast.ai/

lucw · 2 years ago
The article should start with more context: what is a notebook? I know what it is, but the author is particularly bad at introducing his article.
xanathar · 2 years ago
Especially given that 'notebook' is also a synonym of 'laptop' and I was like, wtf, I don't have a mainframe to be less lazy.
blitzar · 2 years ago
I don't want to nitpick here but a 'notebook' is a small book with blank or ruled pages for writing notes in.

All the greats carried one and I too carry one.

mrweasel · 2 years ago
I still have no idea. I assumed it was a paper notebook, but then Excel is brought in, so now I think it's an Excel feature. In either case I have no idea what they are on about.
kwstas · 2 years ago
Agreed, even something like "coding" notebooks if brandname usage is a concern would help. It took more time than I'd like to admit to understand that they weren't talking about physical notebooks...
p4bl0 · 2 years ago
Indeed. I have to admit it took me some time to understand this was not about a specific small-ish form factor of laptops (e.g. Chromebook-like)…
hypercube33 · 2 years ago
My first assumption was something like interactive notebooks like Polyglot or something but reading this I really have no idea either
skybrian · 2 years ago
One of those complaints is due to an unfortunate implementation choice.

Out-of-date cells happen because Jupyter works like a buggy makefile that doesn't reliably rebuild dependencies, forcing you to run "make clean" when anything weird happens. There are better build systems.

Observable notebooks will automatically rerun cells that changed, like a spreadsheet. It works nicely for calculations that aren't too heavy, but it might not be what you want for a heavy batch job.

Their newer tool, Observable Framework, works more like a regular build system. You can still have it automatically build when you save a file in your editor.

A second complaint, that it's browser based, is basically an editor preference. You can open Jupyter notebooks in VS Code if you prefer.
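
As a toy illustration of the spreadsheet-style rerun mentioned above, here is a minimal sketch of cells that recompute their dependents when redefined. It is not how Observable or Jupyter is implemented; the `Notebook` class and its methods are hypothetical, and it assumes the dependency graph has no cycles.

```python
class Notebook:
    """Toy reactive notebook: redefining a cell reruns everything downstream."""

    def __init__(self):
        self.cells = {}    # name -> (dependency names, function)
        self.values = {}   # name -> last computed value

    def define(self, name, deps, fn):
        """Define or redefine a cell, then recompute it and its dependents."""
        self.cells[name] = (tuple(deps), fn)
        self._run(name)

    def _run(self, name):
        deps, fn = self.cells[name]
        if any(d not in self.values for d in deps):
            return  # an upstream cell has not been defined yet
        self.values[name] = fn(*(self.values[d] for d in deps))
        # Propagate to every cell that lists this one as a dependency.
        for other, (other_deps, _) in self.cells.items():
            if name in other_deps:
                self._run(other)


nb = Notebook()
nb.define("rate_pct", [], lambda: 5)
nb.define("principal", [], lambda: 1000)
nb.define("interest", ["principal", "rate_pct"], lambda p, r: p * r // 100)
nb.define("total", ["principal", "interest"], lambda p, i: p + i)

nb.define("rate_pct", [], lambda: 10)   # interest and total update on their own
```

Contrast this with Jupyter, where editing the `rate_pct` cell leaves `interest` and `total` holding stale values until you rerun them by hand.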

waldrews · 2 years ago
Notebooks are a diluted form of the Lisp/Smalltalk REPL-based development experience, with some features from the reactive-spreadsheet world. Especially for those of us in the business of producing numbers, 'real code' with a fixed set of tests isn't a better way but a necessary evil, a black box that we can't really trust. Building a calculation piece by piece, in a notebook, trying out variations along the way? That's how you get a feel for the calculation, connect with it at a spiritual level.
fifilura · 2 years ago
This is what I have to teach people working with numbers. Often with a risk of looking like a fool to the SWE crowd.

That working with numbers is a craft in itself. And the primary driver for the craft are the numbers. Not SWE practices.

You can't "feel" the numbers just by coming up with a huge testsuite.

And pretty often, the feel goes missing when you translate your prototype notebook into "real code".

skydhash · 2 years ago
I wonder why CL and Smalltalk haven’t beaten Python. Is it the languages or just unawareness? The workflow just makes more sense there, with better update propagation and state saving.
waldrews · 2 years ago
The Clojure/JVM statistics/scientific computing/now tensor math packages just never got as good as Python, and in Smalltalk they were a non-starter. R is an awkward language, but it's repl-first, has a lot of Lisp's metaprogramming (done in very ad-hoc ways), and Smalltalk's serializable image model -- so a lot of exploratory/experimental statistical methods research happens there, and then gradually makes its way to Python when it needs to be stabilized for production.

F# based .Net would've made a fine math-centric environment, but it's a language with a high initial barrier, and though .Net community has been making a decent effort at porting over a lot of NumPy/SciPy, it's not fully caught up after many years.

We have Julia now, it's jitted, multicore/GPU friendly, and has interesting REPL innovations, and yes, serializable state (though, ugh, ligatures). But every time I reach for it, it's like, oh no, yet another thing I want to call is in the Python ecosystem. Especially all the modern deep learning/tensor stuff, where the assumption is Python's speed and the GIL don't matter because you're just gluing together GPU calls.

kelseyfrog · 2 years ago
It's absolutely hilarious that the author opens with an image of Socrates.

Socrates, as you recall, famously argued that writing was a detriment to thinking. The parallel that notebooks are a sign of lazy thought does not go unnoticed.

JoeyBananas · 2 years ago
That is really not a valid parallel at all because code notebooks are only one particular form of coding.

Being against coding in a code notebook is more like saying "yellow highlighter on sticky notes is a bad medium for serious writing"

kelseyfrog · 2 years ago
This is a good opportunity to respond with the strongest plausible interpretation of what someone says.

RamblingCTO · 2 years ago
That's his profile pic on all platforms, afaik
packetlost · 2 years ago
Some of the worst code I've seen in my life lives in Jupyter notebooks. But that's fine, it's meant to be throwaway code. The problem is a lot of places/people do not use it as such.
giraffe_lady · 2 years ago
In general anything written by nonprofessional programmers in pursuit of some other goal is terrible code by professional programmer standards.

On the other hand, it accomplishes a goal other than getting a programmer paid. So in one important sense it is objectively better than probably 80% of the code I've seen people paid to write, no matter how nicely that was constructed.

__mharrison__ · 2 years ago
This is the crux of a lot of the anti-notebook rhetoric.

I spend a good deal of time covertly teaching software engineering best practices to folks who claim they don't want to be "software engineers", yet they are in front of Jupyter most days.

cqqxo4zV46cp · 2 years ago
Amen to that. The confluence of the insular nature of organisational developer cliques, the culture of self-importance in the field, and the ability for many teams to expend material effort on things that are…at best, tangentially related to business goals (due to cheap money, and frankly, often pulling the wool over management’s eyes), has bred at least a couple of generations of developers where large contingents have a completely out of whack perspective on what’s actually important.
hot_gril · 2 years ago
In one project, I had a new teammate ask me why almost our entire web backend is a single .js file mostly filled with SQL queries. He wanted to make it his personal project to refactor this into probably 20 different files, adding more layers between the handlers and the DB stuff, using a query builder, and also migrating to TS. When I asked him what's wrong with the current thing, he couldn't answer and gave up on this idea.

That backend had decent integration tests, and it took less than 5 minutes to add a small new feature. Most prod code doing comparable things where I work is worse-tested, less reliable, and at least 10X more expensive in terms of SWE-time, so I think modern programming standards are actually nonsense even though I can play along with them.

m463 · 2 years ago
They should put it in an excel spreadsheet. Then it would be carefully documented and be immediately retired, no possible way to outlive its usefulness.

https://xkcd.com/2730/

squarepizza · 2 years ago
The insidiousness of notebooks is in their uncanny resemblance to proper code. To take writing for example, a handwritten draft scrawled on coffee-stained yellow paper is more obviously a work-in-progress than a typed manuscript. But the maturity of code is not so immediately apparent.
hot_gril · 2 years ago
Idk how you'd use one for non-throwaway code. Not like a webserver can run a routine in a notebook.
packetlost · 2 years ago
I've seen "production workflows" be wrapped up in jupyter notebooks. Manually triggered of course.
bluenose69 · 2 years ago
This is a great article. For me, a lot relates to scale.

I use notebooks occasionally for the university classes I teach. I like how I can write equations and text to explain ideas, and present interactive graphs to illustrate the ideas.

But it's quite a lot of work to set things up, and there can be a problem when components of the work take a long time.

In my research work, it is common for a calculation or a graph to take hours to days. Doing work like that in a notebook is just a non-starter. I use scripts (in various languages) along with Makefiles that "know" if I've changed my code or my data, and only rebuild a result file or a graph when required. Almost always, I separate code that does analysis from code that creates graphical displays. And of course the text (of a paper, course notes, etc.) is written to incorporate these numerical and graphical results. The whole point is to subdivide complex tasks into smaller tasks that can be executed in an organized way.

I don't see the sense in using one tool (notebooks, say) for simple tasks and other tools (scripts, a strong editor, tmux, unix, etc.) for complex tasks. So, apart from the toy tools that I make for some teaching tasks, I stick to simple unix-style tools for the tricky stuff.

It's important to have many tools in the toolbox. Notebooks can be useful for some things, but I wouldn't want to frame a wall using an awl.
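
For readers who haven't used Makefiles this way, the rebuild-only-when-stale rule mentioned above boils down to a timestamp comparison. Here is a minimal sketch of that check in Python; `is_stale` and `build` are hypothetical helper names, and real `make` does considerably more (dependency chains, pattern rules, parallel builds).

```python
import os


def is_stale(target, sources):
    """True if `target` is missing or older than any of its `sources`."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(s) > target_time for s in sources)


def build(target, sources, recipe):
    """Run `recipe` only when the target is out of date; report whether it ran."""
    if is_stale(target, sources):
        recipe()
        return True
    return False
```

You would call something like `build("fig1.pdf", ["analysis.py", "data.csv"], make_figure)`, where the file names and `make_figure` are placeholders for your own outputs, inputs, and plotting routine.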

stakhanov · 2 years ago
Wholly agree and can't for the life of me figure out why you're being downvoted, other than people disagreeing with the tenor of your opinion.
skeledrew · 2 years ago
Recently, since coming across FastAI and nbdev[0], I've been moving increasingly to a more notebook-centric flow. So far it's been better particularly for the exploration aspect where I've primarily used ptpython in the past (and this isn't anywhere ML-related). I think the idea behind nbdev is pretty neat, but it pushes some practices that I'm not a fan of at all. I want to get to the point where I have a mostly complete IDE experience in the notebook so I don't have to keep switching back and forth. I have a ways to go.

[0] https://nbdev.fast.ai/

darepublic · 2 years ago
I remember Jeremy from fast ai was an advocate for notebook centric flows. But I have to respectfully disagree that he is a good authority on maintainable / scalable coding practices like this. And though I may be wrong, I feel like the fast ai lib itself is a not too useful wrapper around pytorch and people would be better served just learning pytorch itself. I say this as someone who watched a year of the fast ai videos, and got really excited about ai because of the fast ai course, which I am still grateful for. But this is my current take.
skeledrew · 2 years ago
I'd say the fastai library itself[0] is a pretty good example of how maintainable/scalable practices can come to life in notebook flows. There's something to be said IMO for an active project with 25.8k stars, 238 contributors, 2.7k commits, and 199 open vs 1.5k closed issues.

[0] https://github.com/fastai/fastai/