Literate programming is much more than just commenting code

The problems one will run into with literate programming:

1. Lack of tooling.

2. Refactoring becomes nontrivial.

3. How one would write a program in literate style will vary widely from person to person. If you write your code in literate style, it may be easy for you to follow it years later and modify it, but it likely will not be the case for a coworker. If they have to modify the code, the cognitive load will not be too different from that of just dealing with well written code.

Disclaimer: I've written two nontrivial programs literate style that I continue to rely on and occasionally modify years after writing them. It works as advertised.

klibertp · 3 years ago

For point 3. - it's exactly the same with code that's not literate. Writing code is ultimately about expressing ideas using a language, which is really much closer to writing a novel than to drawing a plan for a bridge. As such, if you want to make your code understandable to others, you have to learn to write well. Just like with novels, there's no problem with having a personal style, or a specific flavor that comes from how you use the language, how you structure your sentences and paragraphs, how you guide a reader through the story.

In other words, the style varying between people is not a problem - bad writing is. And, unfortunately, in my experience very few programmers are capable of consciously producing good writing. The fact that most of the docs out there are barely-legible trash is proof of this.

I'm sure that reading literate code from Charles Stross would be a blast. It would be exciting, sometimes surprising, but still clear, easy to navigate, structured in a way allowing for extension within a well thought-out framework. Unfortunately, when people without his talent try to use LP, they produce things on par with that unfinished fantasy novel you started writing in 8th grade.

Programming requires a bit of talent, but you can get by with lots of hard work. Literate programming is much harder than that and requires a lot of talent to be beneficial to the codebase. Without that, your LP code will be Fifty Shades of Twilight, and honestly, we don't need more of things like that.

BeetleB · 3 years ago

It's a nice perspective, but the fact that more people have read Fifty Shades/Twilight than the sum total of all who have read any of Charles Stross's works undercuts your point.

And while you add to point 3, it wasn't my main point.

Take any two exceptionally good writers who have very different styles. If one of them produces literate code, the other may be able to understand it very well, but it is unlikely that he can modify it, along with the prose, and maintain the quality of the literate document.

It's not just about bad writers, but incompatibly good ones.

WorldMaker · 3 years ago

I feel like problem 1 is on the cusp of solutions given the amount of money poured into tooling for "notebooks" like Jupyter. Notebooks are a form of literate programming. Projects written in Jupyter Notebooks are getting larger and scaling harder. I think a convergence should eventually happen that larger scale literate programming tasks can benefit immensely from the tooling investments in notebooks like Jupyter.

throwaquestion5 · 3 years ago

Problems 1 and 3 I could imagine. I would need to learn how to be a better writer to share a literate program.

As someone experienced in the topic, what's the biggest hurdle when trying to refactor the code?

shakna · 3 years ago

Refactoring becomes the dual problem space of both programming and editing.

It's simply more work - but that "more work" is vitally important, tedious, and resistant to any kind of automated help.

taeric · 3 years ago

I'd also say that a lot of the reason for refactoring is just... different. Literate programs are typically made to be fairly self contained works. If you have some general purpose code that is of use in the entire codebase, you can make that its own section and cover it when it is needed. Otherwise, you likely won't reach for the same coding strategies that are common outside of literate code.

I'm trying to find a way to describe this a bit better than the above. I think the easiest way to think about it is that in most software projects you have a separate document that is the general architecture of the software. It is rare that you will need or want to refactor the architecture, so you try to keep that somewhat faithful to what the code is doing. In literate software, that high level architecture view is part of how you organize the code.

ilammy · 3 years ago

> What's the biggest hurdle when trying to refactor the code?

I'd guess it's updating cross-references in prose and rewriting chapters of documentation which no longer make sense after your refactoring.

I like literate programming in theory but the most common response I see to it is that writing self documenting code is better because as you are working on a code base with many people, it is unlikely they will keep your prose up to date as the code is changed.

mannykannot · 3 years ago

> self documenting code is better because as you are working on a code base with many people, it is unlikely they will keep your prose up to date as the code is changed.

There is no reason to believe they are any more likely to keep code self-documenting (or to succeed even if they try) - it is not as if it will not compile or run unless it is.

I see literate programming to be an attempt to put some rigor into the otherwise terminally vague concept of self-documenting code (conceptually, it is way beyond the platitudes in 'clean code', even though it came first.) It is, however, doomed to failure in practice because it always takes less information (and less skill) to merely specify what a program will do than it does to not only specify what it will do but also explain and justify that as a correct and efficient solution to a problem that matters.

Neither 'literate' nor 'self-documenting' code are objective concepts.

Koshkin · 3 years ago

Self-documenting code is fine - until someone starts wondering why code does what it does, or if someone wants to generate documentation. (No, the lazy style, "OpenFile - opens a file", does not cut it.)

QuikAccount · 3 years ago

Python resolves this with docstrings, but in general, assuming it is not self-explanatory why a function exists, is it really necessary to go the whole nine yards with literate programming instead of just adding a few comments to explain why it exists? Self-documenting code with explanations when necessary is how most codebases already are. At least most good ones.
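A minimal sketch of the docstring-plus-comments style described here. The function name and the network-share scenario are hypothetical, made up purely for illustration:

```python
import time


def open_with_retry(path, attempts=3, delay=0.0):
    """Open ``path`` for reading, retrying on transient failures.

    Why this exists (hypothetical scenario): the file lives on a network
    share that briefly disappears during nightly maintenance, so a single
    failed open() does not mean the file is gone.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return open(path, "r", encoding="utf-8")
        except OSError as exc:
            # Keep the error so the caller sees the real cause after the
            # final attempt instead of a generic "gave up" message.
            last_error = exc
            time.sleep(delay)
    raise last_error
```

The name says what the function does; the docstring and the one comment say why, which is the part the code alone cannot express.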

falcolas · 3 years ago

I think the one potential mitigating factor is that new features can be entirely new "chapters". Thanks to the tangling, a feature that needs to be added in 10 different places in the code can be written completely separately from the rest of the code.

Additionally, bugs can be fixed in-situ, refactoring can occur at will, and neither would require the prose around them to change, since code being talked about (despite moving or undergoing small changes) still fulfills the original, documented, purpose.
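To make the "new chapter" idea concrete, here is a rough sketch of a tangling step. The `<<name>>=` and `@` markers are loosely modeled on noweb; the parser, the function names, and the sample document are assumptions for illustration, not any real tool's implementation:

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def parse_chunks(source):
    """Collect named code chunks; repeated names are concatenated in order."""
    chunks = {}
    current = None
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = match.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None  # "@" ends a chunk and returns to prose
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Expand chunk `name`, recursively replacing <<references>>.

    Raises KeyError for an undefined chunk; cycles are not detected.
    """
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            out.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            out.append(indent + line)
    return out


DOCUMENT = """\
The program skeleton.
<<main>>=
def main():
    <<setup>>
    <<report>>
@
Chapter 1: basic setup.
<<setup>>=
total = 0
@
<<report>>=
print(total)
@
Chapter 2: a later feature touches both places without editing chapter 1.
<<setup>>=
bonus = 5
@
<<report>>=
print(total + bonus)
@
"""

print("\n".join(tangle(parse_chunks(DOCUMENT), "main")))
```

Chapter 2 appends to both `setup` and `report` from its own section of the document, so the feature's code and prose stay together even though the tangled output interleaves it with chapter 1.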

ravel-bar-foo · 3 years ago

It seems like adding new code as "Chapters," unless pursued with a bit of self-discipline, may result in spaghetti which is worse than a non-literate style.

Imagine a multi-person project where every little feature gets its own file, and now the programmer has to find the source of the bug between interacting blocks of code fragments split across multiple files, which are combined together by tooling... oh wait, I think that describes just about any sufficiently large C or C++ project.

taeric · 3 years ago

The problem with self documenting code is that it doesn't help justify all of the parts into the whole. This is particularly troublesome in code where a refactor effectively isolated entire sections of the code, but the person that did the refactor didn't realize it, and now you have code that exists only for the sake of existing tests.

t0suj4 · 3 years ago

I interpret self documenting as writing the what in the code and writing comments about the why, while minimizing the places where you need to explain your code.

My rule of thumb is that if it's not obvious why that particular line is there and removing it would break functionality, add a comment.

doliveira · 3 years ago

How are newcomers handled in those "self-documenting codebases"?

My favorite literate program still has to be the book "Physically Based Rendering". An optimized, feature rich ray tracer in the form of a textbook.

That said, I wouldn't personally want to try and collaborate on such a program with more than one other person. It would make for a great single-contributor OSS library though. Rubber duck debugging built right into the prose.

https://smile.amazon.com/gp/product/1541259335 is also a great book. As is Stanford GraphBase.

My personal bet is that it is probably easier to collaborate on something like this than you would think. The imposed structure of programs, in general, already makes a lot of collaboration tough.

eunoia · 3 years ago

Great book! It’s available online for free at https://www.pbrt.org/.

You can also find older, physical editions on eBay for $10-$15.

svat · 3 years ago

I would go further: literate programming is not just "much more than" commenting code, because you can do LP without commenting much. The main thing in LP is the idea/orientation of writing as if you're writing something for a human reader. This does often lead to more comments, but even something like "here's the code" followed by lots of code can be LP, if you deem it sufficient for your intended audience. (Earlier comment of mine about target audience and not over-commenting: https://news.ycombinator.com/item?id=29871047)

This works well for people who are writers by nature (like Knuth who's always making edits and improvements to his books https://news.ycombinator.com/item?id=30149221). One problem though (and there are several) is that because this is so personal, nearly everyone who seriously tries LP ends up writing their own LP tool (including the author of this post!).

I'm somewhat hopeful the growing ubiquity of especially Jupyter notebooks leads to better, more universal tools for literate programming. Notebooks have always been a form of literate programming. Jupyter and its underlying formats are now ubiquitous enough with a lot of strong IDE support (across a variety of IDEs) that I'm hopeful a better convergence as a "general literate programming platform" from the notebook side may just be a matter of time. (Other than that a lot of strong LP proponents so far seem to mostly be oblivious to the happenings in Notebook spaces and vice versa, despite there being so much cross-over.)

Yes, I agree (and share the hope). In fact, earlier today I was thinking about Peter Norvig's "pytudes" (https://github.com/norvig/pytudes) as good examples of literate programming - and they are notebooks. Also, last weekend I picked up some code I had written a few months ago, threw it all away and started writing it in a notebook (Colab) precisely for these "literate programming" reasons.

There's also "nbdev" (https://github.com/fastai/nbdev) which seems like it should be the best of both worlds, but I couldn't quite get it to work.

antirez · 3 years ago

I think likewise. When I had to write the radix tree implementation for Redis I faced two problems:

- I needed a stable implementation as soon as possible; I had a performance issue that needed to be solved by range queries.

- The radix tree was full of corner cases.

So I resorted to literate programming, which is in general very near to my usual programming style. You can find it in the rax.c file inside the Redis source code; as you can see, as the algorithm is enunciated, the corresponding code is implemented.

Other than that I wrote a very extensive fuzzer for the implementation. Result: after the initial development I don't think it was ever targeted by serious bugs, and now the implementation is very easy to modify if needed.
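For readers unfamiliar with the fuzzing approach mentioned above, here is a small sketch of model-based fuzzing: random operations are applied both to the structure under test and to a trivially correct reference, and the two are compared after every step. This is not the Redis fuzzer; the `Trie` class is a tiny uncompressed stand-in (not a radix tree), and all names are assumptions for illustration:

```python
import random


class Trie:
    """Tiny uncompressed trie, a stand-in for the structure under test."""

    def __init__(self):
        self.children = {}
        self.is_key = False

    def insert(self, key):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.is_key = True

    def remove(self, key):
        path = [self]
        for ch in key:
            node = path[-1].children.get(ch)
            if node is None:
                return  # key was never inserted
            path.append(node)
        path[-1].is_key = False
        # Prune now-empty nodes from the leaf upward (a typical corner case).
        for i in range(len(key), 0, -1):
            node = path[i]
            if node.is_key or node.children:
                break
            del path[i - 1].children[key[i - 1]]

    def __contains__(self, key):
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_key

    def keys(self, prefix=""):
        """Yield all stored keys in sorted order."""
        if self.is_key:
            yield prefix
        for ch in sorted(self.children):
            yield from self.children[ch].keys(prefix + ch)


def fuzz(seed, steps=2000):
    """Apply random operations to the trie and a set, comparing after each."""
    rng = random.Random(seed)
    trie, model = Trie(), set()
    for _ in range(steps):
        # A two-letter alphabet and short keys force frequent collisions.
        key = "".join(rng.choice("ab") for _ in range(rng.randint(0, 4)))
        if rng.random() < 0.6:
            trie.insert(key)
            model.add(key)
        else:
            trie.remove(key)
            model.discard(key)
        assert (key in trie) == (key in model), (seed, key)
        assert list(trie.keys()) == sorted(model), (seed, key)
    return len(model)


for seed in range(5):
    fuzz(seed)
```

Seeding the random generator makes any failure reproducible: the assertion message carries the seed and the key that exposed the mismatch.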

lioeters · 3 years ago

For those curious, the extensively commented source code can be seen here:

https://github.com/redis/redis/blob/unstable/src/rax.c

yumiris · 3 years ago

Literate programming has been particularly useful for my "dotfile" configurations, such as .emacs, .vimrc, .zshrc and even the .gitconfig file.

I use one .org file to declare all of my configurations, and tangle them together into the aforementioned files. This keeps things pretty portable, and makes up for the unintuitive readability of many dotfiles.

It can also work for rudimentary shell scripts and other single-file goodies; however, scaling it to proper multi-file programs proves to be difficult, especially when multiple developers are involved.
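As a rough illustration of the single-.org-file workflow, the sketch below imitates a small part of what Org's tangling does: it collects `#+begin_src` blocks by their `:tangle` target. It only understands explicit file names (plus `:tangle no`), returns the contents instead of writing files, and the sample configuration is invented:

```python
import re

BEGIN = re.compile(r"^\s*#\+begin_src\b(.*)$", re.IGNORECASE)
END = re.compile(r"^\s*#\+end_src\s*$", re.IGNORECASE)
TANGLE = re.compile(r":tangle\s+(\S+)")


def tangle_org(text):
    """Map each :tangle target to the concatenated bodies of its blocks."""
    outputs = {}
    in_block = False
    target = None  # file the current block is tangled to, if any
    for line in text.splitlines():
        if in_block:
            if END.match(line):
                in_block, target = False, None
            elif target is not None:
                outputs.setdefault(target, []).append(line)
            continue
        begin = BEGIN.match(line)
        if begin:
            in_block = True
            found = TANGLE.search(begin.group(1))
            target = found.group(1) if found else None
            if target == "no":
                target = None  # block is documentation only
    return {path: "\n".join(lines) + "\n" for path, lines in outputs.items()}


CONFIG_ORG = """\
* Shell
Aliases I always want.
#+begin_src sh :tangle .zshrc
alias ll='ls -l'
#+end_src

* Git
#+begin_src conf :tangle .gitconfig
[alias]
    st = status
#+end_src

* Shell, continued
#+begin_src sh :tangle .zshrc
export EDITOR=vim
#+end_src
"""

for path, body in tangle_org(CONFIG_ORG).items():
    print(f"--- {path}")
    print(body, end="")
```

Blocks for the same target can be spread across the document in whatever order reads best, and they end up concatenated in document order in the generated dotfile.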

syntaxfree · 3 years ago

This is a cool idea. Also, if you switch tools like WMs, you know what you used to have, even if it takes some work to reconstruct what that was. But I have such a tangle of glued-together and custom-written tiling WM rice that I can never switch to anything ever again.

sritchie · 3 years ago

Literate programming is going to feel far more powerful when we expand the definition to include:

- Smalltalk-ish things like writing suites of custom viewers for various types,

- demos and examples in-line inside of a library

- multiple stories about the same piece of code, but all with the ability to IMPORT the story as a library

I've been writing sicmutils[0] as a "literate library"; see the automatic differentiation implementation as an example[1].

A talk I gave yesterday at ELS[2] demos a much more powerful host that uses Nextjournal's Clerk[3] to power physics animations, TeX rendering etc, but all derived from a piece of Clojure source that you can pull in as a library, ignoring all of these presentation effects.

Code should perform itself, and it would be great if when people thought "LP" they imagined the full range of media through which that performance could happen.

[0] sicmutils: https://github.com/sicmutils/sicmutils

[1] autodiff namespace: https://github.com/sicmutils/sicmutils/blob/main/src/sicmuti...

[2] Talk code: https://github.com/sritchie/programming-2022

[3] Clerk: https://github.com/nextjournal/clerk

Don't forget to include "notebooks" in the expanded view of literate programming. The amount of code being written in Jupyter notebooks alone today in practice dwarfs much of literate programming in preceding years.

lf-non · 3 years ago

I am not a big fan of the complex literate programming style involving code-generation which this article talks about.

But I recently discovered that Google's zx [1] scripting utility supports executing scripts in markdown documents and I combined it with httpie [2] and usql [3] for a bit of quick and dirty automation testing and api verification code and it worked out pretty well.

I imagine for most people nowadays jupyter or vscode notebooks are the closest it comes to practical literate programming.

[1] https://github.com/google/zx#markdown-scripts

[2] https://github.com/httpie/httpie

[3] https://github.com/xo/usql