We should revisit literate programming in the agent era

I am not convinced.

- Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).

- LLMs are trained on tons of source code, which is arguably a smaller space than natural languages. My experience is that LLMs are really good at e.g. translating code between two programming languages. But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous.

- I wonder if it is a question of "natural languages vs programming languages" or "bad code vs good code". I could totally imagine that documenting bad code helps the LLMs (and the humans) understand the intent, while documenting good code actually adds ambiguity.

What I learned is that we write code for humans to read. Good code is code that clearly expresses the intent. If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-).

Of course there is an argument to make that the quality of code is generally getting worse every year, and therefore there is more and more a need for documentation around it because it's getting hard to understand what the hell the author wanted to do.

bottd · 4 days ago

> If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-)

If good code was enough on its own we would read the source instead of documentation. I believe part of good software is good documentation. The prose of literate source is aimed at documentation, not line-level comments about implementation.

wvenable · 4 days ago

> If good code was enough on its own we would read the source instead of documentation.

That's 100% how I work -- reading the source. If the code is confusing, the code needs to be fixed.

WillAdams · 4 days ago

https://diataxis.fr/

(originally developed at: https://docs.divio.com/documentation-system/) --- divides documentation along two axes:

- Action (Practical) vs. Cognition (Theoretical)

- Acquisition (Studying) vs. Application (Working)

which for my current project has resulted in:

- readme.md --- (Overview) Explanation (understanding-oriented)

- Templates (small source snippets) --- Tutorials (learning-oriented)

- Literate Source (pdf) --- How-to Guides (problem-oriented)

- Index (of the above pdf) --- Reference (information-oriented)

AdieuToLogic · 4 days ago

> If good code was enough on its own we would read the source instead of documentation.

An axiom I have long held regarding documenting code is:

  Code answers what it does, how it does it, when it is used, 
  and who uses it.  What it cannot answer is why it exists.  
  Comments accomplish this.

necovek · 4 days ago

Having "grown up" on free software, I've always been quick to jump into code when documentation was dubious or lacking: there is only one canonical source of truth, and you need to be good at reading it.

Though I'd note two kinds of documentation: docs how software is built (seldom needed if you have good source code), and how it is operated. When it comes to the former, I jump into code even sooner as documentation rarely answers my questions.

Still, I do believe that literate programming is the best of both worlds, and I frequently lament the dead practice of doing "doctests" with Python (though I guess Jupyter notebooks are in a similar vein).

Usually, the automated tests are the best documentation you can have!

Verdex · 3 days ago

I do read the code instead of the documentation, whenever that is an option.

Interesting factiod. The number of times I've found the code to describe what the software does more accurately than the documentation: many.

The number of times I've found the documentation to describe what the software does more accurately than the code: never.

habinero · 4 days ago

> If good code was enough on its own we would read the source instead of documentation.

Uh. We do. We, in fact, do this very thing. Lots of comments in code is a code smell. Yes, really.

If I see lots of comments in code, I'm gonna go looking for the intern who just put up their first PR.

> I believe part of good software is good documentation

It is not. Docs tell you how to use the software. If you need to know what it does, you read the code.

pdntspa · 4 days ago

> because my prompts are in natural languages, and hence ambiguous.

Legalese developed specifically because natural language was too ambiguous. A similar level of specificity for prompting works wonders

One of the issues with specifying directions to the computer with code is that you are very narrowly describing how something can be done. But sometimes I don't always know the best 'how', I just know what I know. With natural language prompting the AI can tap into its training knowledge and come up with better ways of doing things. It still needs lots of steering (usually) but a lot of times you can end up with a superior result.

vnorilo · 4 days ago

Yes. LLMs are search engines into the (latent) space or source code. Stuff you put into the context window is the "query". I've had some good results by minimizing the conversational aspect, and thinking in terms of shaping the context: asking the LLM to analyze relevant files, nor because I want the analysis, but because I want a good reading in the context. LLMs will work hard to stay in that "landscape", even with vague prompts. Often better than with weirdly specific or conflicting instructions.

Dead Comment

awesome_dude · 4 days ago

> Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).

I loathe this take.

I have rocked up to codebases where there were specific rules banning comments because of this attitude.

Yes comments can lie, yes there are no guards ensuring they stay in lock step with the code they document, but not having them is a thousand times worse - I can always see WHAT code is doing, that's never the problem, the problems is WHY it was done in this manner.

I put comments like "This code runs in O(n) because there are only a handful of items ever going to be searched - update it when there are enough items to justify an O(log2 n) search"

That tells future developers that the author (me) KNOWS it's not the most efficient code possible, but it IS when you take into account things unknown by the person reading it

Edit: Tribal knowledge is the worst type of knowledge, it's assumed that everyone knows it, and pass it along when new people onboard, but the reality (for me) has always been that the people doing the onboarding have had fragments, or incorrect assumptions on what was being conveyed to them, and just like the childrens game of "telephone" the passing of the knowledge always ends in a disaster

AdieuToLogic · 4 days ago

> Yes comments can lie ...

Comments only lie if they are allowed to become one.

Just like a method name can lie. Or a class name. Or ...

larusso · 4 days ago

I don’t disagree here. I personally like to put the why into commit messages though. It’s my longtime fight to make people write better commit messages. Most devs I see describe what they did. And in most cases that is visible from the change-set. One has to be careful here as similar to line documentation etc everything changes with size. But I prefer if the why isn’t sprinkled between source. But I’m not dogmatic about it. It really depends.

palata · 3 days ago

IMHO, you shouldn't have to justify yourself ("yeah yeah, this is not optimal, I know it because I am not an idiot"). Just write your code in O(n) if that's good enough now. Later, a developer may see that it needs to be optimised, and they should assume that the previous developer was not an idiot and that it was fine with O(n), but now it's not anymore.

Or do you think that your example comment brings knowledge other than "I want you to know that I know that it is not optimal, but it is fine, so don't judge me"?

baq · 4 days ago

Docs and code work together as mutually error correcting codes. You can’t have the benefits of error detection and correction without redundant information.

ghywertelling · 4 days ago

> With agents, does it become practical to have large codebases that can be read like a narrative, whose prose is kept in sync with changes to the code by tireless machines?

I think this is true. Your point supports it. If either the explanation / intention or the code changes, the other can be brought into sync. Beautiful post. I always hated the fact that research papers don't read like novels, eg "ohk, we tried this which was unsuccessful but then we found another adjacent approach and it helped."

Computer Scientist Explains One Concept in 5 Levels of Difficulty | WIRED

https://www.youtube.com/watch?v=fOGdb1CTu5c

Computer scientist Amit Sahai, PhD, is asked to explain the concept of zero-knowledge proofs to 5 different people; a child, a teen, a college student, a grad student, and an expert. Using a variety of techniques, Amit breaks down what zero-knowledge proofs are and why it's so exciting in the world of cryptography.

casey2 · 4 days ago

Programming languages are natural and ambiguous too, what does READ mean? you have to look it up to see the types. The power comes from the fact that it's audit-able, but that you don't need to audit it every time you want to write some code. You think you write good code? try to prove it after the compiler gets through with it.

Natural languages are richer in ideas, it may be harder to get working code going from a purely natural description to code, than code to code, but you don't gain much from just translating code. One is only limited by your imagination the other already exists, you could just call it as a routine.

You only have a SENSE for good code because it's a natural language with conventions and shared meaning. If the goal of programming is to learn to communicate better as humans then we should be fighting ambiguity not running from it. 100 years from now nobody is going to understand that your conventions were actually "good code".

musicale · 4 days ago

> Programming languages are natural and ambiguous too

Programming languages work because they are artificial (small, constrained, often based on algebraic and arithmetic expressions, boolean logic, etc.) and have generally well-defined semantics. This is what enables reliable compilers and interpreters to be constructed.

palata · 3 days ago

> Programming languages are natural and ambiguous too, what does READ mean?

"READ" is part of the "documentation in natural language". The compiler ignores it entirely, it's not part of the programming language per se. It is pure documentation for the developers, and it is ambiguous.

But the part that the compiler actually reads is non-ambiguous. It cannot deal with ambiguity, fundamentally. It cannot infer from the context that you wrote a line of code that is actually ironic, and it should therefore execute the opposite.

mexicocitinluez · 3 days ago

> Programming languages are natural and ambiguous too, what does READ mean?

Not nearly in the same sense actual language is ambiguous.

And ambiguity in programming is usually a bad thing, whereas in language it can usually be intended.

Good code, whatever that means, can read like a book. Event-driven architectures is a good example because the context of how something came to be is right in the event name itself.

LEDThereBeLight · 3 days ago

What is good code now is only good code because of the bad programming languages we’ve had to accept for the last hundred years because we’re tied to incremental improvements. We’re tied to static brittle types. But look at natural systems - they all use dynamic “languages.” When you get a cut, your flesh doesn’t throw an exception because it’s connected to the wrong “thing.” Maybe AI will redefine what good code means, because it’s better able to handle ambiguity.

psychoslave · 3 days ago

>Natural languages are ambiguous. That's the reason why we created programming languages.

Programming languages can be ambiguous too. The thing with formal languages is more that they put a stricter and narrower interpretation freedom as a convention where it's used. If anything there are a subset of human expression space. Sometime they are the best tool for the job. Sometime a metaphor is more apt. Sometime you need some humour. Sometime you better stay in ambiguity to play the game at its finest.

palata · 2 days ago

Programming languages are non-ambiguous, in the sense that there is no doubt what will be executed. It's deterministic. If the program crashes, you can't say "no but this line was a joke, you should have ignored it". Your code was wrong, period.

hosh · 4 days ago

I don’t have my LLMs generate literate programming. I do ask it to talk about tradeoffs.

I have full examples of something that is heavily commented and explained, including links to any schemas or docs. I have gotten good results when I ask an LLM to use that as a template, that not everything in there needs to be used, and it cuts down on hallucinations by quite a bit.

k32k · 4 days ago

"But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous."

Not only that, but there's something very annoying and deeply dissatisfying about typing a bunch of text into a thing for which you have no control over how its producing an output, nor can an output be reproduced even if the input is identical.

Agreed natural language is very ambiguous and becoming more ambiguous by the day "what exactly does 'vibe' mean?".

People spoke in a particular way, say 60 years ago, that left very little room for interpretation of what they meant. The same cannot be said today.

caseyohara · 4 days ago

> People spoke in a particular way, say 60 years ago, that left very little room for interpretation of what they meant. The same cannot be said today.

Surely you don’t mean everyone in the 1960s spoke directly, free of metaphor or euphemism or nuance or doublespeak or dog whistle or any other kind or ambiguity? Then why are there people who dedicate their entire life to interpreting religious texts and the Constitution?

gwbas1c · 3 days ago

> That's the reason why we created programming languages.

No, we created programming languages because when computers were invented:

1: They (computers) were incapable of understanding natural language.

2: Programming languages are easier to use than assembly or writing out machine code by hand.

LLMs are a quite recent invention, and require significantly more computing power than early computers had.

Deleted Comment

alkonaut · 3 days ago

Maybe if we had a really terse and unambiguous form of English? Whenever there is ambiguity we insert parentheses and operators to really make it clear what we mean. We can enclose different sentences in brackets to make sure that the scope of a logical condition and so on. Oh wait

> - Apply comments to code in all code paths and use idiomatic C# XML comments > - <summary> be brief, concise, to the point > - <remarks> add details and explain "why"; document reasoning and chain of thought, related files, business context, key decisions. > - <params> constraints and additional notes on usage > - inline comments in code sparingly where it helps clarify behavior

I have noticed a trend recently that some practices (writing a decent README or architecture, being precise and unambiguous with language, providing context, literate programming) that were meant to help humans were not broadly adopted with the argument that it's too much effort. But when done to help an LLM instead of a human a lot of people suddenly seem to be a lot more motivated to put in the effort.

zdragnar · 4 days ago

In my years of programming, I find that humans rarely give documentation more than a cursory glance up until they have specific questions. Then they ask another person if one is available rather than read for the answer.

The biggest problem is that humans don't need the documentation until they do. I recall one project that extensively used docblock style comments. You could open any file in the project and find at least one error, either in the natural language or the annotations.

If the LLM actually uses the documentation in every task it performs- or if it isn't capable of adequate output without it- then that's a far better motivation to document than we actually ever had for day to day work.

1718627440 · 4 days ago

I think this really depends on culture. If you target OS APIs or the libc, the documentation is stellar. You have several standards and then conceptual documentation and information about particular methods all with historic and current and implementation notes, then there is also an interactive hypertext system. I solve 80% of my questions with just looking at the official documentation, which is also installed on my computer. For the remaining I often try to use the WWW, but these are often so specific, that it is more successful to just read the code.

Once I step out of that ecosystem, I wonder how people even cope with the lack of good documentation.

suzzer99 · 4 days ago

The other problem is that documentation is always out of date, and one wrong answer can waste more time than 10 "I don't knows".

ijk · 4 days ago

I have discovered that the measure of good documentation is not whether your team writes documentation, but is instead determined by whether they read it.

ptak_dev · 3 days ago

This is the pattern I keep noticing too. A lot of "good engineering hygiene" that got dismissed as overhead is now paying dividends specifically because agents can consume it.

Detailed commit messages: ignored by most humans, but an agent doing a git log to understand context reads every one. Architecture decision records: nobody updates them, but an agent asked to make a change that touches a core assumption will get it wrong without them.

The irony is that the practices that make code legible to agents are the same ones that make it legible to a new engineer joining the team. We just didn't have a strong enough forcing function before.

hinkley · 4 days ago

Paraphrasing an observation I stole many years ago:

A bunch of us thought learning to talk to computers would get them out of learning to talk to humans and so they spent 4 of the most important years of emotional growth engaging in that, only to graduate and discover they are even farther behind everyone else in that area.

analog31 · 4 days ago

This raises an interesting point. I've speculated that if someone has a hard time expressing themselves to other humans verbally or in writing, they're also going to have a hard time writing human-readable code. The two things are rooted in the same basic abilities. Writing documentation or comments in the code at least gives someone two slim chances at understanding them, instead of just one.

I have the opposite problem. Granted, I'm not a software developer, but only use code as a problem solving tool. But once again, adding comments to my code gives me two slim chances of understanding it later, instead of one.

jpollock · 4 days ago

Documentation rots a lot more quickly than the code - it doesn't need to be correct for the code to work. You are usually better off ignoring the comments (even more so the design document) and going straight to the code.

hinkley · 4 days ago

I maintain you’re either grossly misappropriating the time and energy of new and junior devs if this is the case on your project, or you have gone too long since hiring a new dev and your project is stagnating because of it.

New eyes don’t have the curse of knowledge. They don’t filter out the bullshit bits. And one of the advantages of creating reusable modules is you get more new eyes on your code regularly.

This may also be a place where AI can help. Some of the review tools are already calling us out on making the code not match the documentation.

cmrdporcupine · 4 days ago

I've had LLMs proactively fix my inline documentation. Rather pleasant surprise: "I noticed the comment is out of date and does not reflect the actual implementation" even asking me if it should fix it.

jimbokun · 4 days ago

I find LLMs more diligent about keeping the documentation than any human developer, including myself.

jimbokun · 4 days ago

Well maybe if those people were managing one or more programmers and not writing the code themselves, they would have worked similarly.

what · 4 days ago

The difference is that they’re using the LLM to write those readmes and architecture and whatever else documents. They’re not putting any effort in.