As an example, for one project I realized things were getting better after it started writing integration tests. I wasn't sure whether the act of writing the tests forced it to reason about the black-box way the system would be used, or if there was another factor. It turns out it was just example usage. Extracting the usage patterns into both the README and CLAUDE.md was itself a simple request, and afterwards I got similar performance on new tasks.
This isn't always true--some conversations go poorly and it's better to reset and start over--but it usually is.
In your working mental model, you have a broad understanding of the wider domain and of the architecture. You summarize broad sections of the program into simpler ideas: module_a does x, module_b does y, insane file c does z, and so on. Then there is the part of the software you're actively working on, where you need more concrete context.
So as you move towards the central task, the context becomes more specific. But the vague outer context is still crucial to the task at hand. Now, you can certainly find ways to summarize this mental model as input to an LLM, especially with growing context windows. But we probably need a better understanding of how to present this kind of layered context to get performance similar to a human brain, because the underlying mechanism is very different.
If you know how to do something, then you can give Claude the broad strokes of how you want it done and -- if you give enough detail -- hopefully it will come back with work similar to what you would have written. In this case it's saving you on the order of minutes, but those minutes add up. There is also the possibility of negative time savings if it returns garbage.
If you don't know how to do something, then you can see if an AI has any ideas. This is where the big productivity gains are: hours or even days can become minutes if you are sufficiently clueless about something.
I do wonder whether your code does what you think it does. Similar-sounding keywords in different languages can have completely different meanings, e.g. the volatile keyword in Java vs C++. You don't know what you don't know, right? How do you know that the AI-generated code does what you think it does?
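To make the volatile example concrete (an illustrative sketch, not part of the original comment): in Java, volatile guarantees that a write in one thread becomes visible to reads in other threads, so the classic stop-flag loop below terminates as expected; the same keyword in C++ only constrains compiler optimizations on that access and gives no such visibility guarantee, where you would reach for std::atomic<bool> instead.

```java
public class StopFlag {
    // In Java, `volatile` guarantees that the worker thread sees the write
    // to `running` made by the main thread. Without it, the loop below may
    // spin forever. C++'s `volatile` provides no such visibility guarantee.
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {
                // spin: simulate work until the flag is cleared
            }
            System.out.println("worker observed running == false, exiting");
        });
        worker.start();

        Thread.sleep(100);   // let the worker spin for a moment
        running = false;     // visible to the worker because the field is volatile
        worker.join();
    }
}
```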
Maybe you can stay bootstrapped longer with less capital. That seems true.
With tools there is no equivalent. Maybe you could try some semantic similarity to the tool description, but I don't know of any system that does that.
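As a rough sketch of what that might look like: rank the tool descriptions by similarity to the user's request and pick the best match. Everything here is made up for illustration -- the Tool record, the example tool names, and the bag-of-words "embedding", which a real system would replace with an actual embedding model.

```java
import java.util.*;

// Toy sketch: rank tool descriptions by cosine similarity to a request.
public class ToolMatcher {

    // Hypothetical tool: a name plus the natural-language description
    // an agent framework would normally hand to the model.
    record Tool(String name, String description) {}

    // Turn text into a crude term-frequency vector.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        List<Tool> tools = List.of(
            new Tool("read_file", "Read the contents of a file from disk"),
            new Tool("run_tests", "Run the project's integration test suite"),
            new Tool("web_search", "Search the web for documentation"));

        String request = "please run the integration tests and show failures";
        Map<String, Integer> requestVector = termFrequencies(request);

        // Pick the tool whose description is most similar to the request.
        Tool best = tools.stream()
            .max(Comparator.comparingDouble(
                t -> cosine(requestVector, termFrequencies(t.description()))))
            .orElseThrow();

        System.out.println("best match: " + best.name());
    }
}
```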
What seems to be happening is building distinct "agents" that have a set of tools designed into them. An Agent is a system prompt plus tools, where some of the tools might be the ability to call or hand off to other agents. Each call to an agent is a new context, albeit with some limited context handed in from the caller agent. That way you manually decompose the project into a distinct set of sub-agents that can be concretely reasoned about and each perform a small set of related tasks. Then you need some kind of overall orchestration agent that handles dispatch to the other agents.
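Roughly, that structure might look like the sketch below; the names and the stubbed-out model loop are hypothetical rather than any particular framework's API. Each agent bundles a system prompt with its tools, a handoff is just a tool that invokes another agent in a fresh context, and the orchestrator is itself an agent whose tools are those handoffs.

```java
import java.util.*;
import java.util.function.Function;

// Illustrative sketch of "agent = system prompt + tools", with handoff
// modeled as a tool that calls another agent in a fresh context.
// The model call is stubbed out; a real system would call an LLM here.
public class AgentSketch {

    // A tool is a named function from an input string to an output string.
    record Tool(String name, String description, Function<String, String> run) {}

    // An agent owns a system prompt and a fixed set of tools. Every call
    // starts a new context: only the caller's task summary is handed in.
    record Agent(String name, String systemPrompt, List<Tool> tools) {

        String handle(String task) {
            // Stub for the model loop: a real agent would let the LLM pick
            // a tool; here we just take the first one to show the shape.
            Tool chosen = tools.get(0);
            String result = chosen.run().apply(task);
            return name + " used " + chosen.name() + ": " + result;
        }
    }

    // Handoff: wrap another agent as a tool. The callee sees only the
    // task string, not the caller's full conversation.
    static Tool handoffTo(Agent callee) {
        return new Tool(
            "handoff_to_" + callee.name(),
            "Delegate a sub-task to " + callee.name(),
            callee::handle);
    }

    public static void main(String[] args) {
        Agent coder = new Agent("coder", "You write and edit code.",
            List.of(new Tool("edit_file", "Apply an edit to a file",
                task -> "edited files for: " + task)));

        Agent tester = new Agent("tester", "You run and triage tests.",
            List.of(new Tool("run_tests", "Run the test suite",
                task -> "tests passed for: " + task)));

        // The orchestrator is itself an agent whose tools are handoffs.
        Agent orchestrator = new Agent("orchestrator",
            "Decompose the task and dispatch to the right sub-agent.",
            List.of(handoffTo(coder), handoffTo(tester)));

        System.out.println(orchestrator.handle("add a retry to the HTTP client"));
    }
}
```

The stub is only meant to show the shape: each sub-agent can be reasoned about in isolation, and the orchestrator hands over nothing more than a task summary when it dispatches.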
You bought a subscription in part for the value of one or more sections, but also because culturally everyone else did too, and once you had it you probably found something interesting. Few people ever read everything in the paper front to back regularly. That is why headlines were important then and remain important now.