Surely these sorts of problems must be worked upon from a mathematical standpoint.
Will have a think about how this can be extended to other types of uses.
I have personally been trying to replace all tools/MCPs with a single “write code” tool which is a bit harder to get to work reliably in large projects.
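As a rough illustration of what a single "write code" tool might look like: the harness exposes one entry point, the model writes code against a small set of allowed helpers, and the harness returns captured output. Everything here (function names, the allowlist) is hypothetical, and a real version would need proper sandboxing.

```python
# Minimal sketch of a single "write code" tool replacing per-task tools/MCPs.
# Names and the allowlist are illustrative; this is NOT a real sandbox.

import io
import contextlib

def run_code_tool(source: str) -> str:
    """Execute model-written Python and return captured stdout.

    Instead of exposing many narrow tools, the model gets one tool
    and composes behavior by writing code against allowed helpers.
    """
    allowed = {"print": print, "len": len, "sorted": sorted}  # tiny allowlist
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # exec with restricted builtins for illustration only; a production
        # harness would run this in a proper isolated sandbox.
        exec(source, {"__builtins__": allowed})
    return buf.getvalue()

# Usage: the model emits code; the harness runs it and returns the output.
print(run_code_tool("print(sorted([3, 1, 2]))"))  # → [1, 2, 3]
```

The reliability problem in large projects is exactly that the allowlist has to grow into the whole codebase's surface area.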
My guess is it’s a top-level folder which shows the cross-module deps.
"The readymade components we use are essentially compressed bundles of context—countless design decisions, trade-offs, and lessons are hidden within them. By using them, we get the functionality without the learning, leaving us with zero internalized knowledge of the complex machinery we've just adopted. This can quickly lead to sharp increase in the time spent to get work done and sharp decrease in productivity."
If you want to go all in on specs, you must fully commit to allowing the AI to regenerate the codebase from scratch at any point. I'm an AI optimist, but this is a laughable stance with current tools.
That said, the idea of operating on the codebase as a mutable, complex entity, at arm's length, makes a TON of sense to me. I love touching and feeling the code, but as soon as there's 1) schedule pressure and 2) a company's worth of code, operating at a systems level of understanding just makes way more sense. Defining what you want done, using a mix of user-centric intent and architecture constraints, seems like a super high-leverage way to work.
The feedback mechanisms are still pretty tough, because you need to understand what the AI is implicitly doing as it works through your spec. There are decisions you didn't realize you needed to make, until you get there.
We're thinking a lot about this at https://tern.sh, and I'm currently excited about the idea of throwing an agentic loop around the implementation itself. Adversarially have an AI read through that huge implementation log and surface where it's struggling. It's a model that gives real leverage, especially over the "watch Claude flail" mode that's common in bigger projects/codebases.
On your homepage there is a mention that Tern “writes its own tools”, could you give an example on how this works?
You could mark items in the feed to spaced-repeat for yourself. This would also function as a “retweet”, which would align incentives so that content that gets promoted is actually durably useful or interesting. The posts people make would repeat for themselves too, so the source content should be good.
Also could think of it a little like a “Wikipedia of flashcards”.
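To make the "spaced-repeat" mechanic concrete, here is a minimal Leitner-style scheduler sketch. The function name, box model, and doubling intervals are my assumptions, not anything from the comment; real systems (SM-2, FSRS) are more involved.

```python
# Hedged sketch: Leitner-style scheduling for "mark an item to space-repeat".
# Box numbers and the doubling interval are illustrative assumptions.

from datetime import date, timedelta

def next_review(box: int, today: date) -> tuple[int, date]:
    """Return (new_box, next_due) after a successful review.

    Each box doubles the interval: 1, 2, 4, 8, ... days.
    A failed review would reset the box to 0 (not shown here).
    """
    interval = 2 ** box  # days until the item comes back
    return box + 1, today + timedelta(days=interval)

# Usage: three successful reviews push the item out 1 + 2 + 4 days.
box, due = 0, date(2024, 1, 1)
for _ in range(3):
    box, due = next_review(box, due)
print(box, due)  # → 3 2024-01-08
```

Marking someone else's post would just seed it into your own review queue at box 0.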
Would you be interested in working on something like this?