> The key is to view the AI as a partner you can coach – progress over perfection on the first try
This is not how to use AI. You cannot scale the ladder of abstraction if you are babysitting a task at one rung.
If you feel that it’s not possible yet, that may be a sign that your test environment is immature. If it is possible to write acceptance tests for your project, then manually coaching the AI is just a cost optimization: you are simply reducing the tokens it takes the AI to get the answer. Whether that’s worth your time depends on the problem, but in general if you are manually coaching your AI you should stop and either:
1. Work on your pipeline for prompt generation. If you write down any relevant project context in a few docs, an AI will happily generate your prompts for you, including examples and nice formatting etc. Getting better at this will actually improve
2. Set up an end-to-end test command (unit/integration tests are fine to add later but less important than e2e)
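To make those two steps concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names `build_prompt` and `run_e2e`, the idea that your context docs are markdown files in one folder, and the choice to treat exit code 0 as "accepted". It is not how any particular agent does it.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def build_prompt(task: str, docs_dir: Path) -> str:
    """Join the project context docs and the task into one prompt (hypothetical layout)."""
    sections = []
    for doc in sorted(docs_dir.glob("*.md")):  # sorted so the prompt is reproducible
        body = doc.read_text(encoding="utf-8").strip()
        sections.append(f"## {doc.name}\n{body}")
    context = "\n\n".join(sections)
    return f"# Project context\n\n{context}\n\n# Task\n\n{task}\n"


def run_e2e(command: Sequence[str], timeout_s: int = 600) -> bool:
    """Run the single end-to-end test command; True means the change is accepted."""
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False  # assumption: a hung test run counts as a failure
    return result.returncode == 0
```

The point of having one command is that the agent (or you) can call `run_e2e([...])` with whatever your project's e2e invocation is and get a yes/no answer, with no human in the loop.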
These processes are how people use headless agents like CheepCode[0] to move faster. Generate prompts with AI and put them in a task management app like Linear, then CheepCode works on the ticket and makes a PR. No more watching a robot work: check the results at the end and only read the thoughts if you need to debug your prompt.
[0] the one I built - https://cheepcode.com
It doesn't take much more of a stretch to imagine teams of agents, coordinated by a "programme manager" agent, with "QA agents" working to defined quality metrics, "architect" agents that take initial requirements and break them down into system designs and GitHub issues, and of course the super important "product owner" agent who talks to actual humans and writes initial requirements. Such a "software team system" would be another abstraction level above individual agents like Codex.
Now with headless agents (like CheepCode[0], the one I built) that connect directly to the same task management apps that we do as human programmers, you can get “good enough” PRs out of a single Linear ticket with no need to touch an IDE. For copy changes and other easy-to-verify tweaks this saves developers a lot of overhead checking out branches, making PRs, etc so they can stay focused on the more interesting/valuable work. At $1/task a “good enough” result is well worth it compared to the cost of human time.
Recently I converted all the (Google Docs) documentation of a project to markdown files and added those to the workspace. It now indexes it with RAG and can easily find relevant bits of documentation, especially in agent mode.
It really stresses the importance of getting your documentation and processes in order as well as making sure the tasks at hand are well-specified. It soon might be the main thing that requires human input or action.
In fact, I built an entirely headless coding agent for that reason: you put tasks in, you get PRs out, and you get journals of each run for debugging but it discourages micro-management so you stay in planning/documenting/architecting.
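Roughly, the "tasks in, results out, journal for debugging" shape looks like the sketch below. To be clear, this is not CheepCode's code; `RunResult`, `run_headless`, `agent_step`, and `accept` are all names made up for illustration, and the retry limit is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunResult:
    task_id: str
    passed: bool
    journal: List[str] = field(default_factory=list)  # read only when debugging


def run_headless(
    task_id: str,
    prompt: str,
    agent_step: Callable[[str, int], str],  # hypothetical: (prompt, attempt) -> patch
    accept: Callable[[str], bool],          # hypothetical: e.g. wraps your e2e command
    max_attempts: int = 3,
) -> RunResult:
    """Run one task to completion with no human input, recording a journal."""
    journal: List[str] = []
    for attempt in range(1, max_attempts + 1):
        journal.append(f"attempt {attempt}: agent started")
        patch = agent_step(prompt, attempt)
        if accept(patch):
            journal.append(f"attempt {attempt}: acceptance passed")
            return RunResult(task_id, True, journal)
        journal.append(f"attempt {attempt}: acceptance failed")
    return RunResult(task_id, False, journal)
```

Nothing in the loop asks for input, which is the design choice: the only levers you have are the prompt going in and the acceptance check, so that is where your effort ends up.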
Those who can’t stop raving about how much of a superpower LLMs are for coding, how it’s made them 100x more productive, and is unlocking things they could’ve never done before.
And those who, like you, find it to be an extremely finicky process that requires an extreme amount of coddling to get average results at best.
The only thing I don’t understand is why people from the former group aren’t all utterly dominating the market and obliterating their competitors with their revolutionary products and blazing fast iteration speed.
The result you described is coming soon. CheepCode[0] agents already produce working code in a satisfying percentage of cases, and I am at most 3 months away from it producing end-to-end apps and complex changes that are at least human-quality. It would take way less if I got funded to work on it full time.
Given that I'm this close as a solo founder with no employees, you can imagine what's cooking inside large companies.
[0] My product, cloud-based headless coding agents that connect directly to Linear, accept tickets, and submit GitHub PRs