I work at Ramp and have always been on the “luddite” side of AI code tools. I use them, but I’m usually not that impressed, and I’m a curmudgeon when I see folks ask Claude to debug something instead of just reading the code. I’m just an old(er) neckbeard at heart.
But. This tool is scarily good. I’m seeing it “1-shot” features in a fairly sizable code base and fixes with better code and accuracy than me.
An important point here is that it isn’t doing a 1-shot implementation; it is solving the problem iteratively, with a closed feedback loop.
Create the right agentic feedback loop and a reasoning model can perform far better through iteration than its first 1-shot attempt.
This is very human. How much code can you reliably write without any feedback? Very little. We iterate, guided by feedback (compiler, linter, executing and exploring).
We use https://devin.ai for this and it works very well. Devin has its own virtual environment, IDE, terminal and browser. You can configure it to run your application and connect to whatever it needs. Devin can modify the app, test changes in the browser and send you a screen recording of the working feature with a PR.
This is a great writeup! Could you share more about the sandbox <-> client communication architecture? e.g., is the agent emitting events to a queue/topic, writing artifacts to object storage, and the client subscribes; or is it more direct (websocket/gRPC) from the sandbox? I’ve mostly leaned on sandbox.exec() patterns in Modal, and I’m curious what you found works best at scale.
This is a really great post - and what they've built here is very impressive.
I wonder if we're at the point where the cost of building and maintaining this yourself (assisted with an AI copilot) is now more effective than an off-the-shelf product?
It feels like there's a LOT of moving parts here, but also it's deeply tailored to their own setup.
FWIW - I tried pointing Claude at the post and asking it to design an implementation (like the post said to do), and it struggled - but perhaps I prompted it wrong.
I had this exact idea: I pointed Codex at it, giving it context about our environment, which is pretty complex. It is struggling, but that is because even the dev experience where I work is not great and not documented, so that would need to be fixed before I can reliably get an agent setup as well integrated as this blog post details.
This kind of project totally shows that Claude Code is nothing special; if anything, it lacks a lot of features. I hope every company develops a model-agnostic coding agent rather than using one tightly controlled by a single company.
Yes. I don't think that one-size-fits-all is the future of coding agents. Different companies have different requirements. I would like to build specialised test harnesses that internal coding agents could use to iterate rapidly.
Also, inevitably these AI companies will start selling out data and become part of the surveillance state, if they're not already.
Surprised they need both.
Web app submits the prompt, a sandbox starts on sprites.dev and any Claude output in the sandbox gets piped to the web app for display.
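One way to sketch the sandbox-to-web-app piping described here (the transport and names are illustrative, not the commenter's actual stack): spawn the agent process inside the sandbox and forward each stdout line, as it arrives, to whatever the client is listening on.

```python
import subprocess
import sys
from collections.abc import Callable

def stream_agent_output(cmd: list[str],
                        send_to_client: Callable[[str], None]) -> int:
    """Run the agent process and forward each output line as it arrives.
    `send_to_client` stands in for the real transport (websocket, SSE,
    a queue the web app subscribes to, etc.)."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    assert proc.stdout is not None
    for line in proc.stdout:  # blocks until the process emits a line
        send_to_client(line.rstrip("\n"))
    return proc.wait()

# Usage: pretend a trivial subprocess is the agent; collect the lines
# a connected client would receive.
received: list[str] = []
code = stream_agent_output(
    [sys.executable, "-c", "print('hello from sandbox')"],
    received.append,
)
```

Line-buffered forwarding like this is the simplest version; a real setup would also need to handle reconnects and persist output so the web app can replay a session.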
Not sure I can open source it as it's something I built for a client, but ask if you have any questions.
Claude Code locally in a VM and/or with git worktrees will 1-shot far better without burning cloud infra cash.
I’d bet this ends up wasting more money and time than it’s worth in practice.