benjaminfh · 7 days ago
Really awesome and thoughtful thing you've built - bravo!

I'm so aligned with your take on context engineering / context management. I found the default linear flow of conversation turns really frustrating and limiting. In fact, I still do. Sometimes you know upfront that the next thing you're about to do will flood/poison the nicely crafted context you've built up... other times you realise after the fact. In both cases, you don't have many alternatives but to press on... Trees are the answer for sure.

I actually spent most of Dec building something with the same philosophy for my own use (aka me as the agent) when doing research and ideation with LLMs. Frustrated by most of the same limitations - want to build context to a good place, then preserve/reuse it over and over, fire off side quests etc, bring back only the good stuff. Be able to traverse the tree forwards and back to understand how I got to a place...

Anyway, you've definitely built the more valuable incarnation of this - great work. I'm glad I peeled back the surface of the moltbot hysteria to learn about Pi.

visarga · 7 days ago
> want to build context to a good place then preserve/reuse it over and over, fire off side quests etc, bring back only the good stuff

My attempt - a minimalist graph format that is a simple markdown file with inline citations. I load MIND_MAP.md at the start of work and update it at the end. It reduces the context waste of resuming or spawning subagents. Memory across sessions.

https://pastebin.com/VLq4CpCT
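
Roughly, the shape is this (an illustrative snippet, not the actual pastebin contents):

```
# MIND_MAP.md
- auth
  - tokens: JWT, 1h expiry [src/auth/token.ts:12]
  - refresh flow -> see "sessions"
- sessions
  - stored in Redis, keyed by user id [src/session/store.ts:40]
```

Nodes are nested bullets, edges are "-> see" links, and the bracketed paths are the inline citations, so an agent can jump straight to the source instead of re-reading the codebase.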

a1ff00 · 7 days ago
This is incredible. It never occurred to me to even think of marrying the memory gather/update slash commands to a mindmap that follows the appropriate nodes and edges. It makes so much sense.

I was using a table structure with column 1 as the key and column 2 as the data, and told the agents to match the key before looking at column 2. It worked, but sometimes it failed spectacularly.
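
For reference, mine was roughly this shape (made-up entries):

```
| Key           | Data                                   |
|---------------|----------------------------------------|
| auth-flow     | JWT issued by /login, refreshed hourly |
| db-migrations | run via `make migrate`, never by hand  |
```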

I’m going to try this out. Thanks for sharing your .md!

benjaminfh · 5 days ago
Super interesting. I need to give it a proper read through with fresh eyes!

I just posted a Show HN re my graph storage for research chat sessions - curious on your thoughts!

Aditya_Garg · 7 days ago
Very very cool. Going to try this out on some of my codebases. Do you have the gist that helps the agent populate the mindmap for an existing codebase? Your pastebin mentions it, but I don't see it linked anywhere.
bizzletk · 7 days ago
I love this idea, and have immediately put it to use in my own work.

Would you mind publishing the `PROJECT_MIND_MAPPING.md` file that's referenced in `MIND_MAP.md`?

kloud · 7 days ago
The OpenClaw/pi-agent situation seems similar to ollama/llama-cpp, where the former gets all the hype, while the latter is actually the more impressive part.

This is great work, and I'm looking forward to seeing how it evolves. So far Claude Code seems best despite its bugs, given the generous subscription, but when the market corrects and prices get closer to API prices, the pay-per-token premium with an optimized experience will probably be a better deal than suffering Claude Code's glitches and paper cuts.

The realization is that, in the end, an agent framework kit that is customizable and can be recursively improved by agents is going to be better than a rigid proprietary client app.

Aurornis · 7 days ago
> but when the market corrects and the prices will get closer to API prices

I think it’s more likely that the API prices will decrease over time and the CC allowances will only become more generous. We’ve been hearing predictions about LLM price increases for years but I think the unit economics of inference (excluding training) are much better than a lot of people think and there is no shortage of funding for R&D.

I also wouldn’t bet on Claude Code staying the same as it is right now with little glitches. All of the tools are going to improve over time. In my experience the competing tools aren’t bug free either but they get a pass due to underdog status. All of the tools are improving and will continue to do so.

nl · 7 days ago
> I think it’s more likely that the API prices will decrease over time and the CC allowances will only become more generous.

I think this is absolutely true. There will likely be caps to stop the people running Ralph loops/GasTown with 20 clients 24/7, but for general use they will probably start to drop the API prices rather than vice-versa.

> We’ve been hearing predictions about LLM price increases for years but I think the unit economics of inference (excluding training) are much better than a lot of people think

Inference is generally accepted to be a very profitable business (outside the HN bubble!).

Claude Code subscriptions are more complicated, of course, but I think they probably follow the general pattern of most subscription software: lots of people who hardly use it, and a few who push it so hard that they lose money on them. Capping the usage solves the "losing money" problem.

badlogic · 7 days ago
FWIW, you can use subscriptions with pi. OpenAI has blessed pi, allowing users to use their GPT subscriptions. The same holds for other providers, except the Flicker Company.

And I'm personally very happy that Peter's project gets all the hype. The pi repo already gets enough vibesloped PRs from openclaw users as it is, and it's still only 1/100th of what the openclaw repository has to suffer through.

kloud · 7 days ago
Good to know, that makes it even better. I still find Opus 4.5 to be the best model currently. But if the next generation of GPT/Gemini closes the gap, that will cross the inflection point for me and make 3rd-party harnesses viable. Or if they jump ahead, that should put more pressure on the Flicker Company to fix the flicker or relax the subscriptions.
MillionOClock · 7 days ago
Is this something that OpenAI explicitly approves per project? I have had a hard time understanding what their exact position is.
andai · 7 days ago
This is basically identical to the ChatGPT/GPT-3 situation ;) You know OpenAI themselves keep saying "we still don't understand why ChatGPT is so popular... GPT was already available via API for years!"
smokel · 7 days ago
ChatGPT is quite different from GPT. Using GPT directly to have a nice dialogue simply doesn't work for most intents and purposes. Making it usable for a broad audience took quite some effort, including RLHF, which was not a trivial extension.
jrm4 · 7 days ago
This is the first I'm hearing of this pi-agent thing and HOW DO TECH PEOPLE DECIDE TO NAME THINGS?

Seriously. Is the creator not aware that "pi" absolutely invokes the name of another very important thing? sigh.

haxel · 7 days ago
The creator is very aware. Its original name was "shitty coding agent".

https://shittycodingagent.ai/

greenchair · 7 days ago
Developers are the worst at naming things. This is a well known fact.
SatvikBeri · 7 days ago
From the article: "So what's an old guy yelling at Claudes going to do? He's going to write his own coding agent harness and give it a name that's entirely un-Google-able, so there will never be any users. Which means there will also never be any issues on the GitHub issue tracker. How hard can it be?"
ohyoutravel · 7 days ago
And like ollama it will no doubt start to get enshittified.
threecheese · 7 days ago
Only if it enters YC (like Ollama).
msp26 · 7 days ago
> Special shout out to Google who to this date seem to not support tool call streaming which is extremely Google.

Google doesn't even provide a tokenizer to count tokens locally. The results of this stupidity can be seen directly in AI Studio, which makes an API call to count_tokens every time you type in the prompt box.
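
For anyone who hasn't hit this: there's no local option at all; even the official SDK just does a network round-trip. A minimal sketch with the `@google/generative-ai` JS SDK (model name is just an example):

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// This is an HTTPS call to the countTokens endpoint, not a local tokenizer.
const { totalTokens } = await model.countTokens("How many tokens is this?");
console.log(totalTokens);
```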

haxel · 7 days ago
AI Studio also has a bug that continuously counts the tokens, typing or not, with 100% CPU usage.

Sometimes I wonder who is drawing more power, my laptop or the TPU cluster on the other side.

Havoc · 7 days ago
Same for Claude Code. It's constantly sending token-counting requests.
localhost · 7 days ago
tbf neither does Anthropic
valleyer · 7 days ago
> If you look at the security measures in other coding agents, they're mostly security theater. As soon as your agent can write code and run code, it's pretty much game over.

At least for Codex, the agent runs commands inside an OS-provided sandbox (Seatbelt on macOS, and other stuff on other platforms). It does not end up "making the agent mostly useless".

chr15m · 7 days ago
Approval should be mandatory for any non-read tool call. You should read everything your LLM intends to do, and approve it manually.

"But that is annoying and will slow me down!" Yes, and so will recovering from disastrous tool calls.

hk__2 · 7 days ago
You'll just end up approving things blindly, because 95% of what you read will seem obviously right and only 5% will look wrong. I would prefer to let the agent do whatever it wants for 15 minutes and then look at the result, rather than having to approve every single command it runs.
mbrock · 7 days ago
That kind of blanket demand doesn't persuade anyone and doesn't solve any problem.

Even if you get people to sit and press a button every time the agent wants to do anything, you're not getting the actual alertness and rigor that would prevent disasters. You're getting a bored, inattentive person who could be doing something more valuable than micromanaging Claude.

Managing capabilities for agents is an interesting problem. Working on that seems more fun and valuable than sitting around pressing "OK" whenever the clanker wants to take actions that are harmless in a vast majority of cases.

threecheese · 7 days ago
It's not just annoying; at scale it makes using the agent CLIs impossible. You can tell someone spends a lot of time in Claude Code: they can type --dangerously-skip-permissions with their eyes closed.
theshrike79 · 6 days ago
This is like having a firewall on your desktop where you manually approve each and every connection.

Secure, yes? Annoying, also yes. Very error-prone too.

0xbadcafebee · 7 days ago
It's not reliable. The AI can just not prompt you to approve, or hide things, etc. AI models are crafty little fuckers and they like to lie to you and find secret ways to do things with ulterior motives. This isn't even a prompt injection thing; it's an emergent property of the model. So you must use an environment where everything can blow up and it's fine.
beacon294 · 7 days ago
My Codex just uses Python to write files around the sandbox when I ask it to patch an SDK outside its path.
Sharlin · 7 days ago
It's definitely not a sandbox if you can just "use python to write files" outside of it o_O
valleyer · 7 days ago
Is it asking you permission to run that python command? If so, then that's expected: commands that you approve get to run without the sandbox.

The point is that Codex can (by default) run commands on its own, without approval (e.g., running `make` on the project it's working on), but they're subject to the imposed OS sandbox.

This is controlled by the `--sandbox` and `--ask-for-approval` arguments to `codex`.
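
E.g., something like this (flag values as I recall them from the docs; check `codex --help`):

```sh
# sandboxed writes within the workspace, escalate only when a command fails
codex --sandbox workspace-write --ask-for-approval on-failure

# locked down: read-only sandbox, ask before running any untrusted command
codex --sandbox read-only --ask-for-approval untrusted
```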

lvl155 · 7 days ago
You really shouldn’t be running agents outside of a container. That’s 101.
embedding-shape · 7 days ago
A bit more general: don't run agents without some sort of OS-provided restriction on what they can do. Containers are one way, VMs another; in most cases it's enough with just a chroot and the Unix permission system the rest of your system already uses.
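For the plain-permissions route, something like this is usually enough (a sketch; the repo URL and `my-agent` stand in for whatever you actually run):

```sh
# dedicated low-privilege account for the agent
sudo useradd --create-home agent
# make your own home directory unreadable to it
chmod o-rwx "$HOME"
# give it its own checkout (from your remote, since it can't read your $HOME)
sudo -u agent -H git clone https://example.com/you/repo.git /home/agent/repo
# run the agent as that user
sudo -u agent -H sh -c 'cd /home/agent/repo && my-agent'
```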
andai · 7 days ago
What happens if I do?

What's the difference between resetting a container or resetting a VPS?

On local machine I have it under its own user, so I can access its files but it cannot access mine. But I'm not a security expert, so I'd love to hear if that's actually solid.

On my $3 VPS, it has root, because that's the whole point (it's my sysadmin). If it blows it up, I wanna say "I'm down $3", but it doesn't even seem to be that, since I can just restore it from a backup.

xXSLAYERXx · 7 days ago
I'm trying to understand this workflow. I have just started using Codex, literally 2 days in. I have it hooked up to my GitHub repo and it just runs in the cloud and creates a PR. I have it touching only UI and middle-layer code. No DB changes; I always tell it not to touch the models.
maleldil · 7 days ago
Does Codex randomly decide to disable the sandbox like Claude Code does?
mustaphah · 7 days ago
I've seen a couple of power users already switching to Pi [1], and I'm considering that too. The premise is very appealing:

- Minimal, configurable context - including system prompts [2]

- Minimal and extensible tools; for example, todo tasks extension [3]

- No built-in MCP support; extensions exist [4]. I'd rather use mcporter [5]

Full control over context is a high-leverage capability. If you're aware of the many ways context limits performance (in-context retrieval limits [6], context rot [7], contextual drift [8], etc.), you'd truly appreciate that Pi lets you fine-tune the WHOLE context for optimal performance.

It's clearly not for everyone, but I can see how powerful it can be.

---

[1] https://lucumr.pocoo.org/2026/1/31/pi/

[2] https://github.com/badlogic/pi-mono/tree/main/packages/codin...

[3] https://github.com/mitsuhiko/agent-stuff/blob/main/pi-extens...

[4] https://github.com/nicobailon/pi-mcp-adapter

[5] https://github.com/steipete/mcporter

[6] https://github.com/gkamradt/LLMTest_NeedleInAHaystack

[7] https://research.trychroma.com/context-rot

[8] https://arxiv.org/html/2601.20834v1

CuriouslyC · 7 days ago
Pi is the part of moltXYZ that should have gone viral. Armin is way ahead of the curve here.

The Claude sub is the only thing keeping me on Claude Code. It's not as janky as it used to be, but the hooks and context-management support are still fairly superficial.

WA · 7 days ago
Author of Pi is Mario, not Armin, but Armin is a contributor
dagss · 7 days ago
> from copying and pasting code into ChatGPT, to Copilot auto-completions [...], to Cursor, and finally the new breed of coding agent harnesses like Claude Code, Codex, Amp, Droid, and opencode

Reading HN I feel a bit out of touch since I seem to be "stuck" on Cursor. Tried to make the jump further to Claude Code like everyone tells me to, but it just doesn't feel right...

It may be due to the size of my codebase -- I'm 6 months into a solo-developer bootstrapped startup, so there isn't all that much there, and I can iterate very quickly with Cursor. And it's mostly SPA browser click-tested stuff. Comparatively, Claude Code feels like it spends an eternity to do anything.

(That said, Cursor's UI does drive me crazy sometimes. In particular the extra layer of diff review of AI changes (red/green), which is not integrated into git -- I would have preferred it to actively use something integrated with git (staged vs unstaged hunks). It's more important to have a good code-review experience than to remember which changes I made vs which changes the AI made.)

iterateoften · 7 days ago
For me Cursor provides a much tighter feedback loop than Claude Code. I can review, revert, iterate, and change models to get what I need. It sometimes feels like Claude Code is presented more as a YOLO option where you put more trust in the agent about what it will produce.

I think the ability to change models is critical. Some models are better at designing frontend than others. Some are better at different programming languages, writing copy, blogs, etc.

I feel sabotaged if I can’t switch the models easily to try the same prompt and context across all the frontier options

cjonas · 7 days ago
Same. For actual production apps I'm typically reviewing the thinking messages and code changes as they happen to ensure it stays on the rails. I heavily use the "revert" to a previous state so I can update the prompt with more accurate info that might have come out of the agent's trial and error. I find that if I don't do this, the agent makes a mess that often doesn't get cleaned up on its way to the actual solution. Maybe a similar workflow is possible with Claude Code...
sibellavia · 7 days ago
Probably an ideal compromise solution for you would be to install the official Claude Code extension for VS Code, so you have an IDE for navigating large, complex codebases while still having CC integration.
mkreis · 7 days ago
Bootstrapped solo dev here. I enjoyed using Claude to get little things done that I'd parked on my TODO list below the important stuff, like updating a landing page, or in your case perhaps adding automated testing for the frontend stuff (so you don't have to click yourself). It's just nice having someone come up with a proposal on how to implement something; even if it's not the perfect way, it's good as a starter. Also, I have one Claude instance running to implement the main feature, in a tight feedback loop so that I know exactly what it's doing. Yes, sometimes it takes a bit longer, but I use the time to check what the other Claudes are doing...
andai · 7 days ago
Claude Code spends most of its time poking around the files. It doesn't have any knowledge of the project by default (no file index etc), unless they changed it recently.

When I was using it a lot, I created a startup hook that just dumped a file listing into the context, or the actual full code on very small repos.

I also got some gains from using a custom edit tool I made which can edit multiple chunks in multiple files simultaneously. It was about 3x faster. I had some edge cases where it broke though, so I ended up disabling it.
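
The startup hook was just a shell command registered for session start; roughly this shape in `.claude/settings.json` (from memory, so check the current hooks docs):

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "find . -type f -not -path './.git/*' | head -500"
          }
        ]
      }
    ]
  }
}
```

Whatever the command prints to stdout gets pulled into the context at startup, which is what gave Claude a file index without it having to poke around first.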

leerob · 7 days ago
> in particular the extra layer of diff-review of AI changes (red/green) which is not integrated into git

We're making this better very soon! In the coming weeks hopefully.

dagss · 7 days ago
That's great news.

I see in your public issue tracker that a lot of people are desperate simply for an option to turn that thing off ("Automatically accept all LLM changes"). Then we could use any kind of plugin really for reviews with git.

SatvikBeri · 7 days ago
Seems like there's a speed/autonomy spectrum where Cursor is the fastest, Codex is the best for long-running jobs, and Claude is somewhere in the middle.

Personally, I found Cursor to be too inaccurate to be useful (possibly because I use Julia, which is relatively obscure) – Opus has been roughly the right level for my "pair programming" workflow.

dagss · 7 days ago
I mainly use Opus as well. Cursor isn't tied to any one AI model; both Opus and Sonnet and a lot of others are available. Of course there are differences in how the context is managed, but Opus is usually amazing in Cursor, at least.

I will very quickly @-mention the parts of the code that are relevant to get the context up and running right away. Seems like that's harder in Claude...

(They also have their own model, "Composer 1", which is just lightning fast compared to the others... and sometimes feels as smart as Opus, but now and then it doesn't find the solution if it's too complicated and I have to ask Opus to clean it up. But for simple stuff I switch to it.)

pests · 7 days ago
> remember which changes I made vs which changes AI made..

They are improving this use case too with their enhanced blame. I think it was mentioned in their latest update blog.

You'll be able to hover over lines to see if you wrote it, or an AI. If it was an AI, it will show which model and a reference to the prompt that generated it.

I do like Cursor quite a lot.

dagss · 6 days ago
Sounds good, but they also need an option to auto-approve all the changes in their local "replica of git".

(if one already exists, someone needs to tell the public Cursor issue tracker)

simonw · 7 days ago
Armin Ronacher wrote a good piece about why he uses Pi here: https://lucumr.pocoo.org/2026/1/31/pi/

I hadn't realized that Pi is the agent harness used by OpenClaw.

zby · 7 days ago
Pi probably has the best architecture, and being written in JavaScript it is well positioned to use the browser sandbox architecture that I think is the future for AI agents.

I only wish the author would change his stance on vendor extensions: https://github.com/badlogic/pi-mono/discussions/254

brimtown · 7 days ago
“standardize the intersection, expose the union” is a great phrase I hadn’t heard articulated before
zby · 7 days ago
I got the wording from an LLM. I knew there was this pattern in all traditional tools, but I did not know the name.
zarathustreal · 7 days ago
You’ve never heard it before because explicitly signaling “I know basic set theory” is kind of cringy