For context, I'm a solo developer building UserJot. I've recently been looking deeper into integrating AI into the product, but I've been wanting to go a lot deeper than just wrapping a single API call and calling it a day.
So this blog post is mostly my experience trying to reverse engineer other AI agents and experimenting with different approaches for a bit.
Happy to answer any questions.
When you discuss caching, are you talking about caching the LLM response on your side (what I presume) or actual prompt caching (using the provider cache[0])? Curious why you'd invalidate static content?
[0]: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...
I think I need to make this a bit more clear. I was mostly referring to caching the tools (sub-agents) if they are pure functions. But that may be a bit too specific for the sake of this post.
i.e. you have a query that reads data that doesn't change often, so you can cache the result.
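As a sketch of that caching idea: if a tool is a pure function of its arguments, you can memoize its result with a TTL. This is an illustrative example, not UserJot's actual code; all names are hypothetical:

```typescript
// Cache results of a pure (side-effect-free) tool, keyed by its arguments.
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

function cachePureTool(fn: ToolFn, ttlMs: number): ToolFn {
  const cache = new Map<string, { value: unknown; expires: number }>();
  return async (args) => {
    // JSON-serialized args act as the cache key; fine for plain data objects.
    const key = JSON.stringify(args);
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value;
    const value = await fn(args);
    cache.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}

// Example: a "query" tool that reads rarely-changing data.
let dbCalls = 0;
const listTags = cachePureTool(async () => {
  dbCalls++; // stands in for a real database round-trip
  return ["bug", "feature", "ux"];
}, 60_000);
```

The TTL bounds staleness; a write tool (a mutation) should never go through this wrapper.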
Nice post! Can you share a bit more about what variety of tasks you've used agents for? Agents can mean so many different things depending on who you're talking to. A lot of the examples seem like read-only/analysis tasks. Did you also work on tasks where the agent took actions and changed state? If so, did you find any differences in the patterns that worked for those agents?
Sure! So there are both read-only and write-only agents that I'm working on. Basically there's a main agent (main LLM) that is responsible for the overall flow (currently testing GPT-5 Mini for this) and then there are the sub-agents, like I mentioned, that are defined as tools.
Hopefully this isn't against the terms here, but I posted a screenshot (https://x.com/ImSh4yy/status/1951012330487079342) of how I'm trying to build this into the changelog editor to allow users to basically go:
1. What tickets did we recently close?
2. Nice, write a changelog entry for that.
3. Add me as author, tags, and title.
4. Schedule this changelog for Monday morning.
Of course, this sounds very trivial on the surface, but it starts to get more complex when you think about how to do find-and-replace in the text, how to fetch tickets and analyze them, how to write the changelog entry, etc.
Hope this helps.
- Did you build your own, or are you farming out to, say, Opencode?
- If you built your own, did you roll from scratch or use a framework? Any comments either way on this?
- How "agentic" (or constrained as the case may be) are your agents in terms of the tools you've provided them?
Not sure if I understand the question, but I'll do my best to answer.
I guess "agents"/"agentic" are too broad a term. All of this is really an LLM that has a set of tools that may or may not be other LLMs. You don't really need a framework as long as you can make HTTP calls to OpenRouter or some other provider and handle tool calling.
I'm using the AI SDK, as it plays very nicely with TypeScript and gives you a lot of interesting features, like handling server-side/client-side tool calling and synchronization.
My current setup has a mix of tools: some are pure functions (e.g. database queries), some handle server-side mutations (e.g. scheduling a changelog), and some are supposed to run locally on the client (e.g. updating the TipTap editor).
Again, hopefully this somewhat answers the question, but happy to provide more details if needed.
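To make the "you don't really need a framework" point concrete, here's a minimal, self-contained sketch of such a loop. The model is stubbed out (a real version would POST to OpenRouter or another provider), and every name here is illustrative:

```typescript
// Minimal agent loop: call the model, run any requested tool, feed the
// result back, and repeat until the model answers with plain text.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelReply = { text?: string; toolCall?: ToolCall };
type Model = (transcript: string[]) => Promise<ModelReply>;

const tools: Record<string, (args: Record<string, unknown>) => Promise<string>> = {
  // A pure read tool; real ones might hit your database or another LLM.
  closedTickets: async () => JSON.stringify(["#41 dark mode", "#45 export"]),
};

async function runAgent(model: Model, userMessage: string): Promise<string> {
  const transcript = [`user: ${userMessage}`];
  for (let step = 0; step < 10; step++) { // hard cap avoids infinite loops
    const reply = await model(transcript);
    if (reply.toolCall) {
      const result = await tools[reply.toolCall.tool](reply.toolCall.args);
      transcript.push(`tool(${reply.toolCall.tool}): ${result}`);
      continue;
    }
    return reply.text ?? "";
  }
  throw new Error("agent exceeded step budget");
}
```

A framework mainly adds conveniences on top of this loop (streaming, client-side tool dispatch, typed schemas); the core control flow stays this small.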
When you describe subagents, are those single-tool agents, or are they multi-tool agents with their own ability to reflect and iterate? (i.e. how many actual LLM calls does a subagent make?)
So I have a main agent that is responsible for steering the overall flow, and then there are the sub-agents that, as I mentioned, are stateless functions called by the main agent.
Now these could be anything really: API calls, pure computation, or even LLM calls.
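A sketch of a sub-agent in that stateless style: everything it needs arrives in the call, nothing persists between calls, and only the final result comes back. The LLM call is stubbed here, and all names are hypothetical:

```typescript
// A sub-agent exposed as a tool: stateless, fresh context per call,
// returning only the final result (no shared conversation history).
type SubAgentTask = { instructions: string; input: string };

// Stand-in for one LLM call; a real version would hit the provider's API.
async function llm(prompt: string): Promise<string> {
  return `summary(${prompt.length} chars)`;
}

async function summarizeTickets(task: SubAgentTask): Promise<string> {
  // Everything the sub-agent knows is in `task`: nothing persists between calls.
  return llm(`${task.instructions}\n\n${task.input}`);
}
```

Because the sub-agent is just a function of its input, swapping its body between an API call, pure computation, or an LLM call changes nothing for the orchestrator.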
My favorite post in a long time. Super straightforward, confirms my own early experiences but the author has gone further than I have and I can already see how his hard-won insight is going to save me time and money. One change I’m going to make immediately is to use cheaper/faster/simpler models for 3/4 of my tasks. This will also set things up nicely for having some tasks run on local models in the future.
When I started digging into best practices for building agentic AI systems, I realized pretty quickly that the biggest challenge isn't just making the technology smart, but making sure it's secure, scalable, and future-proof. AI has to evolve with changing demands, and that only works when the development process is well-structured. I once looked into https://artjoker.net, a software development company in the USA, and what caught my attention was their strong use of DevOps, CI/CD pipelines, and cloud-native design. Those aren't just buzzwords; they're the kinds of things that keep systems reliable and adaptable as they grow. For me, the key takeaway has been that innovation has to be matched with stability. The systems that last are the ones built with a clear framework, open collaboration, and a real focus on long-term resilience.
Am I the only one who cannot stand this terrible AI generated writing style?
These awful three sentence abominations:
"Each subagent runs in complete isolation. The primary agent handles all the orchestration. Simple."
"No conversation. No “remember what we talked about.” Just task in, result out."
"No ambiguity. No interpretation. Just data."
AI is good at some things, but copywriting certainly isn't one of them. Whatever the author put into the model to get this output would have been better than what the AI shat out.
I'm genuinely curious: is it a) the writing style you can't stand, or b) the fact that this piece tripped your "this is written by AI" detector, and it's AI-written stuff you can't stand? And what's the % split between the two?
(I find there's a growing push-back against being fed AI-anything, so when this is suspected it seems like it generates outsized reactions)
I’m pro-AI in general, but I hate the AI writing style that has gotten especially bad lately. It’s down to two things, neither of which are anti-AI sentiment.
Firstly, I find the tone of voice immensely irritating. It sounds like a mixture of LinkedIn broetry, a TEDx talk, and marketing speak. That’s irritating when a human does it, but it’s especially bad when AI applies it in cases where it’s jarringly wrong for the topic at hand.
I recently saw this example:
> This isn’t just “nicer syntax” — it’s a fundamental shift in how your software thinks.
— https://news.ycombinator.com/item?id=44873145
It was talking about datetime representation in software development, but it has the tone of voice of somebody earnestly gesticulating on stage while explaining how they are going to solve world hunger. This is like the uncanny valley, except instead of making me uneasy it just pisses me off.
Secondly, it’s so incredibly overused. You’ll see “it’s not X—it’s Y” three times in three consecutive paragraphs. It’s irritating the first time, so when I see it throughout a whole article, I get an exceptionally low opinion of whoever published it.
Having said that, this article wasn’t particularly bad.
The saccharine writing style would be bad in isolation, but bearable. The overexposure to it is what leads me to dislike it so much, I think.
The fact it's written by AI does add a layer of frustration, because you know someone wrote something more human and more real, but all you get to see is what the model made of it after digestion.
I recently posted here (https://news.ycombinator.com/item?id=44893025) how I'm seeing success with sub-agent-based autonomous dev (not "vibe coding", as I actually review every line before I commit, but the same general idea). Different application, but I can confirm every one of the best practices described in this article, as I came to the same conclusions myself.
These are the same categories of coordination I've been looking at all day, trying to find the sweet spot in how complex the orchestration can be. I tried adding some context when the agents got into a cycle of editing the code that another was testing, and stuff like that.
I know my next step should be to give the agents a good git pattern, but I'm having so much fun just finding the team organization that delivers better stuff. I have them consult with each other on tech choices, and they've picked what I would have picked.
The consensus protocol for choices is one I really liked, and that will maybe do more self-correction.
I've been asking them to illustrate their flow of work and explain their decisions; I need to go back and see if that's the case. It would probably be easier once I get my git experiment flow down.
The future is tooling for these. If we can team them up enough that consensus approaches something 'safe', we can give them tools to run in dry/live mode, have a human validate the process for a time, and then, once there's enough feedback, move on to the next thing needing fixing.
I have a lot of apps with Cobra CLI tooling that resides next to the server code. Being able to pump our docs into an MCP server for tool execution is giving me so many ideas.
"Subagent orchestration" is also a really quick win in Claude. You can just say "spawn a subagent to do each task in X, give it Y context".
This lets you a) run things in parallel if you want, but also b) keep the main agent's context clean, and therefore run much larger, longer running tasks without the "race against the context clock" issue.
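That pattern can be sketched as fanning tasks out to independent sub-agent calls, where only each sub-agent's short summary re-enters the main agent's context. The sub-agent here is a stub and the names are illustrative:

```typescript
// Fan tasks out to sub-agents in parallel; only their summaries come back,
// so the orchestrator's transcript stays small however many tasks run.
type SubAgent = (task: string) => Promise<string>;

// Stub sub-agent; a real one would wrap its own model call with a fresh prompt.
const reviewFile: SubAgent = async (task) => `ok: ${task}`;

async function fanOut(subAgent: SubAgent, tasks: string[]): Promise<string[]> {
  // Promise.all runs the calls concurrently and preserves input order.
  return Promise.all(tasks.map((t) => subAgent(t)));
}
```

Each sub-agent burns its own context window, so the "race against the context clock" applies per task rather than to the whole run.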
I assume you're talking about Claude Code, right? If so, I very much agree with this. A lot of this was actually inspired by how easy it was to do in Claude Code.
I first experimented with allowing the main agent to have a "conversation" with sub-agents. For example, I created a database of messages between the main agent and the sub-agents and allowed both to append to it. This kinda worked for a few messages but kept getting stuck on mid-tier models, such as GPT-5 Mini.
But from my understanding, their implementation is also similar to the stateless functions I described (happy to be proven wrong). Sub-agents don't communicate back much aside from the final result, and they don't have a conversation history.
The live updates you see are mostly the application layer updating the UI, which initially confused me.
https://github.com/RedDotRocket/AgentUp
I believe it's quite unique as far as agents go. The runtime is config-driven, so you get caching, state management, security (OAuth2, JWT), retry handlers, and push notifications/webhooks (for long-running tasks). Tool and MCP capabilities are granted based on scope allocation (file:read, api:write, map:generate). This means you can get a fully baked agent built in minutes, using just the CLI.
From there, when you need to customize and write your own business logic, you can code whatever you want, inherit all of AgentUp's middleware, and have it as a plugin to the core. This is pretty neat, as it means your plugins can be pinned as dependencies.
Plugin Example: https://github.com/RedDotRocket/AgentUp-systools
You then end up with a portable agent: anyone can clone the repo, run `agentup run`, and, like Docker, it pulls in everything it needs and starts serving.
It's currently aligned with the A2A specification, so it will talk to agents built with Pydantic, LangChain, and the Google Agent SDK.
It's early days, but it's getting traction.
Before this, I created sigstore and have been building open source for many years.
The docs also give a good overview: https://docs.agentup.dev