colonCapitalDee · 6 days ago
I'm building a simple agent accessible over SMS for a family member. One of their use cases is finding recipes. A problem I ran into was that doing a web search for recipes would pull tons of web pages into the context, effectively clobbering the system prompt that told the agent to format responses in a manner suited for SMS. I solved this by creating a recipe tool that uses a sub-agent to do the web search and return the most promising recipe to the main agent. When the main agent uses this tool instead of performing the web search itself, it is successfully able to follow the system prompt's directions to format and trim the recipe for SMS. Using this sub-agent to prevent information from entering the context dramatically improved the quality of responses. More context is not always better!

I bring this up because this article discusses context management mostly in terms of context windows having a maximum size. I think that context management is far more than that. I'm still new to this building agents thing, but my experience suggests that context problems start cropping up well before the context window fills up.
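
For the curious, here is a minimal sketch of that recipe tool (assuming an OpenAI-style chat client; web_search is a stand-in for whatever search helper you use):

    from openai import OpenAI

    client = OpenAI()  # assumes an OpenAI-style chat API; any provider works

    def find_recipe_tool(query: str) -> str:
        """Sub-agent: the noisy search results live only in this call's context."""
        pages = web_search(f"{query} recipe")  # hypothetical search helper
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": "Pick the single most promising "
                 "recipe from these pages and return it in full."},
                {"role": "user", "content": "\n\n".join(pages)},
            ],
        )
        # Only this string reaches the main agent, so its SMS-formatting
        # system prompt never gets clobbered by raw web pages.
        return reply.choices[0].message.content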

edoceo · 6 days ago
Are you in the USA? How do you get around the 10DLC limits on typical SMS APIs (e.g. Twilio)? Or did you go through that registration process (which seems like a lot for a private use case)?
colonCapitalDee · 6 days ago
I am in the USA! Although these days that exclamation point doesn't feel great...

I'm using an old Android phone (a Pixel 2 from 2017), a $5-a-month unlimited SMS plan from Tello, and https://github.com/capcom6/android-sms-gateway. For bonus points (I wanted to roll my own security, route messages from different numbers to prod and ppe instances of my backend, and dedup messages), I built a little service in Go that acts as an intermediary between my backend and android-sms-gateway. I deploy this service to my Android device using ADB; android-sms-gateway talks to it, and it talks to my backend. I also rooted the Android device so I could disable battery management for all apps (don't do this if you want to walk around with the phone, of course). It works pretty well!

I plan to open-source this eventually™, but first I need to decouple my personal deployment infra from the bits useful to everyone else.

_0ffh · 5 days ago
You mean sub-agent as in the formatting agent calls on the search-and-filter agent? In that case you might as well make a pipeline: use a search agent, then a filter agent (or maybe a single search-and-filter agent), then a formatting agent. Lots of tasks work better with a fixed pipeline than with freely communicating agents.
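
Roughly, as a sketch (assuming a hypothetical llm(system, user) helper that makes one stateless chat-completion call, plus the same web_search stand-in as above):

    def recipe_pipeline(query: str) -> str:
        # Fixed pipeline: each stage has its own narrow prompt and sees only
        # the previous stage's output, never a shared conversation.
        pages = web_search(f"{query} recipe")                      # search stage
        best = llm("Pick the single most promising recipe.",       # filter stage
                   "\n\n".join(pages))
        return llm("Trim and format this recipe for SMS.", best)   # format stage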
tfirst · 5 days ago
The article addresses this specific use case under the 'Claude Code Subagents' section.

> The benefit of having a subagent in this case is that all the subagent’s investigative work does not need to remain in the history of the main agent, allowing for longer traces before running out of context.

jauntywundrkind · 5 days ago
This very narrow, very specific, single-purpose, task-oriented subagent was one of the first things talked about in this very lovely, recent & popular submission (along with other fun-to-read points):

What makes Claude Code so damn good (and how to recreate that magic in your agent)!?
https://minusx.ai/blog/decoding-claude-code/
https://news.ycombinator.com/item?id=44998295

faangguyindia · 5 days ago
Isn't history just a list of dicts containing role and message? You can evict any entry at any point.
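
For example (a sketch; real tool messages carry extra fields, omitted here):

    history = [
        {"role": "system", "content": "Format replies for SMS."},
        {"role": "user", "content": "Find me a lasagna recipe."},
        {"role": "tool", "content": "<50 KB of scraped web pages>"},
        {"role": "assistant", "content": "Lasagna: ..."},
    ]
    # Evict the bulky tool output before the next call; nothing forces you
    # to resend every entry every time.
    history = [m for m in history if m["role"] != "tool"]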
sitkack · 5 days ago
The large models have all the recipes memorized; you don't need to do a search.
simianwords · 5 days ago
Why are you reinventing the wheel? Just use gpt api with search turned on.
knlam · 5 days ago
You want to use multiple providers, so that if you're not happy with the result from GPT, you can switch to Perplexity or something else. Plug and play is very powerful when you are building agent/subagent systems.
faangguyindia · 6 days ago
There’s both “no multi-agent system” and “multi-agent system,” depending on how you look at it. In reality, you’re always hitting the same /chat/completion API, which itself has no awareness of any agents. Any notion of an agent comes purely from the context and instructions you provide.

Separating agents has a clear advantage. For example, suppose you have a coding agent with a set of rules for safely editing code. Then you also have a code-search task, which requires a completely different set of rules. If you try to combine 50 rules for code editing with 50 rules for code searching, the AI can easily get confused.

It’s much more effective to delegate the search task to a search agent and the coding task to a code agent. Think of it this way: when you need to switch how you approach a problem, it helps to switch to a different “agent”, a different mindset with rules tailored for that specific task.

Do I need to think differently about this problem? If yes, you need a different agent!

So yes, conceptually, using separate agents for separate tasks is the better approach.
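
As a sketch (the rule text is made up; llm(system, user) is the same hypothetical one-call helper as above):

    EDIT_RULES = "You edit code. Touch only the files named in the task. ..."     # ~50 editing rules
    SEARCH_RULES = "You search code. Return file paths and line ranges only. ..." # ~50 search rules

    # "Switching agents" is just switching which rule set accompanies the task;
    # both calls hit the same /chat/completion endpoint.
    hits = llm(SEARCH_RULES, "Where is the retry logic implemented?")
    patch = llm(EDIT_RULES, f"Add exponential backoff to the retry logic.\n{hits}")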

datadrivenangel · 5 days ago
Calling a different prompt template an 'agent' doesn't help communicate meaningful details about an overall system design. Unnecessary verbiage or abstraction in this case.
adastra22 · 5 days ago
It is what it is. That ship has sailed.
eab- · 5 days ago
There's both "no multi-program system" and "multi-program system", depending on how you look at it. In reality, you're always executing the same machine code, which itself has no awareness of programs.
stirfish · 5 days ago
This unironically helped me work through a bug just now
jmull · 5 days ago
> By using React, you embrace building applications with a pattern of reactivity and modularity, which people now accept to be a standard requirement, but this was not always obvious to early web developers.

This is quite a whopper. For one thing, the web started off reactive. It did take a while for a lot of people to figure out how to bring that to client-side rendering in a reasonably decent way (though, I'm sorry, IMO that doesn't actually include React). Second, "modularity" had been a thing for quite some time before the web existed. (If you want to get down to it, separating and organizing your processes in information systems predates computers.)

pmontra · 5 days ago
> the web started off reactive

I was there but I didn't notice reactivity. Maybe we are using two different definitions of reactivity. Do you care to elaborate?

I agree that "It did take a while for a lot of people to figure out how to bring that to client-side rendering in a reasonably decent way".

CuriouslyC · 6 days ago
We're in the context engineering stone age. You, the engineer, shouldn't be trying to curate context by hand; you should be building context optimization/curation engines. You shouldn't be passing agents context like messages; they should share a single knowledge store with the parent, and the context optimizer should just optimally pack their context for the task description.
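
A rough sketch of what such an engine might do (relevance() and the store layout are assumptions, not a real library):

    def pack_context(store: list[dict], task: str, budget_tokens: int) -> str:
        """Greedy packer: highest-relevance items first, until the budget is full."""
        ranked = sorted(store, key=lambda item: relevance(item["text"], task),
                        reverse=True)  # relevance() is an assumed scoring function
        packed, used = [], 0
        for item in ranked:
            if used + item["tokens"] <= budget_tokens:
                packed.append(item["text"])
                used += item["tokens"]
        # The agent sees only this packed view, never the raw shared store.
        return "\n\n".join(packed)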
hansvm · 6 days ago
You're not wrong. This is just a storage/retrieval problem. But ... the current systems have limits. If you want commercial success in <3yrs, are any of those ideas remotely viable?
CuriouslyC · 6 days ago
Oh yeah, and if you tried to do one now it'd be a bad idea because I'm almost done :)

The agentic revolution is very different from the chatbot/model revolution, because agents aren't a model problem; they're a tools/systems/process problem. Honestly, the models we have now are very close to good enough for autonomous engineering, but people aren't giving them the right tools or the right processes, we aren't orchestrating them correctly, most people have no idea how to benchmark them to tune them, etc. It's a new discipline and it's very much in its infancy.

adastra22 · 5 days ago
Dude, ChatGPT isn’t even 3 years old. That’s an eternity.
pglevy · 5 days ago
Not an engineer but I think this is where my mind was going after reading the post. Seems like what will be useful is continuously generated "decision documentation." So the system has access to what has come before in a dynamic way. (Like some mix of RAG with knowledge graph + MCP?) Maybe even pre-outlining "decisions to be made," so if an agent is checking in, it could see there is something that needs to be figured out but hasn't been done yet.
CuriouslyC · 5 days ago
I actually have an "LLM as a judge" loop on all my codebases. I have an architecture panel that debates improvements given an optimization metric and convergence criteria, and I feed their findings into a deterministic spec generator (CUE w/ validation) that can emit unit/e2e tests and scaffold Terraform. It's pretty magical.

This CUE spec gets decomposed into individual tasks by an orchestrator that does research per ticket and bundles it.

kordlessagain · 5 days ago
Great insight!

*We're hand-crafting context like medieval scribes when we should be building context compilers.*

nickreese · 6 days ago
Is there a framework for this?
CuriouslyC · 6 days ago
I have one that's currently still cooking. I have good experimental validation for it, but I need to go back and tune the latency and improve the install story. It should help any model quite a bit, but you have to hack other agents to integrate it into their API-call machinery; I have a custom agent I've built that makes it easy to inject, though.
jrvarela56 · 6 days ago
I've used CrewAI to compose agents; it's easy to mix and match, and it does seem to change context based on roles: https://docs.crewai.com/en/guides/agents/crafting-effective-...
knlam · 5 days ago
The best one is Google ADK; I must say they have been quite thoughtful about all the use cases.
sippeangelo · 6 days ago
"should just"?
CuriouslyC · 6 days ago
It's really not hard. It's just all the IR/optimization machinery we already have applied to a shared context tree with locality bias.
curl-up · 6 days ago
In the context compression approach, why aren't the agents labeled as subagents instead? The compressed context is basically a "subtask".

This is my main issue with all these agentic frameworks - they always conveniently forget that there is nothing "individual" about the thing they label "an agent" and draw a box around.

Such "on demand" agents, spawned directly from previos LLM output, are never in any way substantially different from dynamic context compression/filtering.

I think the only sensible framework is to think in terms of tools, with clear interfaces, and a single "agent" (a single linear interaction chain) using those tools towards a goal. Such tools could be LLM-based or not. Forcing a distinction between a "function tool" and an "agent that does something" doesn't make sense.
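
As a sketch of that uniform interface (llm is the hypothetical one-call helper from earlier in the thread):

    from typing import Callable

    Tool = Callable[[str], str]  # every tool: string in, string out

    def word_count(text: str) -> str:        # deterministic function tool
        return str(len(text.split()))

    def summarize(text: str) -> str:         # LLM-backed "agent" as a tool
        return llm("Summarize this in two sentences.", text)

    TOOLS: dict[str, Tool] = {"word_count": word_count, "summarize": summarize}
    # The agent dispatching on TOOLS never needs to know which kind it called.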

jskalc92 · 6 days ago
I think the most common implementation of "subagents" doesn't get the full context of the conversation, rather just an AI-generated command.

Here the task is fulfilled with the full context so far, which is then compressed. Might work better, IMO.

adastra22 · 6 days ago
In my experience it does not work better. There are two context-related benefits to subagent tool calls: (1) the subagent’s trials and deliberations don’t poison the caller’s context [this is a win here]; and (2) the called agent isn’t unduly influenced by the caller’s context [the problem here!].

The latter is really helpful for getting a coding assistant to settle on a high quality solution. You want critic subagents to give fresh and unbiased feedback, and not be influenced by arbitrary decisions made so far. This is a good thing, but inheriting context destroys it.
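
A sketch of that isolation (same hypothetical llm helper; the critic sees only the artifact, none of the caller's history):

    def critique(diff: str) -> str:
        # Fresh context on purpose: the caller's earlier (possibly arbitrary)
        # decisions are withheld so the feedback stays unbiased.
        return llm("You are a code reviewer. Judge this diff on its own merits.",
                   diff)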

peab · 6 days ago
Yeah, I agree with thinking of things as a single agent + tools.

From the perspective of the agent, whether the tools are deterministic functions, or agents themselves, is irrelevant.

OutOfHere · 5 days ago
I think one can simplify it further. There is no agent; it's just tools all the way.

The caveat is that multiple tools must be able to run asynchronously.

Compressing one's own context also becomes useful and necessary as the message chain grows too large.
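
A minimal sketch of such self-compression (again with the hypothetical llm helper):

    def compress(history: list[dict], keep_last: int = 6) -> list[dict]:
        """Fold everything but the recent tail into one summary message."""
        if len(history) <= keep_last:
            return history
        head, tail = history[:-keep_last], history[-keep_last:]
        summary = llm("Summarize this conversation, keeping decisions and facts.",
                      "\n".join(m["content"] for m in head))
        return [{"role": "system", "content": f"Summary so far: {summary}"}] + tail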

sputknick · 6 days ago
This is very similar to the conclusion I have been coming to over the past 6 months. Agents are like really unreliable employees that you have to supervise and correct so often that it's a waste of time to delegate to them. The approach I'm trying to develop for myself is much more human-centric. For now I just directly supervise all actions done by an AI, but I would like to move to something like this: https://github.com/langchain-ai/agent-inbox where I, as the human, am the conductor of the work agents do; they then check in with me for further instructions or corrections.
worik · 5 days ago
> Agents are like really unreliable employees

Yes

But employees (should) get better over time; they learn from me.

mreid · 6 days ago
Is anyone else concerned that the "Simple & Reliable" and "Reliable on Longer Tasks" diagrams look kind of like the much-maligned waterfall design process?
worik · 5 days ago
One reason it is concerning.

I am mostly worried that I am wrong in my opinion that "agents" are a bad paradigm for working with LLMs.

I have been using LLMs since I got my first OpenAI API key, and I think "human in the loop" is what makes them special.

I have massively increased my fun, and significantly increased my productivity using just the raw chat interface.

It seems to me that building agents to do work that I am responsible for is the opposite of fun, and a productivity sink, as I correct the rare (but I must always check for them) bananas mistakes these agents inevitably make.

adastra22 · 5 days ago
The thing is, the same agent that made the bananas mistake is also quite good at catching that mistake (if called again with fresh context). This results in convergence on working, non-bananas solutions.
CuriouslyC · 6 days ago
Waterfall is just a better process with agents. Agile is garbage when inserting yourself in the loop causes the system to drop to 10% velocity.
amelius · 6 days ago
It looks more like alchemy, tbh.
DarkNova6 · 6 days ago
To me it seems more like the typical trap of a misfit bounded context.
adastra22 · 6 days ago
> As of June 2025, Claude Code is an example of an agent that spawns subtasks. However, it never does work in parallel with the subtask agent, and the subtask agent is usually only tasked with answering a question, not writing any code.

Has this changed since June? Because I’ve been experimenting over the last month with Claude Code subagents that work in parallel and agents that write code (doing both simultaneously is inadvisable for obvious reasons, at least without workspace separation).

clbrmbr · 6 days ago
I’ve been quite successful since June doing parallel edits, just on different components within the same codebase. But I’ve not been able to do it with “auto-accept” because I need a way to course-correct if one of the agents goes off the rails.
kikuska · 14 hours ago
Forgive me, I can't figure out how to reply on the original thread, but I saw you used a script to build an Anki deck with the native plant species in your area. I'd love to do the same, but I've never coded in my life and doing it manually would take forever. Would you be willing to share how this is done? Thanks!
kordlessagain · 5 days ago
I wrote something that watches the directories, and I started working on a node graph to visualize relationships between changes, dates, things like that: https://github.com/kordless/gnosis-flow. If that is useful to you, let me know!
adastra22 · 5 days ago
What is the use case?