I started a company in this space about 2 years ago. We are doing fine. What we've learned so far is that a lot of these techniques are simply optimisations to tackle some deficiency in LLMs that is a problem "today". These are not going to be problems tomorrow, because the technology will shift - as it has many times over the last 2 years.
So yah, cool, caching all of that... but give it a couple of months and a better technique will come out - or more capable models.
Many years ago, when disk encryption on AWS was not an option, my team and I had to spend 3 months coming up with a way to encrypt the disks, and to do it well, because at the time there was no standard way. It was very difficult, as it required pushing encrypted images (as far as I remember). Soon after we started, AWS introduced standard disk encryption that you can turn on by clicking a button. We wasted 3 months for nothing. We should have waited!
What I've learned from this is that oftentimes it is better to do absolutely nothing.
This is the most important observation. I'm getting so many workshop invitations from my corporate colleagues about AI and agents. What most people don't get is that these clever patterns they "invented" will be obsolete next week. That nice company blog post about agents - the one that went viral recently - will be obsolete next month. It's hard for my colleagues to swallow that this age is not like when you studied the Gang of Four or a software architecture pattern book and learned a common language - no, these days the half-life of an AI pattern is about a week. Ask 10 professionals what an agent actually is and you will get 10 different answers, yet each assumes that how they use it is the common understanding.
This is also why it's perfectly fine to wait out this AI hype and see what sticks afterward. It probably won't cost too much time to catch up, because at that point everyone who knows what they're doing only learned that a month or two ago anyway.
Counterpoint to these two posts: a journeyman used to have to make his own tools. He could easily have bought them, or his master could have made them. Making your own tools gives you vastly greater skills when using the tools. So I know how fast AI agents and model APIs are evolving, but I’m writing them anyway. Every break in my career has been someone telling me it’s impossible and then me doing it anyway. If you use an agent framework, you really have no idea how artificially constrained you are. You’re so constrained, and yet you are oblivious to it.
On the “wasting three months” remark (GP), if it’s a key value proposition, just do it. Don’t wait. If it’s not a key value prop, then don’t do it at all. Oftentimes what I’ve built has been better tailored to our product than what AWS built.
Note that even many of those "long knowledge" things people learned are today obsolete, but people who follow them just haven't figured it out yet. See how many of those object-oriented design patterns look very silly the minute you use immutable data structures and have access to functional programming constructs in your language. And nowadays most do. Many seminal books on how to program from the early 2000s, especially those covering "pure" OO, look quite silly today.
And yet, despite being largely obsolete in the specifics, the Gang of Four remains highly relevant and useful in the generalities. All these books continue to be absolutely great foundations if you look past their immediate advice.
I think knowing when to do nothing is being able to evaluate if the problem the team is tackling is essential or tangential to the core focus of the project, and also whether the problem is something new or if it's been around for a while and there is still no standard way to solve it.
Yeah, that will be the make-or-break moment: if it's too essential, the vendor will implement it anyway, but if it's not, it may become a competitive advantage.
Vehement disagree. We implemented our own context-editing features 4 months back. Last month Claude released a very similar feature set to the one we'd had all along. We were still glad we did it because:
(A) it took me half a day to do that work
(B) our solution is still more powerful for our use case
(C) our solution works on other models as well.
It all comes down to trying to predict your vendors' roadmap (or, if you're savvy, getting a peek into it) and whether the feature you want to create is fundamental to your application's behavior (I doubt encryption is, unless you're a storage company).
This is the "Wait Calculation" and it's fiendish because there exists only some small, finite window in which it is indeed better to start before the tech is "better" in order to "win" (i.e. get "there" first, wherever "there" is in your scenario).
If we wait long enough, we just end up dead, so it turns out we didn't need to do anything at all. Of course there's a balance - oftentimes starting out and growing up with the technology gives you background and experience that becomes an advantage when it hits escape velocity.
These days it seems like training yourself into a specialty that provides steadyish income for a year before someone obliterates your professional/corporate/field’s scaffolding with AI and you have to start over is kind of a win. Doesn’t it feel like a win? Look at the efficiency!
I agree with the sentiment. Things are moving so fast that waiting is now a legitimate strategy, though it is also easy to fall into the trap of "well, if we continue along these lines, we might as well wait 4-5 years and get AGI" - which, even if true, feels off, because you aren't participating in the process.
One example is that there used to be a whole complex apparatus around getting models to do chain of thought reasoning, e.g., LangChain. Now that is built in as reasoning and they are heavily trained to do it. Same with structured outputs and tool calls — you used to have to do a bunch of stuff to get models to produce valid JSON in the shape you want, now it’s built in and again, they are specifically trained around it. It used to be you would have to go find all relevant context up front and give it to the model. Now agent loops can dynamically figure out what they need and make the tool calls to retrieve it. Etc etc.
If we expand this to 3 years, the single biggest shift that totally changed LLM development is the increase in size of context windows from 4,000 to 16,000 to 128,000 to 256,000.
When we were at 4,000 and 16,000 context windows, a lot of effort was spent on nailing down text splitting, chunking, and reduction.
For all intents and purposes, the size of current context windows obviates all of that work.
What else changed?
- Multimodal LLMs - Text extraction from PDFs was a major issue for RAG/document intelligence. A lot of time was wasted trying to figure out custom text-extraction strategies for documents. Now you can just feed the image of a PDF page into an LLM and get back a better transcription.
- Reduced emphasis on vector search. People have found that for most purposes, having an agent grep your documents is cheaper and better than a more complex RAG pipeline (a minimal sketch of the idea is below). Boris Cherny created a stir when he talked about Claude Code doing it that way[0].

[0] https://news.ycombinator.com/item?id=43163011#43164253
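To make the grep-over-RAG point concrete: the whole retrieval "pipeline" can collapse into a single tool that shells out to ripgrep and lets the agent decide what to search for. A minimal sketch, assuming ripgrep is installed and a hypothetical ./docs folder:

    import subprocess

    def grep_docs(pattern: str, doc_dir: str = "./docs", context_lines: int = 2) -> str:
        """A tool the agent can call instead of a vector-search pipeline."""
        result = subprocess.run(
            ["rg", "--no-heading", "-i", "-C", str(context_lines), pattern, doc_dir],
            capture_output=True, text=True,
        )
        # Truncate so one noisy search doesn't blow up the context window.
        return result.stdout[:8000] or "no matches"

Expose that as a tool and the model iterates on search terms by itself, which is roughly the behavior described above.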
I'm amazed at this question and the responses you're getting.
These last few years, I've noticed that the tone around AI on HN changes quite a bit by waking time zone.
EU waking hours have comments that seem disconnected from genAI. And, while the US hours show a lot of resistance, it's more fear than a feeling that the tools are worthless.
It's really puzzling to me. This is the first time I've noticed such a disconnect in the community about what the reality of things is.
To answer your question personally, genAI has changed the way I code drastically about every 6 months in the last two years. The subtle capability differences change what sorts of problems I can offload. The tasks I can trust them with get larger and larger.
It started with better autocomplete, and now, well, agents are writing new features as I write this comment.
On the foundational level: test-time compute (reasoning), heavy RL post-training, 1M+ context lengths, etc.
On the application layer, connecting with sandboxes/VMs is one of the biggest shifts (Cloudflare's Code Mode, etc.). Giving an LLM a sandbox unlocks on-the-fly computation, calculations, RPA, anything really.
MCP, or rather standardized function calling, is another one.
Also, local LLMs are becoming almost viable because of better and better distillation, relying on quick web search for facts, etc.
We started putting them in image and video models and now image and video models are insane.
I think the next period of high and rapid growth will be in media (image, video, sound, 3D), not text.
It's much harder to adapt LLMs to solving business use cases with text. Each problem is niche, you have to custom tailor the solution, and the tooling is crude.
The media use cases, by contrast, are low hanging fruit and result in 10,000x speedups and cost reductions almost immediately. The models are pure magic.
I think more companies would be wise to ignore text for now and focus on visual domain problems.
Nano Banana has so much more utility than agents. And there are so many low hanging fruit ways to make lots of money.
Don't sleep on image and video. That's where the growth salient is.
I suspect you're right, but it's a bit discouraging to consider that an alternative way of framing this is that companies like OpenAI have a huge advantage in this landscape and anything that works will end up behind their API.
In some ways, the fact that the technology will shift is the problem as model behavior keeps changing. It's rather maddening unstable ground to build on. Really hard to gauge the impact to customer experience from a new model.
You could use the likes of Amazon / Anthropic, or use Google, which has had transparent disk encryption for 10+ years, and Gemini, which already had the transparent caching discussed here built in.
If you’ve spent any time with the Vertex LLM APIs you wouldn't be so enthusiastic about using Google's platform (I say this as someone who prefers GCP to AWS for compute and networking).
Amen. Been seeing these agent SDKs coming out left and right for a couple of years and thought it'd be a breeze to build an agent. Now I'm trying to build one for ~3 weeks, and I've tried three different SDKs and a couple of architectures.
Here's what I found:
- Claude Code SDK (now called Agent SDK) is amazing, but I think they are still in the process of decoupling it from Claude Code, and that's why a few things are weird. E.g., you can define a subagent programmatically, but not skills. Skills have to be placed in the filesystem and then referenced in the plugin. Also, only Anthropic models are supported :(
- OpenAI's SDK's tight coupling with their platform is a plus point, i.e., you get agent and tool-use traces by default in your dashboard, which you can later use for evaluation, distillation, or fine-tuning.
But:
1. It's not easy to use a third-party model provider. Their docs provide sample code, but it's not as easy as that.
2. They have agent handoffs (which work in some cases), but not subagents. You can use tools as subagents, though.
- Google Agent Kit doesn't provide any TypeScript SDK yet, so I didn't try it.
- Mastra, even though it looks pretty sweet, spins up a server for your agent, which you can then use via REST API. umm.. why?
- SmythOS SDK is the one I'm currently testing because it provides flexibility in terms of choosing the model provider and defining your own architecture (handoffs or subagents, etc.). It has its quirks, but I think it'll work for now.
Question: If you don't mind sharing, what is your current architecture? Agent -> SubAgents -> SubSubAgents? Linear? or a Planner-Executor?
I'll write a detailed post about my learnings from architectures (fingers crossed) soon.
Every single SDK I've used was a nightmare once you get past the basics. I ended up just using an OpenRouter client library [1] and writing agents by hand without an abstraction layer. Is it a little more boilerplatey? Yea. Does it take more LoC to write? Yea. Is it worth it? 100%. Despite writing more code, the mental model is much easier (personally) to follow and understand.
As for the actual agent I just do the following:
- Get metadata from initial query
- Pass relevant metadata to agent
- Agent is a reasoning model with tools and output
- Agent runs in a loop (max of n times). It will reason which tool calls to use
- If there is a tool call, execute it and continue the loop
- Once the agent outputs content, the loop is effectively finished and you have your output
This is effectively a ReAct agent. Thanks to the reasoning being built in, you don't need an additional evaluator step.
Tools can be anything. It can be a subagent with subagents, a database query, etc.
Need to do an agent handoff? Just output the result of the agent into a different agent. You don't need an sdk to do a workflow.
I've tried some other SDKs/frameworks (Eino and langchaingo), and personally found it quicker to do it manually (a minimal sketch of the loop is below) than to fight against the framework.

[1]: https://github.com/reVrost/go-openrouter
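A minimal Python sketch of the loop described above, against the OpenAI-compatible chat-completions API that OpenRouter exposes. The model id and the web_search tool are placeholders, not recommendations:

    import json
    from openai import OpenAI  # OpenRouter speaks the OpenAI-compatible API

    client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

    def web_search(query: str) -> str:
        return f"(stub) results for {query!r}"  # placeholder tool implementation

    TOOLS = {"web_search": web_search}
    TOOL_SPECS = [{
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return a text summary.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    def run_agent(user_msg: str, max_turns: int = 8) -> str:
        messages = [{"role": "user", "content": user_msg}]
        for _ in range(max_turns):                     # loop with a hard cap
            resp = client.chat.completions.create(
                model="anthropic/claude-3.5-sonnet",   # placeholder model id
                messages=messages,
                tools=TOOL_SPECS,
            )
            msg = resp.choices[0].message
            if not msg.tool_calls:                     # no tool call -> final answer
                return msg.content
            messages.append(msg)                       # keep the assistant turn
            for call in msg.tool_calls:                # execute each requested tool
                args = json.loads(call.function.arguments)
                result = TOOLS[call.function.name](**args)
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": str(result),
                })
        return "Stopped after max_turns."

Handoffs then fall out for free: feed the final content from one run_agent call in as the user message of another.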
I think the term sub-agent is almost entirely useless. An agent is an LLM loop that has reasoning and access to tools.
A "sub agent" is just a tool. It's implantation should be abstracted away from the main agent loop. Whether the tool call is deterministic, has human input, etc, is meaningless outside of the main tool contract (i.e Params in Params out, SLA, etc)
I agree, technically, "sub agent" is also another tool. But I think it's important to differentiate tools with deterministic input/output from those with reasoning ability.
A simple 'Tool' will take the input and try to execute, but the 'subagent' might reason that the action is unnecessary and that the required output already exists in the shared context. Or it can ask a clarifying question from the main agent before using its tools.
> Its implementation should be abstracted away from the main agent loop. Whether the tool call is deterministic, has human input, etc., is meaningless outside of the main tool contract (i.e., params in, params out, SLA, etc.)
Up to a point. You're obviously right in principle, but if that task itself has the ability to call into "adjacent" tools then the behavior changes quite a bit. You can see this a bit with how the Oracle in Amp surfaces itself to the user. The oracle as sub-agent has access to the same tools as the main agent, and the state changes (rare!) that it performs are visible to itself as well as the main agent. The tools that it invokes are displayed similarly to the main agent loop, but they are visualized as calls within the tool.
ADK differentiates between tools and subagents based on the ability to escalate or transfer control (subagents), whereas tools are more basic.
I think this is a meaningful distinction, because it impacts control flow, regardless of what they are called. The lexicon varies quite a bit vendor-to-vendor.
Nah, when working on anything sufficiently complicated you will have many parallel subagents that need their own context window, ability to mutate shared state, sandboxing differences, durability considerations, etc.
If you want to rewrite the behavior per instance you totally can, but there is a definite concept here that is different than “get_weather”.
I think that existing tools don’t work very well here or leave much of this as an exercise for the user. We have tasks that can take a few days to finish (just a huge volume of data and many non deterministic paths). Most people are doing way too much or way too little. Having subagents with traits that can be vended at runtime feels really nice.
Hello, about Claude Code only supporting Anthropic models: in reality you can use Claude Code Router (https://github.com/musistudio/claude-code-router) to use other models in Claude Code. I've been using it for a few weeks with open-source models and it works pretty well. You can even use "free" models from OpenRouter.
Google's ADK is pretty nice. I'm using the Go version, which is less mature than the Python one. Been at it a bit over a week and progress is great. This weekend I'm aiming for tracking file changes in the session history to allow rewinding / forking.
It has a ton of day 2 features, really nice abstractions, and positioned itself well in terms of the building blocks and constructing workflows.
ADK supports working with all the vendors and local LLMs
Have you tried AWS’s Strands Agents SDK? I’ve found it to be a very fluent and ergonomic API. And it doesn’t require you to use Bedrock; most major vendor native APIs are supported.
(Disclaimer: I work for AWS, but not for any team involved. Opinions are my own.)
Dude, it looks great, but I just spent half an hour learning about its 'CodeAgents' feature. Which essentially is 'actions written as code'.
This idea has been floating around in my head, but it wasn't refined enough to implement. It's so wild that what you're thinking of may have already been done by someone else on the internet.
For those who are wondering, it's kind of similar to the 'Code Mode' idea implemented by Cloudflare and now being explored by Anthropic: write code to discover and call MCP tools instead of stuffing the context window with their definitions.
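If it helps to picture it, here is a toy sketch under stated assumptions: the two MCP helpers are made-up stand-ins for whatever MCP client you actually use, and the "sandbox" is just exec with a restricted namespace. The point is that only one "run code" tool gets registered with the model, and tool discovery happens inside the code it writes:

    # Hypothetical stand-ins for a real MCP client (not an actual SDK).
    def list_tools(server: str) -> list[dict]:
        """Return tool names/descriptions for one MCP server (stub)."""
        return [{"name": "query_db", "description": "Run a SQL query"}]

    def call_tool(server: str, tool: str, args: dict):
        """Invoke one MCP tool and return its result (stub)."""
        return {"rows": []}

    # Instead of stuffing every tool definition into the prompt, the model
    # writes code that discovers and calls tools on demand:
    agent_generated_code = (
        'tools = list_tools("analytics")\n'
        'print("available:", [t["name"] for t in tools])\n'
        'print(call_tool("analytics", "query_db", {"sql": "SELECT count(*) FROM orders"}))\n'
    )

    # The "sandbox" executes the snippet with only these two helpers in scope.
    exec(agent_generated_code, {"list_tools": list_tools, "call_tool": call_tool})

A real version would run the generated code in an actual isolated sandbox (the Cloudflare and Anthropic write-ups use isolates/containers) rather than exec in-process.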
1. If your agent needs to write a lot of code, it's really hard to beat Claude Code (cc) / Agent SDK. We've tried many approaches and frameworks over the past 2 years (e.g. PydanticAI), but using cc is the first that has felt magic.
2. Vendor lock-in is a risk, but the bigger risk is having an agent that is less capable than what a user gets out of ChatGPT because you're hand-rolling every aspect of your agent.
3. cc is incredibly self aware. When you ask cc how to do something in cc, it instantly nails it. If you ask cc how to do something in framework xyz, it will take much more effort.
4. Give your agent a computer to use. We use e2b.dev, but Modal is great too. When the agent has a computer, it makes many complex features feel simple.
0 - For context, Definite (https://www.definite.app/) is a data platform with agents to operate it. It's like Heroku for data with a staff of AI data engineers and analysts.
Be careful about what you hand off to Claude versus another agent. Claude is a vibe project monster, but it will fail at hard things, come up with fake solutions, and then lie to you about them. To the point that it'll add random sleeps and do pointless work to cover up the fact that it's reward hacking. It's also very messy.
For brownfield work, hard problems, or big complex codebases, you'll save yourself a lot of pain if you use Codex instead of CC.
It's quite worrying that, several times in the last few months, I've had to really drive home why people should probably not be building bespoke agentic systems that essentially act as a half-baked version of an agentic coding tool, when they could just use Claude Code and instead focus their efforts on creating value rather than instant technical debt.
You can pretty much completely reprogram agents just by passing them through a smart proxy. You don't need to rewrite claude/codex, just add context engineering and tool behaviors at the proxy layer.
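A rough sketch of what "context engineering at the proxy layer" can look like: an OpenAI-compatible pass-through that injects a system message before forwarding. The endpoint path and auth header follow the chat-completions convention; the upstream URL and the injected text are illustrative, and streaming is omitted:

    import os
    import httpx
    from fastapi import FastAPI, Request

    app = FastAPI()
    UPSTREAM = "https://api.openai.com/v1/chat/completions"  # any compatible vendor

    EXTRA_CONTEXT = {"role": "system",
                     "content": "House rules: cite internal doc IDs for every claim."}

    @app.post("/v1/chat/completions")
    async def proxy(request: Request):
        body = await request.json()
        # Context engineering happens here, invisibly to the client agent.
        body["messages"] = [EXTRA_CONTEXT] + body.get("messages", [])
        async with httpx.AsyncClient(timeout=120) as client:
            upstream = await client.post(
                UPSTREAM, json=body,
                headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            )
        return upstream.json()

Point the agent's base URL at the proxy and it picks up the extra behavior with no client-side change; tool-behavior rewriting can hook in at the same spot.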
yes, we should all stop experimenting and outsource our agentic workflows to our new overlords...
this will surely end up better than where big tech has already brought our current society...
For real though, where did the dreamers about ai / agentic free of the worst companies go? Are we in the seasons of capitulation?
My opinion... build, learn, share. The frameworks will improve, the time to custom agent will be shortened, the knowledge won't be locked in another unicorn
anecdotally, I've come quite far in just a week with ADK and VS Code extensions, having never done extensions before, which has been a large part of the time spent
I've been building agent type stuff for a couple years now and the best thing I did was build my own framework and abstractions that I know like the back of my hand.
I'd stay clear of any llm abstraction. There are so many companies with open source abstractions offering the panacea of a single interface that are crumbling under their own weight due to the sheer futility of supporting every permutation of every SDK evolution, all while the same companies try to build revenue generating businesses on top of them.
I agree with your take on building your own agent framework to keep some level of control and fewer abstractions. Agents at their core are about programmatically talking to an LLM and performing these basic operations (a bare-bones sketch follows below):
1. Structured Input and String Interpolation in prompts
2. Structured Output and Unmarshalling String response to Structured output (This is getting easier now with LLMs supporting Structured output)
3. Tool registry/discovery (of MCP and Function tools), Tool calls and response looping
4. Composability of Tools
5. Some form of Agent to Agent delegation
I’ve had good luck with using PydanticAI which does these core operations well (due to the underlying Pydantic library), but still struggles with too many MCP servers/Tools and composability.
I’ve built an open-source agent framework called OpusAgents that makes creating Agents, Subagents, and Tools simpler than MCP servers, without overloading the context. Check it out, and the tutorials/demos, to see how it’s more reliable than generic agents with MCP servers in Cursor/Claude Desktop - https://github.com/sathish316/opus_agents
It’s built on top of PydanticAI and FastMCP, so that all non-core operations of Agents are accessible when I need them later.
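The bare-bones sketch referred to above covers operations 1 and 2 with no framework at all, just pydantic: interpolate the schema into the prompt, then unmarshal and validate the reply. The TicketTriage schema and the stubbed model output are made up for illustration:

    import json
    from pydantic import BaseModel, ValidationError

    class TicketTriage(BaseModel):   # made-up example schema
        category: str
        urgency: int                 # 1 (low) .. 5 (high)
        summary: str

    PROMPT = ("Triage this support ticket and reply with JSON only, "
              "matching this schema:\n{schema}\n\nTicket:\n{ticket}")

    def build_prompt(ticket: str) -> str:
        # Operation 1: structured input / string interpolation.
        return PROMPT.format(
            schema=json.dumps(TicketTriage.model_json_schema(), indent=2),
            ticket=ticket,
        )

    def parse_response(raw: str) -> TicketTriage:
        # Operation 2: unmarshal the string response into a typed object.
        try:
            return TicketTriage.model_validate_json(raw)
        except ValidationError:
            raise  # or re-prompt the model with the validation error appended

    fake_llm_output = '{"category": "billing", "urgency": 4, "summary": "double charge"}'
    print(build_prompt("I was charged twice this month")[:120], "...")
    print(parse_response(fake_llm_output))

Tool registry and composability (operations 3 and 4) are then just a dict of callables plus the loop discussed elsewhere in this thread.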
I also recommend this. I have tried all of the frameworks, and still deploy some for clients - but for my personal agents, it's my own custom framework that is dead simple and very easy to spin up, extend, etc.
This sounds interesting. What about the agent behavior itself? How it decides how to come at a problem, what to show the user along the way, and how it decides when to stop? Are these things you have attempted to grapple with in your framework?
Excellent write-up. I’ve been thinking a lot about caching and agents, so this was right up my alley.
Have you experimented with using a semantic cache on the chain of thought (what we get back from the providers anyway) and sending that to a dumb model for similar queries to “simulate” thinking?
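For anyone curious, a rough sketch of that idea with the embedding call stubbed out (swap in whichever embedding endpoint you use); cached chain-of-thought gets reused when a new query lands close enough in embedding space:

    import math

    def embed(text: str) -> list[float]:
        """Stub: replace with a real embedding call (hosted API or local model)."""
        padded = text[:64].ljust(64)
        return [float(ord(c) % 7) for c in padded]

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    _cache: list[tuple[list[float], str]] = []   # (query embedding, cached reasoning)

    def cached_reasoning(query: str, threshold: float = 0.92):
        """Return cached chain-of-thought for a near-duplicate query, else None."""
        qv = embed(query)
        best = max(_cache, key=lambda item: cosine(qv, item[0]), default=None)
        if best and cosine(qv, best[0]) >= threshold:
            return best[1]   # prepend this to the cheap model's prompt as "thinking"
        return None

    def store_reasoning(query: str, reasoning: str) -> None:
        _cache.append((embed(query), reasoning))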
Yes, this is great advice. It also applies to interfaces. When we designed a support "chat bot", we went with a different architecture than what's out there already. We designed the system with "chat rooms" instead (rough sketch below), and the frontend just dumps messages to a chat room (with a session id). Then on the backend we can do lots of things, incrementally adding functionality, while the frontend doesn't have to keep up. We can also do things like group messages, have "system" messages that other services can read, etc. It also feels more natural, as the client can type additional info while the server is working, etc.
If you have to use some of the client side SDKs, another good idea is to have a proxy where you can also add functionality without having to change the frontend.
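A rough sketch of that chat-room shape, purely illustrative (field names and the in-memory store are made up; in practice the room would live in a database or queue):

    import time
    import uuid
    from dataclasses import dataclass, field

    @dataclass
    class Message:
        session_id: str
        role: str       # "user", "assistant", "system", "service:billing", ...
        content: str
        ts: float = field(default_factory=time.time)
        msg_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    class ChatRoom:
        """Frontend and backend services all just append and poll."""
        def __init__(self) -> None:
            self._messages: list[Message] = []

        def post(self, msg: Message) -> None:
            self._messages.append(msg)

        def since(self, session_id: str, ts: float = 0.0) -> list[Message]:
            return [m for m in self._messages
                    if m.session_id == session_id and m.ts > ts]

    room = ChatRoom()
    room.post(Message("sess-1", "user", "my invoice looks wrong"))
    room.post(Message("sess-1", "service:billing", "flagged invoice #123 for review"))
    print([m.role for m in room.since("sess-1")])

Neither side blocks on the other; both only ever append messages and read whatever is new.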
Creativity is an underrated hard part of building agents. The fun part of building right now is knowing how little of the design space for building agents has been explored.
This is not so unlike the coding agent I'm building for vs code. One of the things I'm doing is keeping a snapshot of the current vs code state (files open, terminal history, etc) in the agent server. Similarly, I track the file changes without actually writing them until the user approves the diff, so there are some "filesystem" like things that need to be carefully managed on each side.
tl;dr, Both sides are broadcasting messages and listening for the ones they care about.
This is a huge undertaking, though. Yes, it is quite simple to build some basic abstraction on top of openai.complete or similar, but that is like 1% of what an agent needs to do.
My bet is that agent frameworks and platforms will become more like game engines. You can spin up your own engine for sure, and it is fun and rewarding, but AAA studios will most likely decide to use a ready-to-go platform with all the batteries included.
In totality, yes. But you don't need every feature at once. You add to it once you hit boundaries. But I think the most important thing about this exercise is that you leave nothing to the imagination when building agents.
The non-deterministic nature of LLMs already makes the performance of agents so difficult to interpret. Building agents on top of code that you cannot mentally trace through leads to so much frustration when addressing model underperformance and failure.
It's hard to say whether, after the dust settles, companies will default to batteries-included frameworks, but right now a lot of people I've seen have regretted adopting a large framework off the bat.
We're repeating the same overengineering cycle we saw with early LangChain/RAG stacks. Just a couple of months ago the term agent was hard to define, but I've realized the best mental model is just a standard REPL:
Read: Gather context (user input + tool outputs).
Eval: LLM inference (decides: do I need a tool, or am I done?).
Print: Execute the tool (the side effect) or return the answer.
Loop: Feed the result back into the context window.
Rolling a lightweight implementation around this concept has been significantly more robust for me than fighting with the abstractions in the heavy-weight SDKs.
I don't think this has much to do with SDKs. I've developed my own agent code from scratch (starting from the simple loop) and eventually- unless your use case is really simple- you always have to deal with the need for subagents specialised for certain tasks, that share part of their data (but not all) with the main agent, with internal reasoning and reinforcement messages, etc.
Interestingly, sticking to the "Agent = REPL" mental model is actually what helped me solve those specific scaling problems (sub-agents and shared data) without the SDK bloat.
1. Sub-agents are just stack frames. When the main loop encounters a complex task, it "pushes" a new scope (a sub-agent with a fresh, empty context). That sub-agent runs its own REPL loop, returns only the clean result without any context pollution, and is then "popped" (see the sketch below).
2. Shared Data is the heap. Instead of stuffing "shared data" into the context window (which is expensive and confusing), I pass a shared state object by reference. Agents read/write to the heap via tools, but they only pass "pointers" in the conversation history. In the beginning this was just a Python dictionary and the "pointers" were keys.
My issue with the heavy SDKs isn't that they try to solve these problems, but that they often abstract away the state management. I’ve found that explicitly managing the "stack" (context) and "heap" (artifacts) makes the system much easier to debug.
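The sketch referenced above: llm_complete is a stand-in for whatever model call you use, the heap is just a dict, and only pointers (keys) travel through the transcript:

    import uuid

    HEAP: dict[str, object] = {}      # shared artifact store ("heap")

    def heap_put(value: object) -> str:
        key = f"ptr-{uuid.uuid4().hex[:8]}"
        HEAP[key] = value
        return key                    # only this pointer travels in the transcript

    def heap_get(key: str) -> object:
        return HEAP[key]

    def llm_complete(messages: list) -> str:
        """Stand-in for a real model call."""
        return f"(summary of {len(messages)} messages)"

    def run_subagent(task: str, input_ptr: str) -> str:
        # "Push a stack frame": fresh, empty context, seeded only with what it needs.
        frame = [
            {"role": "system", "content": "You are a focused sub-agent."},
            {"role": "user", "content": f"{task}\n\nInput:\n{heap_get(input_ptr)}"},
        ]
        result = llm_complete(frame)  # the sub-agent would run its own loop here
        return heap_put(result)       # "pop": hand back a clean pointer, not the blob

    raw_ptr = heap_put("...200 pages of crawled text...")
    summary_ptr = run_subagent("Summarize the key findings", raw_ptr)
    print(summary_ptr, "->", heap_get(summary_ptr))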
This is why I use the agent I use. I won't name the company, because I don't want people to think I'm a shill for them (I've already been accused of it before, but I'm just a happy, excited customer). But it's an agentic coding company that isn't associated with any of the big model providers.
I don't want to keep up with all the new model releases. I don't want to read every model card. I don't want to feel pressured to update immediately (if it's better). I don't want to run evals. I don't want to think about when different models are better for different scenarios. I don't want to build obvious/common subagents. I don't want to manage N > 1 billing entities.
I just want to work.
Paying an agentic coding company to do this makes perfect sense for me.
I’ve been surprised at the lack of discussion about sourcegraph’s Amp here which I’m pretty sure you’re referring to - it started a bit rough but these days I find that it’s really good
So, I tried to sign up for Amp. I saw a livestream that mentioned you can sign up for their community Buildcrew on Discord and get $100 of credits. I tried signing up, and got an email that I was accepted and would soon get the credits. The Discord link did not work (it was expired) and the email was a noreply, so I tried emailing Amp support. This was last Friday (8 days ago.) As of today, no updated Discord link, no human response, no credits. If this is their norm, people probably aren't talking about it because they just haven't been able to try it.
> We find testing and evals to be the hardest problem here. This is not entirely surprising, but the agentic nature makes it even harder. Unlike prompts, you cannot just do the evals in some external system because there’s too much you need to feed into it. This means you want to do evals based on observability data or instrumenting your actual test runs. So far none of the solutions we have tried have convinced us that they found the right approach here.
I'm curious about the solutions the op has tried so far here.
"Because there’s too much you need to feed into it" - what does the author mean by this? If it is the amount of data, then I would say sampling needs to be implemented. If that's the extent of the information required from the agent builder, I agree that an LLM-as-a-judge e2e eval setup is necessary.
In general, a more generic eval setup is needed, with minimal requirements on AI engineers, if we want to move forward, as a sector, from vibes-based reliability engineering practices.
Likewise. I have a nasty feeling that most AI agent deployments happen with nothing more than some cursory manual testing. Going with the ‘vibes’ (to use an overused term in the industry).
I can confirm this after hundreds of talks about the topic over the last 2 years. 90% of cases are simply not high-volume or high-stakes enough for the devs to care enough. I'm a founder of an evaluation automation startup, and our challenge is spotting teams right as their usage starts to grow and quality issues are about to escalate. Since that’s tough, we're trying to make the getting-to-first-evals so simple that teams can start building the mental models before things get out of hand.
A lot of "generative" work is like this. While you can come up with benchmarks galore, at the end of the day how a model "feels" only seems to come out from actual usage. Just read /r/localllama for opinions on which models are "benchmaxed" as they put it. It seems to be common knowledge in the local LLM community that many models perform well on benchmarks but that doesn't always reflect how good they actually are.
In my case I was until recently working on TTS and this was a huge barrier for us. We used all the common signal quality and MOS-simulation models that judged so called "naturalness" and "expressiveness" etc. But we found that none of these really helped us much in deciding when one model was better than another, or when a model was "good enough" for release. Our internal evaluations correlated poorly with them, and we even disagreed quite a bit within the team on the quality of output. This made hyperparameter tuning as well as commercial planning extremely difficult and we suffered greatly for it. (Notice my use of past tense here..)
Having good metrics is just really key and I'm now at the point where I'd go as far as to say that if good metrics don't exist it's almost not even worth working on something. (Almost.)
What are the main shortcomings of the solutions you tried out?
We believe you need to both automatically create the evaluation policies from OTEL data (data-first) and to bring in rigorous LLM judge automation from the other end (intent-first) for the truly open-ended aspects.
It's a 2-day project at best to create your own bespoke LLM-as-judge e2e eval framework. That's what we did. Works fine. Not great. Still need someone to write the evals though.
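For anyone weighing that 2-day build: the core of such a judge really is small; the hard part is writing the eval cases and trusting the rubric. A sketch assuming the OpenAI Python SDK, with the judge model and rubric as placeholders:

    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    JUDGE_PROMPT = """You are grading an AI agent's answer.
    Rubric: the answer must {rubric}.
    Question: {question}
    Agent answer: {answer}
    Reply with JSON: {{"pass": true or false, "reason": "..."}}"""

    def judge(question: str, answer: str, rubric: str) -> dict:
        resp = client.chat.completions.create(
            model="gpt-4.1-mini",                     # placeholder judge model
            messages=[{"role": "user", "content": JUDGE_PROMPT.format(
                rubric=rubric, question=question, answer=answer)}],
            response_format={"type": "json_object"},  # force parseable output
        )
        return json.loads(resp.choices[0].message.content)

    # One hand-written eval case; in practice you loop over a file of these,
    # ideally built from observability traces.
    case = {"question": "What is our refund window?",
            "answer": "30 days from delivery.",
            "rubric": "state the 30-day window explicitly"}
    print(judge(**case))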
My tip is: don’t use SDKs for agents. Use a while loop, craft your own JSON, and handle context size and faults yourself. You will in practice need this level of control if you are not doing something trivial.
I wager the same for AI agent techniques.
You have a positive cash flow from sales of agents? Your revenue exceeds your operating costs?
I've been very skeptical that it is possible to make money from agents, having seen how difficult it was for the current well-known players to do so.
What is your secret sauce?
Imo the key is to serve one use case really well rather than overgeneralize.
Here's a nice article about it: https://www.oneusefulthing.org/p/the-lazy-tyranny-of-the-wai...
What technology shifts have happened for LLMs in the last 2 years?
Any framework you build around the model is just behaviour that can be trained into the model itself
A "sub agent" is just a tool. It's implantation should be abstracted away from the main agent loop. Whether the tool call is deterministic, has human input, etc, is meaningless outside of the main tool contract (i.e Params in Params out, SLA, etc)
Up to a point. You're obviously right in principle, but if that task itself has the ability to call into "adjacent" tools then the behavior changes quite a bit. You can see this a bit with how the Oracle in Amp surfaces itself to the user. The oracle as sub-agent has access to the same tools as the main agent, and the state changes (rare!) that it performs are visible to itself as well as the main agent. The tools that it invokes are displayed similarly to the main agent loop, but they are visualized as calls within the tool.
I think this is a meaningful distinction, because it impacts control flow, regardless what they are called. The lexicon are quite varied vendor-to-vendor
If you want to rewrite the behavior per instance you totally can, but there is a definite concept here that is different than “get_weather”.
I think that existing tools don’t work very well here or leave much of this as an exercise for the user. We have tasks that can take a few days to finish (just a huge volume of data and many non deterministic paths). Most people are doing way too much or way too little. Having subagents with traits that can be vended at runtime feels really nice.
Deleted Comment
It has a ton of day 2 features, really nice abstractions, and positioned itself well in terms of the building blocks and constructing workflows.
ADK supports working with all the vendors and local LLMs
You will be happy you did
https://huggingface.co/docs/smolagents/conceptual_guides/int...
https://ai-sdk.dev/docs/agents/overview
Codex is stronger out of the box but properly customized Claude can't be matched at the moment
https://google.github.io/adk-docs/evaluate/
tl;dr - challenging because different runs produce different output; also, how do you decide pass/fail? (Another LLM/agent as judge is what people do.)