mccoyb · 7 months ago
Building agents has been fun for me, but it's clear that there are serious problems with "context engineering" that must be overcome with new ideas. In particular, no matter how large the context window grows, one must curate what the agent sees: agents don't have very effective filters for what is relevant to supercharge them on tasks, and so (a) you must leave *.md files strewn about to help guide them and (b) you must put them into roles. The *.md system is essentially a rudimentary memory system, but it could be made significantly more robust, and could involve e.g. constructing programs and models (in natural language) on the fly, guided by interactions with the user.

What Claude Code has taught me is that steering an agent via a test suite is an extremely powerful reinforcement mechanism (the feedback loop leads to success, most of the time) -- and I'm hopeful that new thinking will extend this into the other "soft skills" that an agent needs to become an increasingly effective collaborator.

blks · 7 months ago
Sounds like you are spending more time battling with your own tools than doing actual work.
mccoyb · 7 months ago
Ah yes, everything has to be about getting work done, right? You always have to be productive!

Do you think, just maybe, it might be interesting to play around with these tools without worrying about how productive you're being?

zmgsabst · 7 months ago
I’ve found managing the context is most of the challenge:

- creating the right context for parallel and recursive tasks;

- removing some steps (eg, editing its previous response) to show only the corrected output;

- showing it its own output as my comment, when I want a response;

Etc.

mccoyb · 7 months ago
I've also found that relying on agents to build their own context _poisons_ it ... that it's necessary to curate it constantly. There's kind of a <1 multiplicative thing going on, where I can ask the agent to e.g. update CLAUDE.mds or TODO.mds in a somewhat precise way, and the agent will multiply my request in a lot of changes which (on the surface) appear well and good ... but if I repeat this process a number of times _without manual curation of the text_, I end up with "lower quality" than I started with (assuming I wrote the initial CLAUDE.md).

The obvious conclusion: while the agent can multiply the amount of work I can do, there's a multiplicative reduction in quality, which means I need to account for that (I have to add time spent on curation).

ModernMech · 7 months ago
It's funny because things are finally coming full circle in ML.

10-15 years ago the challenge in ML/PR was "feature engineering", the careful crafting of rules that would define features in the data which would draw the attention of the ML algorithm.

Then deep learning came along and it solved the issue of feature engineering; just throw massive amounts of data at the problem and the ML algorithms can discern the features automatically, without having to craft them by hand.

Now we've gone as far as we can with massive data, and the problem seems to be that it's difficult to bring out the relevant details when there's so much data. Hence "context engineering", a manual, heuristic-heavy process guided by trial and error and intuition. More an art than a science. Pretty much the same thing that "feature engineering" was.

franktankbank · 7 months ago
Is there a recommended way to construct .md files for such a system? For instance, when I make them for human consumption they have lots of markup for readability, but that may or may not be consumable by an LLM. Can you create a .md the same way as for human consumption without hindering an LLM?
artpar · 7 months ago
I am using these files (most of them are LLM-generated, based on my prompt, to reduce its lookups when working on a codebase)

https://gist.github.com/artpar/60a3c1edfe752450e21547898e801...

(especially the AGENT.knowledge file is quite helpful)

sothatsit · 7 months ago
Just writing a clear document, like you would for a person, gets you 95% of the way there. There are little tweaks you can do, but they don't matter as much as just being concise and factual, and structuring the document clearly. You just don't want the documentation to get too long.
golergka · 7 months ago
I've had very good experience with building a very architecture-conscious folder structure and putting an AGENTS.md in every folder (and, of course, an instruction in the root prompt to read _and_ update those). But with agent-written docs I also have to run a doc-maintainer agent pretty often.
moritz64 · 7 months ago
> steering an agent via a test suite is an extremely powerful reinforcement mechanism

can you elaborate a bit? how do you proceed? what does your process look like?

mccoyb · 7 months ago
I spend a significant amount of time (a) curating the test suite, and making sure it matches my notion of correctness and (b) forcing the agent to make PNG visuals (which Claude Code can see, by the way, and presumably also Gemini CLI, and maybe Aider?, etc)

I'd have to do this anyways, if I was writing the code myself, so this is not "time above what I'd normally spend"

The visuals it makes for me I can inspect and easily tell if it is on the right path or the wrong one. The test suite is a sharper notion of "this is right, this is wrong" -- sharper than just visual feedback and my directions.

The basic idea is to set up a feedback loop for the agent, then keep the agent in the loop and observe what it is doing. The visuals are absolutely critical -- as a compressed representation of the behavior of the codebase, which I can quickly and easily parse to recognize whether there are issues.
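For what it's worth, here is a minimal sketch of the kind of loop I mean, in Python. `ask_agent` is just a placeholder for however you invoke your coding agent (Claude Code, Gemini CLI, Aider, ...); the prompts and the pytest invocation are illustrative, not any specific tool's API.

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    """Run the test suite and return (passed, combined output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def ask_agent(prompt: str) -> None:
    """Placeholder: wrap whatever coding agent you actually use."""
    raise NotImplementedError

def feedback_loop(task: str, max_rounds: int = 5) -> bool:
    ask_agent(f"Implement this task and keep the test suite green:\n{task}")
    for _ in range(max_rounds):
        passed, output = run_tests()
        if passed:
            return True  # the suite is the sharp notion of "done"
        # Feed the failures back in; the human stays in the loop by
        # reviewing diffs and the generated visuals between rounds.
        ask_agent(f"The test suite failed. Fix the code.\n\n{output[-4000:]}")
    return False
```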

mindwok · 7 months ago
I'm not yet convinced (though I remain open to the idea) that AI agents are going to be a widely adopted pattern in the way people on LinkedIn suggest.

The way I use AI today is by keeping a pretty tight leash on it, a la Claude Code and Cursor. Not because the models aren't good enough, but because I like to weigh in frequently to provide taste and direction. Giving the AI more agency isn't necessarily desirable, because I want to provide that taste.

Maybe that'll change as I do more and new ergonomics reveal themselves, but right now I don't really want AI that's too agentic. Otherwise, I kind of lose connection to it.

thimabi · 7 months ago
Do you think that, over time, knowing how the models behave, simply providing more/better context and instructions can fill this gap of wanting to provide taste and direction to the models’ outputs and actions?

My experience is that, for many workflows, well-done "prompt engineering" is more than enough to make AI models behave more like we'd like without constantly needing us to weigh in.

mindwok · 7 months ago
I suppose it's possible, although the models would have to have a really nuanced understanding about my tastes and even then it seems doubtful.

If we use a real world analogy, think of someone like an architect designing your house. I'm still going to be heavily involved in the design of my house, regardless of how skilled and tasteful the architect is. It's fundamentally an expression of myself - delegating that basically destroys the point of the exercise. I feel the same for a lot of the stuff I'm building with AI now.

troupo · 7 months ago
> knowing how the models behave, simply providing more/better context and instructions can fill this gap

No.

--- start quote ---

prompt engineering is nothing but an attempt to reverse-engineer a non-deterministic black box for which any of the parameters below are unknown:

- training set

- weights

- constraints on the model

- layers between you and the model that transform both your input and the model's output that can change at any time

- availability of compute for your specific query

- and definitely some more details I haven't thought of

https://dmitriid.com/prompting-llms-is-not-engineering

--- end quote ---

heavyset_go · 7 months ago
Look at what happens whenever models are updated or new models come out: previous "good" prompts might not return the expected results.

What's good prompting for one model can be bad for another.

apwell23 · 7 months ago
taste cannot be reduced to a bunch of instructions.
prmph · 7 months ago
Exactly. I made a similar comment as this elsewhere on this discussion:

The old adage still applies: there is no free lunch. It makes sense that LLMs are not going to be able to take humans entirely out of the loop.

Think about what it would mean if that were the case: if people, on the basis of a few simple prompts, could let the agents loose and create sophisticated systems without any further input, then there would be nothing to differentiate those systems, and thus they would lose their meaning and value.

If prompting is indeed the new level of abstraction we are working at, then what value is added by asking Claude: make me a note-taking app? A million other people could also issue this same low-effort prompt; thus what is the value added here by the prompter?

chamomeal · 7 months ago
I’ve been thinking about that too! If you can only make an app by “vibe coding” it, then anybody else in the world with internet access can make it, too!

Although sometimes the difficult part is knowing what to make, and LLMs are great for people who actually know what they want, but don’t know how to do it

afc · 7 months ago
My thinking is that over time I can incrementally codify many of these individual "taste" components as prompts that each review a change and propose suggestions.

For example, a single prompt could tell an llm to make sure a code change doesn't introduce mutability when the same functionality can be achieved with immutable expressions. Another one to avoid useless log statements (with my specific description of what that means).

When I want to evaluate a code change, I run all these prompts separately against it, collecting their structured output (via MCP). Of course, I incorporate this into my code agent to provide automated review iterations.

If something escapes where I feel the need to "manually" provide context, I add a new prompt (or figure out how to extend whichever one failed).
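A rough sketch of what that can look like, for concreteness: each narrow "taste" rule is its own prompt, run separately over the diff, with structured output collected at the end. `call_llm` is a stand-in for whatever model API or MCP plumbing is actually used, and the two rules are just example prompts.

```python
import json

REVIEW_PROMPTS = {
    "immutability": "Flag changes that introduce mutability where an "
                    "immutable expression would achieve the same result.",
    "logging": "Flag log statements that add no diagnostic value.",
}

def call_llm(system: str, user: str) -> str:
    """Placeholder: should return a JSON string like {"issues": [...]}."""
    raise NotImplementedError

def review_change(diff: str) -> dict[str, list[str]]:
    findings = {}
    for name, rule in REVIEW_PROMPTS.items():
        raw = call_llm(
            system=f'You are a code reviewer. {rule} '
                   'Respond with JSON: {"issues": ["..."]}.',
            user=diff,
        )
        findings[name] = json.loads(raw).get("issues", [])
    return findings  # fed back into the code agent for another iteration
```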

transcriptase · 7 months ago
I love that there are somehow authorities on tech that, realistically, they could have 1-2 years of experience with, tops. It's the reverse of the "seeking coder with 10 years of experience in a 2-year-old language" meme.
noosphr · 7 months ago
I've been building what's now called AI agents since GPT-3 came out. There are plenty of other people who did the same thing. That's five years now. If you can't be an expert after 5 years then there is no such thing as experts.

Of course, "agents" is now a buzzword that means nothing, so there is that.

sailingparrot · 7 months ago
“Agent” involves having agency. Calling the GPT-3 API and asking it to do some classification or whatever else your use case was, would not be considered agentic. Not only were there no tools back then to allow an LLM to carry out a plan of its own, even if you had developed your own, GPT-3 still sucked way too much to trust it with even basic tasks.

I have been working on LLMs since 2017, both training some of the biggest and then creating products around them and consider I have no experience with agents.

skeeter2020 · 7 months ago
I took a course* on agent-based systems in grad school in 2006, but nobody has been building what "agents" mean today for 5 or even 3 years.

*https://www.slideserve.com/verdi/seng-697-agent-based-softwa...

GPerson · 7 months ago
5 years is barely a beginner in lots of fields.
djabatt · 7 months ago
I agree with your point. After working with LLMs and building apps with them for the past four years, I consider myself a veteran and perhaps an authority (to some) on the subject. I find developing programs that use LLMs both fascinating and frustrating. Nevertheless, I'm going to continue with my work and curiosities, and let the industry change the names of what I'm doing—whether it's called agent development, context engineering, or whatever comes next.
eadmund · 7 months ago
> If you can't be an expert after 5 years then there is no such thing as experts.

I think you’ll find that after 10 years one’ll look back on oneself at 5 years’ experience and realise that one wasn’t an expert back then. The same is probably true of 20 years looking back on 10.

Given a median career of about 40 years, I think it’s fair to estimate that true expertise takes at least 10–15 years.

Mengkudulangsat · 7 months ago
Jiro's son is only allowed to make sushi after 30 years.
apwell23 · 7 months ago
Curious, what did you build? Experience only counts if you are shipping, right?
zzzeek · 7 months ago
Totally my reaction - "I've worked with dozens of teams ....". Really ?
zmmmmm · 7 months ago
Which means they had at best shallow involvement and left the scene pretty quickly. Probably no realistic idea whether the systems they created survived long-term contact with reality or not. But hey, free advice!
joeblubaugh · 7 months ago
Why do so many examples break down to “send better spam faster”?
malfist · 7 months ago
lol, that was literally their example, wasn't it? Troll LinkedIn looking for people and spam them with "personalized" emails.
Animats · 7 months ago
That's what's so funny about this.

Spamming is not only obnoxious, but a very weak example. Spamming is so error tolerant that if 30% of the output is totally wrong, the sender won't notice. Response rates are usually very low. This is a singularly un-demanding problem.

You don't even need "AI" for this. Just score LinkedIn profiles based on keywords, and if the score is high enough, send a spam. Draft a few form letters, and send the one most appropriate for the keywords. Probably would have about the same reply rate.
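To make that concrete, a toy version of the non-AI baseline might be nothing more than keyword scoring plus canned templates. The keywords, weights, threshold, and templates below are made up for illustration.

```python
KEYWORD_WEIGHTS = {"kubernetes": 3, "platform engineering": 2, "devops": 1}

TEMPLATES = [
    ("kubernetes", "Hi {name}, saw your Kubernetes work at {company}..."),
    ("devops", "Hi {name}, noticed your DevOps background..."),
]

def score(profile_text: str) -> int:
    text = profile_text.lower()
    return sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in text)

def pick_template(profile_text: str) -> str | None:
    if score(profile_text) < 3:      # arbitrary cutoff
        return None                  # below threshold: send nothing
    text = profile_text.lower()
    for keyword, template in TEMPLATES:
        if keyword in text:
            return template
    return TEMPLATES[-1][1]          # fall back to the most generic letter
```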

rglover · 7 months ago
What is a wheel without its grease?
rm999 · 7 months ago
A really short version of it is that you don't need an agent if you have a well-defined solution that can be implemented in advance (e.g. the 'patterns' in this article). Programmers often work on problems that have programmatic solutions and then the advice is totally correct: reach for simpler more reliable solutions. In the future AIs will probably be smart enough to just brute force any problem, but for now this is adding unneeded complexity.

I suspect a reason so many people are excited about agents is they are used to "chat assistants" as the primary purpose of LLMs, which is also the ideal use case for agents. The solution space in chat assistants is not defined in advance, and more complex interactions do get value from agents. For example, "find my next free Friday night and send a text to Bob asking if he's free to hang out" could theoretically be programmatically solved, but then you'd need to solve for every possible interaction with the assistant; there are a nearly unlimited number of ways of interfacing with an assistant, so agents are a great solution.

franktankbank · 7 months ago
Works great when you can verify the response quicker than it would take to just do yourself. Personally I have a hard ass time trusting it without verifying.
deadbabe · 7 months ago
A key thing we may be forced to admit someday is that AI agents are really just expensive temporary glue that we use to build services quickly, until the AI agent gives us sufficient experience with the scope of the problem domain to develop cheaper hard-coded functions.
Onewildgamer · 7 months ago
An interesting take, but only if the stakes are low when the decisions are wrong. I'm not confident having an LLM make decisions for a customer or for me. I'd rather have it suggest things to customers: suggested actions and some useful insights that the user may have overlooked.
malfist · 7 months ago
Can you imagine a bank taking this approach? Sorry, we didn't have enough time to build a true ledger, and now the AI says you have no money.
dmezzetti · 7 months ago
This article is missing an even more important point: you don't always need to start with an LLM, plain old coding still solves a lot of problems.
skeeter2020 · 7 months ago
It's funny how when I talk to ML practitioners who have experience & work in the field they're the most pragmatic voices, like our staff developer on the ML team: "if you can solve the problem algorithmically you should definitely do that!"
dmezzetti · 7 months ago
For full disclosure, I work on txtai, one of the more popular AI frameworks out there. So this checks out :)
riku_iki · 7 months ago
but you can't build a 5B startup in 10 months with plain old coding...
dmezzetti · 7 months ago
There are plenty of AI companies solving interesting problems and possibly worth it. But most problems are simpler than that, and that hasn't changed.
nine_k · 7 months ago
Is coding the bottleneck there?
imhoguy · 7 months ago
you can even build an AI unicorn without AI: builder.ai /s
ilaksh · 7 months ago
I think this was true in late 2023 or early 2024, but not necessarily in mid-2025 for most tasks (as long as they require some AI and aren't purely automation) and as long as you use SOTA LLMs.

I used to build things the way most of his examples do: just functions calling LLMs. I found it almost necessary due to poor tool selection, etc. But I think the leading-edge LLMs like Gemini 2.5 Pro and Claude 4 are smart enough and good enough at instruction following and tool selection that it's not necessarily better to create workflows.

I do have a checklist tool and a delegate command, and may break tasks down into separate agents, though. But the advantage of creating instructions and assigning tool commands (especially if you have an environment with a UI where it is easy to assign tool commands to agents and otherwise define them) is that it is more flexible and a level of abstraction above something like a workflow. Even for visual workflows, it's still programming, which is more brittle and more difficult to dial in.
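As a rough illustration of that level of abstraction (not any particular framework's format): an agent ends up being little more than natural-language instructions plus a set of tool bindings, and changing its behavior means editing the instructions rather than rewriting a workflow. All the names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentSpec:
    name: str
    instructions: str                     # natural-language behavior spec
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

def run_checklist(path: str) -> str:
    """Hypothetical tool: read a checklist file and report its status."""
    raise NotImplementedError

def delegate(task: str, to_agent: str) -> str:
    """Hypothetical tool: hand a sub-task off to another agent."""
    raise NotImplementedError

triage_agent = AgentSpec(
    name="triage",
    instructions=(
        "Break the user's request into checklist items, keep the checklist "
        "up to date, and delegate implementation tasks to the coder agent."
    ),
    tools={"run_checklist": run_checklist, "delegate": delegate},
)
```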

This was not the case 6-12 months ago and doesn't apply if you insist on using inferior language models (which most of them are). It's really only a handful that are really good at instruction following and tool use. But I think it's worth it to use those and go with agents for most use cases.

The next thing that will happen over the following year or two is going to be a massive trend of browser and computer use agents being deployed. That is again another level of abstraction. They might even incorporate really good memory systems and surely will have demonstration or observation modes that can extract procedures from humans using UIs. They will also learn (record) procedural details for optimization during exploration from verbal or written instructions.

bonzini · 7 months ago
The techniques he has in the post are mostly "model your problem as a data flow graph and follow it".

If you skip the modeling part and rely on something that you don't control being good enough, that's faith not engineering.
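For contrast, a minimal sketch of the data-flow approach: the graph (here just a chain) is explicit in code, and any LLM call is confined to one narrow, well-defined step. The step names and the summarize stub are illustrative placeholders.

```python
from typing import Callable

def extract(ticket: str) -> dict:
    """Deterministic parsing: no model needed."""
    return {"body": ticket.strip()}

def summarize(fields: dict) -> dict:
    """Stub for a single, constrained LLM call."""
    fields["summary"] = fields["body"][:100]
    return fields

def route(fields: dict) -> str:
    """Deterministic rule on a known field."""
    return "billing" if "invoice" in fields["body"].lower() else "support"

PIPELINE: list[Callable] = [extract, summarize, route]

def run(ticket: str):
    value = ticket
    for step in PIPELINE:   # follow the graph; nothing is left to chance
        value = step(value)
    return value
```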

ilaksh · 7 months ago
I didn't say to skip any kind of problem modeling. I just didn't emphasize it.

The goal _should_ be to avoid doing traditional software engineering or creating a system that requires typical engineering to maintain.

Agents with leading edge LLMs allow smart users to have flexible systems that they can evolve by modifying instructions and tools. This requires less technical skill than visual programming.

If you are only taking advantage of the LLM to handle a few wrinkles or a little bit of natural language mapping then you aren't really taking advantage of what they can do.

Of course you can build systems with rigid workflows and a sprinkling of LLM integration, but for most use cases it's probably not the right default mindset for mid-2025.

Like I said, I was originally following that approach a little ways back. But things change. Your viewpoint is about a year out of date.

clbrmbr · 7 months ago
I agree that the strongest agentic models (Claude Opus 4 in particular) change the calculus. They still need good context, but damn are they good at reaching for the right tool.