12-factor Agents: Patterns of reliable LLM applications

I am wondering how libraries like DSPY [0] fit into your factor 2 [1].

As I was reading, I saw a mention of BAML:

> (the above example uses BAML to generate the prompt ...

In my experience, hand-writing prompts for extracting structured information from unstructured data has never been easy. With DSPY, my experience has been quite good so far.

Since you used the raw prompt from BAML, what do you think of using the raw prompts from DSPY [2]?

[0] https://dspy.ai/

[1] https://github.com/humanlayer/12-factor-agents/blob/main/con...

[2] https://dspy.ai/tutorials/observability/#using-inspect_histo...

dhorthy · 4 months ago

interesting - I think I have to side with the Boundary (YC W23) folks on this one - if you want bleeding edge performance, you need to be able to open the box and hack on the insides.

I don't agree fully with this article https://www.chrismdp.com/beyond-prompting/ but the comparison of punchcards -> assembly -> c -> higher langs is quite useful here

I just don't know when we'll get the right abstraction - i don't think langchain or dspy are the "C programming language" of AI yet (they could get there!).

For now I'll stick to my "close to the metal" workbench where I can inspect tokens, reorder special tokens like system/user/JSON, and dynamically keep up with the idiosyncrasies of new models without being locked up waiting for library support.

chrismdp · 4 months ago

It's always true that you need to drop down a level of abstraction in order to extract the ultimate performance. (eg I wrote a decent-sized game + engine entirely in C about 10 years ago and played with SIMD vectors to optimise the render loop)

However, I think the vast majority of use cases will not require this level of control, and we will abandon prompts once the tools improve.

Langchain and DSPY are not there for me either - I think the whole idea of prompting + evals needs a rethink.

(full disclaimer: I'm working on such a tool right now!)

Very informative wiki, thank you, I will definitely use it. So I've made my own "AI Agents framework" [0] based on the actor model, state machines and aspect-oriented programming (released just yesterday, no HN post yet) and I really like points 5 and 8:

    5. Unify execution state and business state
    8. Own your control flow

That is exactly what SecAI does, as it's a graph control flow library at its core (multigraph instead of DAG) and LLM calls are embedded into the graph's nodes. The flow is reinforced with negotiation, cancellation and stateful relations, which make it more "organic". Another thing often missed by other frameworks are dedicated devtools (dbg, repl, svg) - programming for failure, inspecting every step in detail, automatic data exporters (metrics, traces, logs, sql), and dead-simple integrations (bash). I've released the first tech demo [1] which showcases all the devtools using a reference implementation of deepresearch (ported from AtomicAgents). You may especially like the Send/Stop button, which is nothing else than "Factor 6. Launch/Pause/Resume with simple APIs". Oh and it's network transparent, so it can scale.

Feel free to reach out.

[0] https://github.com/pancsta/secai

[1] https://youtu.be/0VJzO1S-gV0

serverlessmania · 4 months ago

"Another thing often missed by other frameworks are dedicated devtools"

From my experience, PydanticAI really nailed it with Logfire—debugging[0] agents was significantly easier and more effective compared to the other frameworks and libraries I tested.

[0] https://ai.pydantic.dev/logfire/#pydantic-logfire

pancsta · 4 months ago

Logfire is a tracing app, an equivalent of Jaeger and other Otel UIs. While I won't discuss reimplementation-vs-integration in this case, traces are just one way of debugging. am-dbg focuses on debugging of the state consensus, instead of the execution tree, without requiring a SaaS account.

Execution trees are enough for workflows, but bots/agents aren't simple workflows.

dhorthy · 4 months ago

i like the terminal UI and otel integrations - what tasks are you using this for today?

pancsta · 4 months ago

Thanks, terminal UI is an important design choice - it's fast, cheap, and runs everywhere (like the web via wasm / ssh, or on iphones with touch). The LLM layer is still fresh, and I personally use it for web scraping, but the underlying workflow engine is quite mature and ubiquitous - it was used for sync engines, UIs, daemons, network services. It shines when it faces complexity, nondeterminism, and retry logic - the more chaotic the flow is, the bigger the gains.

The approach is to shape behavior from chaos by exclusion, instead of defining all possible transitions. With LLMs, this process could be automated and effectively an agent would be dynamically creating itself using a DSL (state schema and predefined states). The great thing about LLMs is that they charge by tokens instead of by the number of requests. We can just interrogate them about every detail separately and build a flow graph with transparent (and debuggable) reasoning. I also have API sketches for proactive scenarios (originally made for an ML prototype) [0].

[0] https://github.com/pancsta/secai/blob/474433796c5ffbc7ec5744...

wfn · 4 months ago

This is great, thank you so much for sharing!

mgdev · 4 months ago

These are great. I had my own list of takeaways [0] after doing this for a couple years, though I wouldn't go so far as calling mine factors.

Like you, the biggest one I didn't include but would now is to own the lowest-level planning loop. It's fine to have some dynamic planning, but you should own an OODA loop (observe, orient, decide, act) and have heuristics for determining if you're converging on a solution (e.g. scoring), or else breaking out (e.g. max loops).
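
A minimal sketch of what owning that loop could look like in plain Python. Everything here is an assumption for illustration: `propose` stands in for whatever wraps your model call, `score` is whatever convergence heuristic you trust, and `target`, `max_loops` and `patience` are made-up knobs, not anything from the guide.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    answer: str
    score: float
    iterations: int
    converged: bool
    history: List[float] = field(default_factory=list)


def run_planning_loop(
    propose: Callable[[str, List[str]], str],  # hypothetical wrapper around the model
    score: Callable[[str], float],             # hypothetical heuristic, higher is better
    task: str,
    target: float = 0.9,   # good enough: stop and return
    max_loops: int = 5,    # hard cap on iterations
    patience: int = 2,     # break out after this many non-improving rounds
) -> LoopResult:
    attempts: List[str] = []
    history: List[float] = []
    best_answer, best_score = "", float("-inf")
    stale = 0

    for i in range(1, max_loops + 1):
        candidate = propose(task, attempts)  # act: ask for a new attempt
        s = score(candidate)                 # observe: measure the attempt
        attempts.append(candidate)
        history.append(s)

        # orient: are we still improving?
        if s > best_score:
            best_answer, best_score = candidate, s
            stale = 0
        else:
            stale += 1

        # decide: converged, stuck, or keep going
        if best_score >= target:
            return LoopResult(best_answer, best_score, i, True, history)
        if stale >= patience:
            break

    return LoopResult(best_answer, best_score, len(history), False, history)


# Stub "model" and scorer so the sketch runs without any API key.
def demo_propose(task: str, attempts: List[str]) -> str:
    return "x" * (len(attempts) + 1)


def demo_score(candidate: str) -> float:
    return min(len(candidate) / 3, 1.0)


result = run_planning_loop(demo_propose, demo_score, task="demo")
print(result.converged, result.iterations, result.answer)
```

The point of the design is that the exit conditions live in your code, not in the prompt, so a model that wanders cannot keep the loop alive.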

I would also potentially bake in a workflow engine. Then, have your model build a workflow specification that runs on that engine (where workflow steps may call back to the model) instead of trying to keep an implicit workflow valid/progressing through multiple turns in the model.
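
A rough sketch of that idea, assuming the model emits the workflow as JSON and the engine validates it before running anything. The step format (`id`, `op`, `args`, `needs`), the handler names and `stub_model` are all hypothetical, chosen only to keep the example small.

```python
import json
from typing import Any, Callable, Dict, List

Step = Dict[str, Any]
# A handler receives the step's args and the results of the steps it needs.
Handler = Callable[[Dict[str, Any], List[Any]], Any]


def validate(spec: List[Step], handlers: Dict[str, Handler]) -> List[Step]:
    """Reject bad specs up front and return steps in a runnable order."""
    by_id = {step["id"]: step for step in spec}
    if len(by_id) != len(spec):
        raise ValueError("duplicate step id")
    for step in spec:
        if step["op"] not in handlers:
            raise ValueError(f"unknown op: {step['op']}")
        for dep in step.get("needs", []):
            if dep not in by_id:
                raise ValueError(f"unknown dependency: {dep}")

    ordered: List[Step] = []
    done: set = set()
    while len(ordered) < len(spec):
        ready = [
            s for s in spec
            if s["id"] not in done and all(d in done for d in s.get("needs", []))
        ]
        if not ready:
            raise ValueError("cycle in workflow spec")
        for s in ready:
            ordered.append(s)
            done.add(s["id"])
    return ordered


def run(spec: List[Step], handlers: Dict[str, Handler]) -> Dict[str, Any]:
    results: Dict[str, Any] = {}
    for step in validate(spec, handlers):
        inputs = [results[d] for d in step.get("needs", [])]
        results[step["id"]] = handlers[step["op"]](step.get("args", {}), inputs)
    return results


def stub_model(prompt: str) -> str:
    # Stand-in for a real model call, so the sketch runs offline.
    return f"[model] {prompt}"


HANDLERS: Dict[str, Handler] = {
    "const": lambda args, inputs: args["value"],
    "add": lambda args, inputs: sum(inputs),
    # A workflow step that calls back to the model.
    "llm": lambda args, inputs: stub_model(f"{args['prompt']}: {inputs[0]}"),
}

# In a real system this JSON would be produced by the model.
spec_json = """
[
  {"id": "a", "op": "const", "args": {"value": 2}},
  {"id": "b", "op": "const", "args": {"value": 3}},
  {"id": "sum", "op": "add", "needs": ["a", "b"]},
  {"id": "report", "op": "llm", "needs": ["sum"], "args": {"prompt": "Summarize"}}
]
"""

results = run(json.loads(spec_json), HANDLERS)
print(results["report"])
```

Because the spec is validated as data before execution, an invalid plan fails fast instead of derailing halfway through a multi-turn conversation.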

[0]: https://mg.dev/lessons-learned-building-ai-agents/

this guide is great, i liked the "chat interfaces are dumb" take - totally agree. AI-based UIs have a very long way to go

hhimanshu · 4 months ago

daxfohl · 4 months ago

This old obscure blog post about framework patterns has resonated with me throughout my career and I think it applies here too. LLMs are best used as "libraries" rather than "frameworks", for all the reasons described in the article and more, especially now while everything is in such flux. "Frameworks" are sexier and easier to sell though, and lead to lock-in and add-on services, so that's what gets promoted.

https://tomasp.net/blog/2015/library-frameworks/

saadatq · 4 months ago

This is so good…

“… you can find frameworks not just in software, but also in ordinary life. If you buy package holidays, you're buying a framework - they transport you to some place, put you in a hotel, feed you and your activities have to fit into the shape provided by the framework (say, go into the pool and swim there). If you travel independently, you are composing libraries. You have to book your flights, find your accommodation and arrange your program (all using different libraries). It is more work, but you are in control - and you can arrange things exactly the way you need.”

My favorite blog post / presentation is Sandi Metz's "The Wrong Abstraction", but this one is up there. Definitely punches above its weight for a small obscure post.

oh heck yeah this rocks. I'm gonna add to the links section

Additionally in terms of career development, you're going to be a lot better off learning the low level LLM interfaces rather than being dependent on a framework (or their even more evil cousin, platforms). Once you learn those, jumping to a platform is usually trivial, whereas the reverse can be more challenging. Junior devs often think that the more frameworks they have on their resume the better, but it often pigeonholes you more than it helps.

And I don't mean to imply that frameworks are always bad. Things like security best practices out of the box can be worth it. But especially in AI right now, nobody knows what those best practices are going to be. So it's best to spend this time learning how to do things at a low level rather than attaching to some framework that may be obsolete in a year.

Another one: plan for cost at scale.

These things aren't cheap at scale, so whenever something might be handled by a deterministic component, try that first. Not only does that save on hallucinations and latency, it could also make a huge difference in your bottom line.

Yeah definitely. I think the pattern I see people using most is “start with slow, expensive, but low dev effort, and then refine over time as you find speed/quality/cost bottlenecks worth investing in”

Manfred · 4 months ago

I believe the principles would be easier to follow if there were a consistent narrative through the factors, by which I mean using a potentially real-world example of such a system.

This is a great bit of feedback - what kind of use cases do you think would make sense?

Definitely wanna evolve this in the open with the community

Maybe you could pick a real-world agent workflow (a toy one from your production experience, trimmed down) and showcase how all these factors come together in a project.

I am inspired by the simplicity of these 12 factors and definitely want to learn more with an example that embraces these factors.

I don’t have any experience in that area so I can’t really suggest anything.

glial · 4 months ago

This is great -- and I have learned 80% the hard way. The other 20% will be valuable reading!

Personally I've had success with LangGraph + pydantic schemas. Curious to know what others have found useful.

funny you say

> I have learned 80% the hard way

because the other working title for this was "Agents the Hard Way" (in the spirit of https://github.com/kelseyhightower/kubernetes-the-hard-way)

This could not have come at a better time for me, thank you!

I've been tinkering with an idea for an audiovisual sandbox[1] (like vvvv[2] but much simpler of course, barebones).

The idea is to have a way to insert LM (or some simple locally run neural net) "nodes" which are given specific tasks and whose output is expected to be very constrained. Hence your example:

    "question -> answer: float"

is very attractive here. Of course, some questions in my case would be quite abstract, but anyway. Multistage pipelines are also very interesting.
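
To make the constrained-output idea concrete, here is a plain-Python sketch of a node built from a signature string like the one above. This is not how DSPY implements signatures; `parse_signature`, `typed_node` and the stub model are hypothetical helpers that only show parsing the declared type and refusing anything that does not match it.

```python
from typing import Callable, Dict, List, Tuple

# Types this sketch knows how to enforce (an arbitrary, small set).
CASTERS: Dict[str, type] = {"float": float, "int": int, "str": str}


def parse_signature(sig: str) -> Tuple[List[str], Dict[str, type]]:
    """Split 'question -> answer: float' into input names and typed outputs."""
    left, right = sig.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs: Dict[str, type] = {}
    for output_field in right.split(","):
        name, _, type_name = output_field.partition(":")
        outputs[name.strip()] = CASTERS[type_name.strip() or "str"]
    return inputs, outputs


def typed_node(sig: str, model: Callable[[str], str]) -> Callable[..., object]:
    """Build a callable node whose single output is coerced to the declared type."""
    inputs, outputs = parse_signature(sig)
    if len(outputs) != 1:
        raise ValueError("this sketch supports exactly one output field")
    (out_name, caster), = outputs.items()

    def node(**kwargs: str) -> object:
        missing = [name for name in inputs if name not in kwargs]
        if missing:
            raise TypeError(f"missing inputs: {missing}")
        prompt = "\n".join(f"{name}: {kwargs[name]}" for name in inputs)
        prompt += f"\nReply with only the {out_name} as a {caster.__name__}."
        raw = model(prompt)
        try:
            return caster(raw.strip())
        except ValueError as exc:
            raise ValueError(
                f"model output {raw!r} is not a valid {caster.__name__}"
            ) from exc

    return node


# Stub model so the sketch runs offline; a real node would call an LM here.
brightness = typed_node("question -> answer: float", lambda prompt: " 0.75\n")
value = brightness(question="How bright should the scene be, from 0 to 1?")
print(value)
```

A node like this either hands the rest of the pipeline a real `float` or fails loudly, which is what makes chaining several of them into a multistage pipeline tractable.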

[1]: loose set of bulletpoints brainstorming the idea if curious, not organised: https://kfs.mkj.lt/#audiovisllm (click to expand description)

[2]: https://vvvv.org/

Typed outputs from an LLM are a game changer!