Levels of Agentic Engineering

As a lowly level 2 who remains skeptical of these software “dark factories” described at the top of this ladder, what I don’t understand is this:

If software engineering is enough of a solved problem that you can delegate it entirely to LLM agents, what part of it remains context-specific enough that it can’t be better solved by a general-purpose software factory product? In other words, if you’re a company that is using LLMs to develop non-AI software, and you’ve built a sufficient factory to generate that software, why don’t you start selling the factory instead of whatever you were selling before? It has a much higher TAM (all of software).

2001zhaozhao · 2 days ago

Why sell the factory when you can create automated software cloner companies that make millions off of instantly copying promising startups as soon as they come out of stealth?

If you could get a dark factory working when others don't have one, you could make much more money using it than you could ever make selling it.

jochem9 · 5 hours ago

ASML has a near monopoly on the most advanced chip machines. They maintain that by 'just' being the most advanced and having lots of patents.

They haven't branched off into making chips themselves. They keep their focus on selling the factories.

I think they haven't, because ASML itself doesn't have production lines. Every machine is one off. It even gets delivered with a team of engineers to keep it running.

The same probably holds true for software factories: the best ones are assembled by the smartest people (wielding AI in ways most of us don't). They are not in the business to produce software at scale, they are in the business to ensure others can do that using increasingly advanced software factories.

This relies on the premise that such a factory cannot produce a more advanced factory without significant human intervention (e.g. high ingenuity and/or lots of elbow grease). If this doesn't hold true, then we are in for some interesting times x100.

tkiolp4 · 2 days ago

That’s not true. Even if we assume LLMs can generate the code needed to support the next Facebook, one still has to: buy/rent tons of hardware (virtual or bare metal), put tons of money into marketing, break the network effect, and pay for third-party services for monitoring, alerting, and whatnot. That’s money, and LLMs don’t help with that.

antonvs · 2 days ago

Producing the software is only a small part of the picture when it comes to generating revenue.

So far, we haven’t seen much to suggest that LLMs can (yet) replace sales and most of the related functions.

whattheheckheck · 2 days ago

Too bad they can't.

hakanderyal · 2 days ago

We are not there yet. While there are teams applying dark factory models to specific domains with self-reported success, it's yet to be proven, or generalizable enough to apply everywhere.

glhast · 2 days ago

Also a measly level 2er. I'm curious what kind of project truly needs an autonomous agent team Ralph looping out 10,000 LOCs per hour? Seems like harness-maxxing is a competitive pursuit in its own right existing outside the task of delivering software to customers.

Feels like the K8s cult, overly focused on the cleverness of _how_ something is built versus _what_ is being built.

maxdo · a day ago

Essentially any enterprise software, surprisingly: anything that needs to be custom-tailored rather than scaled to millions of views, i.e. anything that is high-context.

The YouTubes of this world will not benefit from it; they will keep using the rules of scale for billions of users.

Every dashboard chart, security review system, Jira, ERP, CRM, LMS, chatbot, you name it. Any problem that benefits from customization per smaller unit (a company, a group of people, or even an individual, like a CEO or a CxO group) will benefit from such software.

Levels 6 and 7 are essentially the death of enterprise software.

cheevly · 2 days ago

Software that is otherwise not feasible for humans to build by hand.

pydry · 2 days ago

I have the same question about people who sell "get rich with real estate" seminars.

dist-epoch · 2 days ago

Codex and Claude Code are these (proto)factories you talk about - almost every programmer uses them now.

And when they become fully dark factories, yes, what will happen is that a LOT of software companies will just disappear; they will be disintermediated by Codex/Claude Code.

I coded a level 8 orchestration layer in CI for code review, two months before Claude launched theirs.

It's very powerful and agents can create dynamic microbenchmarks and evaluate what data structure to use for optimal performance, among other things.

I also have validation layers that trim hallucinations with handwritten linters.
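A minimal sketch of what one such validation layer could look like, assuming "trimming hallucinations" means dropping agent findings that point at files, lines, or code that do not exist in the PR. The `Finding` type and `trim_hallucinations` function are hypothetical names for illustration, not the actual tool described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One issue reported by a review agent (hypothetical shape)."""
    path: str     # file the agent claims the issue is in
    line: int     # 1-based line number the agent points at
    quoted: str   # code the agent claims is on that line
    message: str  # the agent's explanation


def trim_hallucinations(findings, files):
    """Keep only findings whose file, line number, and quoted code really exist.

    `files` maps a path to the list of lines in that file.
    """
    kept = []
    for f in findings:
        lines = files.get(f.path)
        if lines is None:
            continue  # the agent invented a file
        if not 1 <= f.line <= len(lines):
            continue  # the line number is out of range
        if f.quoted.strip() not in lines[f.line - 1]:
            continue  # the quoted code is not actually on that line
        kept.append(f)
    return kept


files = {"a.py": ["x = 1", "print(x)"]}
findings = [
    Finding("a.py", 2, "print(x)", "debug print left in"),
    Finding("b.py", 1, "y = 2", "file does not exist"),
    Finding("a.py", 9, "print(x)", "line out of range"),
]
print([f.message for f in trim_hallucinations(findings, files)])
```

The checks are deliberately dumb and deterministic: the point of a handwritten linter in this role is that its verdict does not depend on another model call.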

I'd love to find people to network with. Right now this is a side project at work on top of writing test coverage for a factory. I don't have anyone to talk about this stuff with so it's sad when I see blog posts talking about "hype".

moosehater · 2 days ago

Do you feel like you are still learning about the programming language(s) and other technologies you are using? Or do you feel like you are already a master at them?

Do you ever take the time to validate what one of the agents produces by going to the docs? Or is all debugging/changing of the code done via LLMs/agents?

I'm more like level 2 right now and genuinely curious if you feel like learning continues for you (besides with agentic orchestration, etc.) And if not, whether or not you think that matters.

jjmarr · 2 days ago

I'm learning more than ever before. I'm not a master at anything but I am getting basic proficiency in virtually everything.

> Do you ever take the time to validate what one of the agents produces by going to the docs? Or is all debugging/changing of the code done via LLMs/agents?

I divide my work into vibecoding PoC and review. Only once I have something working do I review the code. And I do so through intense interrogation while referencing the docs.

> I'm more like level 2 right now and genuinely curious if you feel like learning continues for you (besides with agentic orchestration, etc.)

Level 8 only works in production for a defined process where you don't need oversight and the final output is easy to trust.

For example, I made a code review tool that chunks a PR and assigns rule/violation combos to agents. This got a 20% reduction in time to merge and catches 10x as many issues as any other agent because it can pull context. And the output is easy to incorporate since I have a manager agent summarize everything.
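A rough sketch of that fan-out shape, under stated assumptions: the PR arrives as a unified diff, chunks are split per file, and each (chunk, rule) pair becomes one task. `stub_agent` is a placeholder substring check standing in for a real LLM call, and every name here (`Task`, `RULES`, `chunk_diff`, `plan_tasks`, `summarize`) is hypothetical rather than taken from the tool described above.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical rules: rule name -> substring that counts as a violation.
RULES = {"no-print": "print(", "no-todo": "TODO"}


@dataclass(frozen=True)
class Task:
    chunk_id: int
    chunk: str
    rule: str


def chunk_diff(diff):
    """Split a unified diff into per-file chunks at each 'diff --git' header."""
    chunks, current = [], []
    for line in diff.splitlines():
        if line.startswith("diff --git") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def plan_tasks(chunks, rules):
    """Create one task per (chunk, rule) combination."""
    return [Task(i, c, r) for (i, c), r in product(enumerate(chunks), rules)]


def stub_agent(task):
    """Stand-in for an LLM agent: flags added lines that contain the rule's pattern."""
    pattern = RULES[task.rule]
    findings = []
    for line in task.chunk.splitlines():
        is_added = line.startswith("+") and not line.startswith("+++")
        if is_added and pattern in line:
            findings.append(f"chunk {task.chunk_id}: {task.rule}: {line[1:].strip()}")
    return findings


def summarize(all_findings):
    """Stand-in for the manager agent: flatten per-task findings into one report."""
    flat = [f for findings in all_findings for f in findings]
    return "\n".join(flat) if flat else "no findings"


demo_diff = (
    "diff --git a/a.py b/a.py\n+++ b/a.py\n+print('x')\n"
    "diff --git a/b.py b/b.py\n+++ b/b.py\n+y = 2  # TODO\n"
)
demo_tasks = plan_tasks(chunk_diff(demo_diff), RULES)
print(summarize([stub_agent(t) for t in demo_tasks]))
```

Because each task carries only one chunk and one rule, the tasks are independent and could be dispatched in parallel; the sketch runs them sequentially to stay simple.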

Likewise, I'm working on an automatic performance tool right now that chunks code, assigns agents to make microbenchmarks, and tries to find optimization points. The end result should be easy to verify since the final suggestion would be "replace this data structure with another, here's a microbenchmark proving so".
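To make the "here's a microbenchmark proving so" idea concrete, here is a small sketch using Python's standard `timeit` module. The workload (a worst-case membership test in a list versus a set) and the helper names `bench` and `suggest` are assumptions chosen for illustration; they are not the tool described above.

```python
import timeit


def bench(candidates, number=200):
    """Time each zero-argument callable `number` times; return total seconds per name."""
    return {name: timeit.timeit(fn, number=number) for name, fn in candidates.items()}


def suggest(timings):
    """Return the name of the candidate with the lowest total time."""
    return min(timings, key=timings.get)


# Hypothetical workload: look up the last element, the worst case for a list scan.
N = 10_000
as_list = list(range(N))
as_set = set(as_list)
needle = N - 1

timings = bench({
    "list": lambda: needle in as_list,
    "set": lambda: needle in as_set,
})

for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s")
print("suggested structure:", suggest(timings))
```

The appeal of this output format is that a reviewer can rerun the benchmark on their own machine instead of trusting the agent's claim.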

jessmartin · 2 days ago

I got my own level 8 factory working in the last few days and it’s been exhilarating. Mine is based on OpenAI’s Symphony[1], ported to TypeScript.

Would be happy to swap war stories.

<myhnusername>@gmail.com

whattheheckheck · 2 days ago

How much money have you made with this approach?

krackers · a day ago

What level is copy pasting snippets into the chatgpt window? Grug brained level 0? I sort of prefer it that way (using it as an amped up stackoverflow) since it forces me to decompose things in terms of natural boundaries (manual context management as it were) and allows me to think in terms of "what properties do I need this function to have" rather than just letting copilot take the wheel and glob the entire project in the context window.

ddxv · a day ago

I still do this too for tough projects in languages I know. Too many times getting burned thinking 'wow it one shot that!' only to end up debugging later.

I let agents run wild on frontend JS because I don't know it well, so I have to trust them (and the output is something I can look at).

tracker1 · 20 hours ago

IMO, the front end results are REALLY hit and miss... I mostly use it to scaffold if I don't really care because the full UI is just there to test a component, or I do a fair amount of the work mixed. I wish it was better at working with some of the UI component libraries with mixed environments. Describing complex UX and having it work right are really not there yet.

Lord-Jobo · a day ago

1.8: chat ide the slow way :)

This is also where I do most of my AI use. It’s the safe spot where I’m not going to accidentally send proprietary info to an unknown number of eyeballs (computer or human).

It’s also just cumbersome enough that I’m not relying on it too much and stunting my personal ability growth. But I’m way more novice than most on here.

I've found it's easy enough to have AI scaffold a working demo environment around a single component/class that I'm actually working on, then I can copy the working class/component into my "real" application. I'm in a pretty locked down environment, so using a separate computer and letting the AI scaffold everything around what I'm working on is pretty damned nice, since I cannot use it in the environment or on the project itself.

For personal projects, I'm able to use it a bit more directly, but would say I'm using it around 5/6 level as defined here... I've leaned on it a bit for planning stages, which helps a lot... not sure I trust swarms of automated agents, though it's pretty much the only way you're going to use the $200 level on Claude effectively... I've hit the limits on the $100 only twice in the past month, I downgraded after my first month. And even then, it just forced me to take a break for an hour.

giancarlostoro · a day ago

I think you bring up a good point: it falls under Chat IDE, but it's the "lowest" tier, if you will. Nothing wrong with it; a LOT of us started this way.

vorticalbox · a day ago

I do this too with the ChatGPT Mac app. It has a "pop out" feature that it binds to Option + Space, then I just ask away.

branoco · a day ago

anything, if it brings the results

antonvs · a day ago

I find the CLI agents a decent middle ground between the extremes you describe. There’s a reason they’ve gained some popularity.

waynesonfire · a day ago

Your technique doesn't keep the kool-aid flowing. Shut up. /s

The more I try to use these tools to push up this "ladder", the more it becomes clear the technology is no more than a 10x better Google search.

mzg · 2 days ago

holtkam2 · 2 days ago

Level 9: agent managers running agent teams

Level 10: agent CEOs overseeing agent managers

Level 11: agent board of directors overseeing the agent CEO

Level 12: agent superintelligence - single entity doing everything

Level 13: agent superagent, agenting agency agentically, in a loop, recursively, mega agent, agentic agent agent agency super AGI agent

Level 14: A G E N T

zenoprax · 2 days ago

Level 15 (if not succumbed to fatal context poisoning from malicious agent crime syndicate): Agents creating corporations to code agentic marketplaces in which to gamble their own crypto currencies until they crash the real economy of humans.

clickety_clack · 2 days ago

Level 16: it’s not level 16, it’s level 17.

zem · 19 hours ago

Stross's "Accelerando" has a bit about this. Fun book.

javier123454321 · a day ago

Until we solve agent consumers that become the backstop of the economic engine when we all get unemployed, who are these agents working for?

stale2002 · 2 days ago

No, level 14 is Jeff Bezos.

nimasadri11 · 2 days ago

I really like your post and agree with most things. The one thing I am not fully sure about:

> Look at your app, describe a sequence of changes out loud, and watch them happen in front of you.

The problem a lot of times is that either you don't know what you want, or you can't communicate it (and usually you can't communicate it properly because you don't know exactly what you want). I think this is going to be the bottleneck very soon (for some people, it already is). I am curious what your thoughts are about this. Where do you see that going, and how do you think we can prepare for it and address it? Or do you not see it as an issue?

smallnix · 2 days ago

Reminds me of a colleague who said they don't need to learn to type faster, since they use the time to think about what they want to write.

throwaw12 · a day ago

As a Level 6,

I feel like going back to Level 5.

Level 6 helps with fixing bugs, but adding a new feature in a scalable way is not working out for me. I feed it a bunch of documents and ask it to analyze them and come up with a solution.

1. It misses some details from docs when summarizing

2. It misses some details from the code and its architecture, especially in multi-repo Java projects (annotations and 100-level inheritance confuse it a lot)

3. Then it comes up with an obvious (non-)"solution" that is based on incorrect context summaries.

I don't think I can give full autonomy to these things yet.

But then I wonder: people on Level 8, why don't they create a bunch of clones of games and SaaS vendors and start making billions?

jeanloolz · a day ago

Most successes, especially online, are rarely about the thing that is built and more about the marketing around it. I don't think we can fully automate marketing effectively.

AdamN · a day ago

Which model(s) are you using?

ftkftk · 2 days ago

I prefer Dan Shapiro's 5 level analogy (based on car autonomy levels) because it makes for a cleaner maturity model when discussing with people who are not as deeply immersed in the current state of the art. But there are some good overall insights in this piece, and there are enough breadcrumbs to lead to further exploration, which I appreciate. I think levels 3 and 4 should be collapsed, and the real magic starts to happen after combining 5 and 6; maybe they should be merged as well.

Car autonomy levels are fake. Everything up to and including Level 3 is not real autonomy; it is hard rules plus some reaction to the world. Everything above 3 is autonomy with just slight human safety guardrails while it attempts the real thing.

At this point, where we have a human who just sits there until enough nines after the decimal point have been verified in the error rates, the entire levels conversation is dead. It's almost a binary state: autonomous or not.

Something similar happened with software levels. Even Level 2 was sci-fi 2 years ago; 1 year from now, anything below Level 5 will be a joke, except for very regulated software or software at billion-user scale.

bensyverson · a day ago

Agreed; here's the link for anyone looking for it:

https://www.danshapiro.com/blog/2026/01/the-five-levels-from...

Arainach · 2 days ago

> If your repo requires a colleague's approval before merge, and that colleague is on level 2, still manually reviewing PRs, that stifles your throughput. So it is in your best interest to pull your team up.

Until you build an AI oncaller to handle customer issues in the middle of the night (and, depending on your product, an AI who can be fired if customer data is corrupted/lost), no team should be willing to remove the "human reviews code" step.

For a real product with real users, stability is vastly more important than individual IC velocity. Stability is what enables TEAM velocity and user trust.