The Five Levels: From spicy autocomplete to the dark factory

simonw · 11 days ago

I've talked to a team that's doing the dark factory pattern hinted at here. It was fascinating. The key characteristics:

- Nobody reviews AI-produced code, ever. They don't even look at it.

- The goal of the system is to prove that the system works. A huge amount of the coding agent work goes into testing and tooling and simulating related systems and running demos.

- The role of the humans is to design that system - to find new patterns that can help the agents work more effectively and demonstrate that the software they are building is robust and effective.

It was a tiny team and they stuff they had built in just a few months looked very convincing to me. Some of them had 20+ years of experience as software developers working on systems with high reliability requirements, so they were not approaching this from a naive perspective.

I'm hoping they come out of stealth soon because I can't really share more details than this.

spyckie2 · 10 days ago

What's the point honestly.

Given the pace of current ai, in 2 months dark factories will peak hype and then in another 6 months it will be fully identified in its cost/benefit drawbacks, and the wisdom of the crowds will have a relatively accurate understanding of its general usefulness, and the internet will move on to other things.

The next generation of ai coding will make dark factories legit due to their ability to architect decently. Then generation after will make dark factories obsolete due to their ability to make it right the first time. That's about 8 months out for SOTA, and 14 months out for Sonnet/Flash/Pro users.

No need for them to come out of stealth, just imagine 1000s of junior/mid engineers crammed into an office given vague instructions to build an app and spit out code. Imagine a cctv in the room overlooking the hundreds of desks, and then press fast forward 100x speed.

That's literally what they built, because that's what's possible with Opus.

daxfohl · 10 days ago

The funny thing is that the rest of the software industry is dying, except for the trillions of venture capital being invested into these AI coding whatevers. But given the slow death of software, once these AI coding whatevers are finished, there's going to be nothing of value left for them to code.

But I'm sure the investors will still come out just fine.

observationist · 11 days ago

You'd think at some point it'll be enough to tell the AI "ok, now do a thorough security audit, highlight all the potential issues, come up with a best practices design document, and fix all the vulnerabilities and bugs. Repeat until the codebase is secure and meets all the requisite protocol standards and industry best practices."

We're not there yet, but at some point, AI is gonna be able to blitz through things like that the way they blitz through making haikus or rewriting news articles. At some point AI will just be reliably competent.

Definitely not there yet. The dark factory pattern is terrifying, lol.

simonw · 11 days ago

That's definitely a pattern people are already starting to have good results from - using multiple "agents" (aka multiple system prompts) where one of them is a security reviewer that audits for problems and files issues for other coding agents to then fix.

I don't think this worked at all well six months ago. GPT-5.2 and Opus 4.5 might just be good enough for this pattern to start being effective.

jwpapi · 11 days ago

Honestly I’m not sure we’re not there yet, run this prompt as a ralph loop for 2 days on your codebase and see where you at...

simonw · a day ago

... they came out of stealth, it was the StrongDM AI team - details here: https://factory.strongdm.ai/

My further notes here: https://simonwillison.net/2026/Feb/7/software-factory/

noosphr · 11 days ago

Canadian girlfriend coding strikes again.

I would love for someone to point to a codebase done by an ai with the code, history and cost that's good. It's always a ball of mud that doesn't work and even the ai that coded it up can't maintain it.

simonw · 11 days ago

What were the last three that you looked at that disappointed you, and what did you find lacking with them?

qingcharles · 10 days ago

My biggest project (in LOCs) is 100% AI written and I've given up reviewing the code on it. Huge web-based content management system with a native desktop app companion. It's worked flawlessly 24/7 for the last couple of months. I add a new feature every week or so, but I just do the code-as-English dance now and test what comes out. It's almost exclusively all Gemini 3 Pro and Opus 4.5. I've gone fully dark on that project.

I have other projects where I review almost every line, but everything is edging towards the dark side.

I've been coding for 40 years in every language you can think of. Glad that's over, honestly. It always got in the way of turning an idea into a product.

antonvs · 9 days ago

> Nobody reviews AI-produced code, ever. They don't even look at it.

How is this supposed to differ from the original Karpathy definition of vibe coding? Is it just "vibe coding plus rigorous verification"?

(Or is it mainly intended to sound more desirable than vibe coding?)

simonw · 9 days ago

"vibe coding plus rigorous verification" is a really good way of describing it.

urineeeee · 11 days ago

Holy cow I actually bought this comment and it was on my mind for a bit, then saw another simonw comment about "the team" below. Check your sources folks!

Almost had me you cheeky devil you :)

Thorrez · 10 days ago

Is simonw untruthful or unreliable?

yojat661 · 10 days ago

Lol same. Didn't realize it was the ai hype master on my first read.

deadbabe · 10 days ago

Level Six: knowledge on how to build products deteriorates, more high level thinking is outsourced to AI. AI are asked to simply put out several versions and possibilities of products and testers go through harvesting candidates that are the most usable and have the least bugs, good enough for production. It could take a long time or it could happen very quick.

Level Seven: no one even knows what software is anymore, they just pray to AI to solve their problems and hope for the best. Some priests occasionally do random stuff that seems to affect outcomes, but no one knows for sure.

PaulDavisThe1st · 10 days ago

Level Eight: so few people do any paid labor any more, and society failed to figure out any sort of distributive income system such as UBI, so increasing chronic and endemic poverty is slowly eating away at revenue generation from AI designed and coded products and services.

naruhodo · 10 days ago

Pitchforks and killbots.

ekidd · 11 days ago

Having actually run some of the software produced by nearly "dark software factories," a lot of that software is completely shit.

Yegge's Beads is a genuinely good design, for example, but it's flakier and more broken the Unix vendor Motif implementations in 1993, and it eats itself more often than Windows 98 would blue screen.

I can actually run a bunch of orchestrated agents, and get code which isn't complete shit. But it's an extremely skill-intensive process, because I'm acting as product manager, lead engineer, and the backstop for the holes in the cognition of a bunch of different Claudes.

So far, the people promising completely dark software factories are either high on their own supply, or grifting to sell books (or occasionally crypto). Or so I judge from using the programs they ship.

xg15 · 11 days ago

I found it kind of fitting that didn't even describe what a human would still do at level 5 nor why it would be desirable. It's just the "natural" progression of a 5 step ladder and that seems to be reason enough.

thenfcm · 11 days ago

Well isnt the point humans wouldn't need to do basically anything?

It would be 'desirable' because the value is in the product of the labour not the labour itself. (Of course the resulting dystopian hellscape might be considered undesirable)

radu_floricica · 10 days ago

People are very pessimistic here in the comments, but I see no fundamental, long term reason why AI generated code can't be refactored, maintained and tested by AI just as well (or better) than average-quality human generated code. Especially because things are evolving - by the time the projects will need to be maintained, there will likely already be better tools to do that. So while I wouldn't vibecode drivers for life support systems yet, there is significant runway of tech debt for most use cases.

pphysch · 11 days ago

The autopilot analogy is good because level 4-5 are essentially vaporware outside of success in controlled environments backed by massive investment and engineering.

hbarka · 11 days ago

What is the AI analog for Tesla's level of robotaxi, where there's a "safety monitor" in the passenger seat or sans safety monitor there's a trailing guide car[1] and remote driver in Mumbai[2]?

[1] https://electrek.co/2026/01/22/tesla-didnt-remove-the-robota...

[2] https://insideevs.com/news/760863/tesla-hiring-humans-to-con...

renjimen · 11 days ago

We're going to need to become a lot more creative about what and how we test if we're ever to reach dark factory levels. Unit tests and integration tests are one thing, but truly testing against everything in a typical project requirements document is another thing.

simonw · 11 days ago

The team I saw doing this had a fake Slack channel full of fake users, each of which was constantly hammering away trying out different things against a staging environment version of the system.

That was just one of the tricks they were using, and this was a couple of months ago so they've no-doubt come up with a bunch more testing methods since then.

stuaxo · 10 days ago

I dread to imagine tbe state lf the code, there are some antipatterns that LLMs come back to again and again.

6510 · 10 days ago

The analogy is a good fit. I'm at level 0 because no way in hell I'm going to die from cruise control.

I imagine there should be two levels above: 6: The AI designs the product and 7: A market where AI (now completely autonomous) sells incomprehensible products to other AI's. Like a project Dwain factor enhancer where Dwain is a fictional character coined by an onlyfax DND bot.