https://github.com/magnet-linux/magnet-linux
Not really ready for prime time, but I think I have some interesting ideas there at least.
My point is that you should have to actively risk money to push the truth needle in the wrong direction.
I feel like this is the sort of thing a prediction market might be able to sort out.
I kind of consider them the same thing. Openpilot can drive really well on highways for hours on end when nothing interesting is happening. Claude Code can do straightforward refactors, write boilerplate, do scaffolding, and run automated git bisects with no input from me.
Neither one is a substitute for the 'driver'. Claude Code is like the Level 2 self-driving of programming.
It's like letting a wet dog (who'd just been swimming in a nearby swamp) run loose inside your hermetically sealed cleanroom.
1. Coding assistants based on o1 and Sonnet are pretty great at coding with <50k context, but degrade rapidly beyond that.
2. Coding agents do massively better when they have a test-driven reward signal.
3. If a problem can be framed in a way that a coding agent can solve, that speeds up development at least 10x from the base case of human + assistant.
4. From (1)-(3), if you can get all the necessary context into 50k tokens and measure progress via tests, you can speed up development by 10x.
5. Therefore all new development should be microservices written from scratch and interacting via cleanly defined APIs.
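To make points (1), (2) and (4) concrete, here is a rough sketch of the two checks they imply: does a service's source fit in the 50k-token budget, and what fraction of its tests pass? Everything named here (`estimate_tokens`, `fits_in_context`, `pass_rate`, and the 4-characters-per-token ratio) is made up for illustration. A real tokenizer would give different counts, and a real agent loop would get its pass/fail results from an actual test runner.

```python
from typing import Dict, List

CONTEXT_BUDGET_TOKENS = 50_000  # the "<50k context" figure from point 1
CHARS_PER_TOKEN = 4             # assumption: crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate (ceiling of chars / CHARS_PER_TOKEN)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def fits_in_context(files: Dict[str, str],
                    budget: int = CONTEXT_BUDGET_TOKENS) -> bool:
    """True if the estimated tokens for all files fit within the budget."""
    return sum(estimate_tokens(src) for src in files.values()) <= budget


def pass_rate(results: List[bool]) -> float:
    """Fraction of passing tests, usable as a simple progress signal."""
    if not results:
        return 0.0  # no tests means no signal
    return sum(results) / len(results)


# Hypothetical service: 180,000 characters, roughly 45,000 estimated tokens.
service = {"api.py": "x" * 120_000, "models.py": "y" * 60_000}
print(fits_in_context(service))               # True
print(pass_rate([True, True, False, True]))   # 0.75
```

A service that fails the first check is a candidate for splitting. One that passes it but has no tests gives the agent nothing to optimize against.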
Sure enough, I see HN projects evolving in that direction.
This has always been good practice anyway.