We are definitely inspired by Railway though!
One suggestion for improvement: add some more info to your website/GitHub about the need for a provider and which providers are compatible. It took me a while to figure that out because there was no prominent info about it. Additionally, none of the demos showed a login or authentication step. To me, it seemed like the VMs just came out of nowhere. So at first, I thought "Cloudrouter" was a project/company that gave away free VMs/GPUs (e.g. a free tier/trial thing). But that seemed too good to be true. Later, I noticed the e2b.app domain, and then I also found the little note way down at the bottom of the site that says "Provider selection" and "Use E2B provider (default)". Then I got it. However, I should mention that I don't know much about this whole topic. I hadn't heard of E2B or Modal before. Other people might find it clearer.
For those who are wondering about this too: you will need a provider like https://e2b.dev/ or https://modal.com/ to use this skill, and you pay them based on usage time.
Each ran the same spec headlessly in their native harness (one shot).
Results:
Agent                        Cycles   Time
─────────────────────────────────────────────
gpt-5-2                       2,124   16m
claude-opus-4-5-20251101      4,973   1h 2m
gpt-5-1-codex-max-xhigh       5,402   34m
gpt-5-codex                   5,486   7m
gpt-5-1-codex                12,453   8m
gpt-5-2-codex                12,905   6m
gpt-5-1-codex-mini           17,480   7m
claude-sonnet-4-5-20250929   21,054   10m
claude-haiku-4-5-20251001   147,734   9m
gemini-3-pro-preview        147,734   3m
gpt-5-2-codex-xhigh         147,734   25m
gpt-5-2-xhigh               147,734   34m
Clearly none beat Anthropic's target, but gpt-5-2 did slightly better in much less time than "Claude Opus 4 after many hours in the test-time compute harness".
They could sell training data too. UIs are relatively solved, though. But great UIs and UI critique aren't.
Learned a lot from Refactoring UI, and I know (from trying) that it's impossible to make a code review bot based on out-of-the-box SOTA models today. Vision capabilities are lacking here, and I can see demand for more data. And Adam's taste likely fits well here.
Curious to hear more about your local orchestration platform. How did you solve resource sharing (mainly ports for web stuff tbh)? Or is it more intra-task vs. inter-task parallelism?