- New benchmark SOTAs with 77.9% on SWE-Bench-Verified, 79.9% on SWE-Lancer, and 58.1% on TerminalBench 2.0
- Natively trained to work for many hours across multiple context windows via compaction
- 30% more token-efficient at the same reasoning level across many tasks
Let us know what you think!
I really like the "subagent" feature in Claude Code — it's super useful to manage context in complex codebases. Here are some examples of agents that can be useful: https://github.com/humanlayer/humanlayer/tree/main/.claude/a...
Would it make sense to have a similar feature in Codex CLI? I often do "spec-driven development", which is basically a loop of:
research -> implementation plan -> actual implementation (based on research + plan) -> validation
I use a separate subagent for each phase, which (based on subjective judgement) improves the output quality compared with keeping everything, every tool use etc., in the "main" context window.
Codex CLI is great and I use it often, but I'd like to have more of CC's convenient features for managing context. I'm super happy that compaction is now available; hopefully we'll get more features for managing context.
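To make the loop above concrete, here is a rough sketch of the idea, not how Claude Code or Codex CLI actually implement it. Every name here (`Subagent`, `spec_driven_loop`, `fake_model`) is made up for illustration, and the "model" is a stub. The point is only that each phase starts from a fresh context and hands back a single result to the main context.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model call: takes a list of messages, returns text.
ModelFn = Callable[[list[str]], str]


@dataclass
class Subagent:
    """Illustrative subagent: each run starts from a fresh, isolated context."""
    name: str
    model: ModelFn

    def run(self, task: str, inputs: list[str]) -> str:
        # Only the task and the explicitly passed inputs are visible here;
        # nothing from the main context leaks in.
        context = [f"[{self.name}] {task}", *inputs]
        return self.model(context)


def spec_driven_loop(task: str, model: ModelFn) -> list[str]:
    """research -> plan -> implementation -> validation, one subagent per phase."""
    main_context = [task]

    research = Subagent("research", model).run(task, [])
    main_context.append(research)

    plan = Subagent("plan", model).run(task, [research])
    main_context.append(plan)

    # Implementation is based on research + plan, as in the loop above.
    impl = Subagent("implementation", model).run(task, [research, plan])
    main_context.append(impl)

    validation = Subagent("validation", model).run(task, [plan, impl])
    main_context.append(validation)

    # The main context holds one entry per phase, not every intermediate step.
    return main_context


def fake_model(context: list[str]) -> str:
    # Toy model (assumption): just reports how many messages it was given.
    return f"summary of {len(context)} message(s)"
```

Calling `spec_driven_loop("add login", fake_model)` returns the task plus one entry per phase, so the main context stays at five items no matter how much work a real subagent would do internally.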
If you’re unfamiliar, the phone connectivity situation in the 80s and 90s was messy and piecemeal. AT&T had been broken up in 1982 (see https://www.historyfactory.com/insights/this-month-in-busine...), and most people had a local phone provider, with AT&T as the default long-distance provider. MCI and Sprint were becoming real competition for AT&T at the time of these commercials.
Anyway, in 1993 AT&T was still the crusty old monopoly in most people’s minds, and the idea that they were going to be the company to bring any of these ideas to the market was laughable. So the commercials were basically an image play. The only thing most people bought from AT&T was long distance service, and the main threat was customers leaving for MCI and Sprint. The ads were memorable for sure, but I don’t think they blew anyone’s mind or made anyone stay with AT&T.
I wrote a post about his insistence that the "cost of inference" is going up. https://crespo.business/posts/cost-of-inference/
To be clear, I do expect that the bubble will burst at some point (my bet is 2028/2029), but that's due to dynamics between markets and new tech. The tech itself is solid, even in its current form, but when there's a lot of money to be made, you tend to observe repeatable social patterns that often lead to overvaluing the stuff in question.
I asked Claude and it answered this:
Skills = Instructions + resources for the current Claude instance (shared context)
Subagents = Separate AI instances with isolated contexts that can work in parallel (different context windows)
Skills make Claude better at specific tasks. Subagents are like having multiple specialized Claudes working simultaneously on different aspects of a problem.
I imagine we can probably compose them, e.g. invoke subagents (to keep separate context) which could use some skills and in the end summarize the findings/provide output, without "polluting" the main context window.
It's an okay product. I appreciate that it's self-hosted with good documentation, but they absolutely destroyed their brand with excessive affiliate marketing, and now nothing of substance is left if you search for it anywhere.
I don't know where the Overton Window is today, but it's a long way from Jimmy Carter's peanut farm.