- New benchmark SOTAs with 77.9% on SWE-Bench-Verified, 79.9% on SWE-Lancer, and 58.1% on TerminalBench 2.0
- Natively trained to work across many hours across multiple context windows via compaction
- 30% more token-efficient at the same reasoning level across many tasks
Let us know what you think!
One huge difference I notice between Codex and Claude Code is that, while Claude basically disregards your instructions (CLAUDE.md) entirely, Codex is extremely, painfully, doggedly persistent in following every last character of them - to the point that I've seen it work for 30 minutes to convolute some solution that was only convoluted because of some sentence I threw in the instructions and had completely forgotten about.
I imagine Codex as the "literal genie" - it'll give you exactly what you asked for. EXACTLY. If you ask Claude to fix a test that accidentally says assert(1 + 1 === 3), it'll say "this is clearly a typo" and just rewrite the test. Codex will rewrite the entire V8 engine to break arithmetic.
Both these tools have their uses, and I don't think one approach is universally better. Because Claude just hacks its way to a solution, it is really fast, so I like using it for iterative web work, where I need to tweak some styles and I need a fast iterative loop. Codex is much worse at that because it takes like 5 minutes to validate everything is correct. Codex is much better for longer, harder tasks that have to be correct -- I can just write some script to verify that what it did works, and let it spin for 30-40 minutes.
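For what it's worth, here is a minimal sketch of the kind of verification script meant above. It assumes "correct" can be reduced to some check command exiting with status 0; the `verify` name and the timeout default are made up for illustration and are not part of Codex or Claude Code.

```python
import subprocess
import sys


def verify(cmd: list[str], timeout_s: float = 600.0) -> bool:
    """Return True if the check command exits with status 0 within the timeout.

    `cmd` is whatever check you trust (a test runner, a linter, a diff);
    the choice of command is an assumption left to the caller.
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,  # keep the agent's log free of check output
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # A check that hangs is treated as a failure, not a pass.
        return False
    return result.returncode == 0
```

You would call it with your own check, e.g. `verify(["pytest", "-q"])` if the project uses pytest, and tell the agent to keep going until it returns True.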
> my ADHD regularly has me forgetting to restart it, to the tune of 100+ tabs open across multiple desktops.
My MacBook Air routinely will have 200-300 before I purge. Getting better at keeping under 100 but yeah... My Linux desktop is hooked up to my TV[0] and currently has over 100 YouTube tabs open. I'm going to watch those math videos, I swear, I'm just tired right now and so want to watch garbage.
I do have uBlock Origin on both machines and some stricter privacy settings, maybe that's it? But otherwise yeah, FF is just as snappy as Chrome. Which I do use regularly when on other people's machines.
[0] it's a movie server, gaming machine, and for everything else there's ssh and ydotool (I wish Apple would let me make better iPhone scripts than Shortcuts allows. Shortcuts makes me want to throw my phone against a wall...)
Specifically, I'm really proud of "spec driven development", which is based on the internal processes that software development teams at Amazon use to build very large technical projects. Kiro can take your basic "vibe coding" prompt and expand it into deep technical requirements, a design document (with diagrams), and a task list that breaks large projects down into smaller, more realistic chunks of work.
I've had a ton of fun not just working on Kiro, but also coding with Kiro. I've also published a sample project I built while working on Kiro. It's a fairly extensive codebase for an infinite crafting game, almost 95% AI coded, thanks to the power of Kiro: https://github.com/kirodotdev/spirit-of-kiro
https://www.legaldive.com/news/Chabon-OpenAI-class-action-co...
Those authors didn’t “make their IP available on the internet”, did they?
Your argument is the same as Facebook saying “we can’t provide this service without invading your privacy” or another company saying “we can’t make this product without using cancerous materials”.
Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money and is unwilling to come up with solutions to problems caused by your business model.
Once the IP is on the internet, you can't complain about a human or a machine learning from it. You made your IP available on the internet. Now, you can't stop humanity benefiting from it.