I don't get it. The title says "What makes Claude Code so damn good", which implies that they will show how Claude Code is better than other tools, or just better in general. But they go about repeating the Claude Code documentation using different wording.
Am I missing something here? Or is this just Anthropic shilling?
(blogpost author here)
Haha, that's totally fair. I've read a whole bunch of posts comparing CC to other tools, or just dumping the architecture. This post was mainly for people who've used CC extensively, know for a fact that it is better, and wonder how to ship such an experience in their own apps.
I've used Claude Code, Cursor, and Copilot in VS Code, and I don't "know" that Claude Code is better, apart from the fact that it runs in the terminal, which makes it a little faster but less ergonomic than tools running inside the editor. All of the context tricks can be done with Copilot instructions as well, so I simply can't see how Claude Code is superior.
Not in the title, but one of the opening sentences is this:
> I find Claude Code objectively less annoying to use compared to Cursor, or Github Copilot agents even with the same underlying model! What makes it so damn good?
The difference between Claude Code and Cursor is that one is a command line tool and the other an IDE. You can use Claude models in both and all these techniques can be applied with Cursor and its rules, too.
Not even close; quite the opposite. An agentic tool can be fully autonomous; an IDE like Cursor is, well, "just" an editor. Sure, it does some heavy lifting too, but the user still writes the code. They are starting to implement fully agentic tools and modes, but those are nowhere near working as well as Claude Code does.
There is also Cursor Agent CLI, which is a TUI exactly like CC. I switched to it because I don't like GUI AI assistants, but I also couldn't stand CC always being overloaded and having many bugs that affected me. I'm now on Cursor Agent CLI with GPT-5 and happy to have an alternative to CC.
Not at all; it's just not "a Claude model". All these companies add their own prompts and hints on top; it's a totally different experience. Try using Kiro, which is also "a Claude model", and tell me it's the same.
Unfortunately, Claude Code is not open source, but there are some tools that help figure out how it works. If you are really interested, I strongly recommend looking at Claude Trace: https://github.com/badlogic/lemmy/tree/main/apps/claude-trac...
It dumps out a JSON file as well as a very nicely formatted HTML file that shows you every single tool and all the prompts that were used for a session.
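If you want to poke at the dump programmatically, something like this sketch works; note that the file name and the entry shape here are assumptions on my part, since I haven't pinned down the exact schema claude-trace writes:

    // Hypothetical reader for a claude-trace JSON dump; "session.json" and
    // the entry structure below are guesses, not a documented schema.
    import { readFileSync } from "node:fs";

    const entries = JSON.parse(readFileSync("session.json", "utf8"));

    for (const entry of entries) {
      // Assume each entry logs the Anthropic API request, including tools.
      for (const tool of entry.request?.body?.tools ?? []) {
        console.log(tool.name, "::", (tool.description ?? "").slice(0, 80));
      }
    }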
It's all in how the base model has been trained to break tasks into discrete steps and work through them patiently, with some robustness to failure cases.
You really should not check that... I saw some dude on Reddit saying that you can build your own SaaS in 20 days, then launch and sell it. I checked out some of his projects; Claude Code can do that in a few hours. So can I without AI, as I have a batteries-included framework ready that has all the plumbing done. But Claude can do those from scratch in hours; so one day total, with me doing some testing and fixing. That is not a product or a startup: it's a grift. But glory to him for getting it done anyway. Not many people launch and then actually make a few bucks.
What AI can definitely not do is launch or sell anything.
I can write some arbitrary SaaS in a few hours with my own framework, too - and know it's much more secure than anything written by AI. I also know how to launch it. (I'm not so good at the "selling" part).
But if anyone could do all of this, including the launching and the selling, then they would not be selling themselves on Reddit or YouTube. Once you see someone explaining to you how to get rich quickly, you must assume that they have failed, or else they would not be wasting their time trying to sell you something. And from that you should deduce that it's not wise to take their advice.
Thanks for sharing this. At a time when there is a rush toward multi-agent systems, it's helpful to see how an LLM-first organization is going after it. Lots of the design aspects here are things I experiment with day to day, so it's good to see others using them as well.
A few takeaways for me from this:
(1) Long prompts are good - and don't forget basic things like explaining in the prompt what the tool is, how to help the user, etc.
(2) Tool calling is basic af; you need more context (when to use, when not to use, etc.) - see the sketch below.
(3) Using messages as the state of the memory for the system is OK; I've thought about fancy ways (e.g., persisting dataframes, passing variables between steps, etc.), but it seems like as context windows grow, messages should be OK.
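To make (2) concrete, here's a minimal sketch in the shape the Anthropic Messages API expects for tools (name, description, input_schema); the wording and schema are just illustrative, not anything Claude Code actually ships:

    // Pack usage guidance into the description itself, not just the schema.
    const grepTool = {
      name: "grep",
      description: [
        "Search file contents with a regular expression.",
        "When to use: finding where a symbol is defined or referenced.",
        "When NOT to use: reading a whole file (use the read tool), or",
        "locating files by name (use the glob tool).",
      ].join("\n"),
      input_schema: {
        type: "object",
        properties: {
          pattern: { type: "string", description: "Regex to search for" },
          path: { type: "string", description: "Directory to search in" },
        },
        required: ["pattern"],
      },
    };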
I want to note that long prompts are good only if the model is optimized for them. I have tried swapping the underlying model for Claude Code. Most local models, even those claimed to work with long context and tool use, don't work well once the instructions get too long. This is an issue for tool use in particular: it works well in small chatbot-type conversation demos, but once the prompt approaches Claude Code's length, the model just fails, either forgetting what tools are there, forgetting to use them, or returning the wrong formats. Only OpenAI's models and Google's Gemini kind of work, but not as well as Anthropic's own models. Besides, they feel much slower.
(author of the blogpost here)
Yeah, you can extract a LOT of performance from the basics and don't have to do any complicated setup for ~99% of use cases. Keep the loop simple, have clear tools (it is ok if tools overlap in function). Clarity and simplicity >>> everything else.
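To make "keep the loop simple" concrete, here's roughly the whole shape of it; a sketch with the Anthropic SDK, where runTool is a stub you'd wire to your own tools and the model id is just an example:

    import Anthropic from "@anthropic-ai/sdk";

    const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the env

    // Stub: dispatch to your own tool implementations here.
    async function runTool(name: string, input: unknown): Promise<string> {
      throw new Error(`tool not implemented: ${name}`);
    }

    async function agentLoop(task: string, tools: any[]) {
      const messages: any[] = [{ role: "user", content: task }];
      while (true) {
        const response = await client.messages.create({
          model: "claude-sonnet-4-20250514", // any tool-capable Claude model
          max_tokens: 4096,
          messages,
          tools,
        });
        messages.push({ role: "assistant", content: response.content });
        // If the model didn't ask for a tool, we're done.
        if (response.stop_reason !== "tool_use") return response;
        // Otherwise run every requested tool and feed the results back in.
        const results = [];
        for (const block of response.content) {
          if (block.type === "tool_use") {
            results.push({
              type: "tool_result",
              tool_use_id: block.id,
              content: await runTool(block.name, block.input),
            });
          }
        }
        messages.push({ role: "user", content: results });
      }
    }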
Oof, this comes at a hard moment in my Claude Code usage. I'm trying to have it help me debug some Elastic issues on Security Onion but after a few minutes it spits out a zillion lines of obfuscated JS and says:
Error: kill EPERM
at process.kill (node:internal/process/per_thread:226:13)
at Ba2 (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19791)
at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19664
at Array.forEach (<anonymous>)
at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19635
at Array.forEach (<anonymous>)
at Aa2 (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19607)
at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19538
at ChildProcess.W (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:20023)
at ChildProcess.emit (node:events:519:28) {
errno: -1,
code: 'EPERM',
syscall: 'kill'
}
I'm guessing one of the scripts it runs kills Node.js processes, and that inadvertently kills Claude as well. Or maybe it feels bad that it can't solve my problem and commits suicide.
In any case, I wish it would stay alive and help me lol.
Claude and some of the edgier parts of LocalStack are not friends either. It's pretty okay at Rust, which surprised me.
It makes me think that the language/platform/architecture that is "most known" by LLMs will soon be the preferred one; a sort of homogenization of technologies by LLM usage. Because if you can be 10x as successfully vibey in, say, Node.js versus Elixir or Go, well, why would you opt for those in a greenfield project at all? Particularly if you aren't a tech shop and that choice allows you to use junior coders as if they were midlevel or senior.
This mirrors a weird thought I’ve had recently. It’s not a thing I necessarily agree with, but just an idea.
I hear people say things like, “AI isn’t coming for my job because LLMs suck at [language or tech stack]!”
And I wonder, does that just mean that other stacks have an advantage? If a senior engineer with Claude Code can solve the problem in Python/TypeScript in significantly less time than you can solve it in [tech stack] then are you really safe? Maybe you still stack up well against your coworkers, but how well does your company stack up against the competition?
And then the even more distressing thought accompanies it: I don’t like the code that LLMs produce because it looks nothing like the code I write by hand. But how relevant is my handwritten code becoming in a world where I can move 5x faster with coding agents? Is this… shitty style of LLM generated code actually easier for code agents to understand?
Like I said, I don’t endorse either of these ideas. They’re just questions that make me uncomfortable because I can’t definitively answer them right now.
I have had zero good results with any LLM and Elasticsearch. Everything it spits out is a hallucination, because there aren't very many complete, in-context examples of anything on the internet.
I would try upgrading or wiping away your current install and re-installing it. There might be some cached files somewhere that are in a bad state. At least that's what fixed it for me when I recently came across something similar.
What do people think of Google's Gemini (Pro?) compared to Claude for code?
I really like a lot of what Google produces, but they can't seem to keep a product around without shutting it down, and they can be pretty ham-fisted, both with corporate control (Chrome and corrupt practices) and censorship.
Gemini is amazing for taking a merge file of your whole repo, dropping it in there, and chatting about stuff. The level of whole codebase understanding is unreal, and it can do some amazing architectural planning assistance. Claude is nowhere near able to do that.
My tactic is to work with Gemini to build a dense summary of the project and create a high-level plan of action, then take that to GPT-5 and have it try to improve the plan and convert it into a hyper-detailed workflow XML document laying out all the steps to implement the plan, which I then hand to Claude.
This avoids pretty much all of Claude's unplanned bumbling.
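The "merge file" step itself is trivial, if anyone wants to try this; a rough sketch, where the extension filter and the output name are just my own choices:

    import { readdirSync, readFileSync, statSync, writeFileSync } from "node:fs";
    import { extname, join } from "node:path";

    // Concatenate source files with path headers into one blob for Gemini.
    const KEEP = new Set([".ts", ".js", ".py", ".md"]); // illustrative filter
    const SKIP = new Set(["node_modules", ".git", "dist"]);

    function collect(dir: string, out: string[]) {
      for (const name of readdirSync(dir)) {
        if (SKIP.has(name)) continue;
        const path = join(dir, name);
        if (statSync(path).isDirectory()) collect(path, out);
        else if (KEEP.has(extname(name)))
          out.push(`\n===== ${path} =====\n` + readFileSync(path, "utf8"));
      }
    }

    const parts: string[] = [];
    collect(".", parts);
    writeFileSync("repo-merged.txt", parts.join("\n"));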
I don't think Gemini Pro is necessarily worse at coding, but in my experience Claude is substantially better at "terminal" tasks (i.e. working with the model through a CLI in the terminal) and most of the CLIs use Claude, see https://www.tbench.ai/leaderboard.
Yeah, the main strength of gemini-cli is that it's open source, but it still needs a lot of polish. I ended up building my own web-based interactive agent on top of gemini-cli [1] out of frustration.
[1] https://github.com/lifthrasiir/angel
In my recent tests I found it quite smart at analyzing the bigger picture (e.g., "hey, the test is failing not because of that, but because the whole assumption has changed; let me rewrite this test from scratch"). But it also got stuck a few times ("I can't edit the file, I'm stuck, let me try something completely different"). The biggest difference so far, though, is the communication style; it's a bit... snarky? I.e., comments like "yeah, the tests are failing, as I suspected". Why the f did it suspect a failing test on a project it was seeing for the first time? :D
Pretty much every time Claude Code is stuck, or more or less just coding in circles, I use Gemini Pro to analyze the code/data and feed the response into Claude to solve it. I also have much more success with Gemini when creating big SQL transformation scripts or similar. Both are quite bad at bigger tasks; they get you 60% of the way, and then I spend days and days trying to get to 100%. It's such a time sink when I select the wrong task for the LLM.
It's doing rather well at thinking, but not at coding. When it codes, often enough it runs in circles and ignores input. Where I find it useful is reading through larger codebases and distilling what I need to find out from them. I even call Gemini from Claude to consult it on certain things. Opus is also like that, btw, but a bit better at coding. Sonnet, though, excels at coding, in my experience.
Personally, Gemini has been giving me better results. Claude keeps trying to generate React code even when the whole context and my command say Svelte, and it constantly fails to give me something that can at least run. Gemini, on the other hand, has been pretty good with styling and useful with the business logic. I don't get all the hype around Claude.
The Gemini CLI tool is atrocious. It might work sometimes for analyzing code, but for modifying files, never. The inevitable conclusion of every session I've ever tried has been an infinite loop. Sometimes it's an infinite loop of self-deprecation, sometimes just repeating itself to failure, usually repeating the same tool failure until it catches it as an infinite loop. Tool usage frequently (we're talking 90% of the time) fails. It's also, frankly, just a bummer to talk to. The "personality" is depressed, self-deprecating, and just overall really weird.
That's been my experience, anyway. Maybe it hates me? I sure hate it.
This is so weird; I'm not getting the same experience at all. Its tools work; it changes TypeScript and Python confidently, makes mistakes, understands them, and fixes them. I had a case of it giving up and admitting failure, but not in the way you describe.
This matches my experience with it. I won’t let it touch any code I have not yet safely checked in before firing up Gemini. It will commonly get into a death loop mid session that can’t be recovered from.
I think it’s just that the base model is good at real world coding tasks - as opposed to the types of coding tasks in the common benchmarks.
If you use GitHub Copilot - which has its own system level prompts - you can hotswap between models, and Claude outperforms OpenAI’s and Google’s models by such a large margin that the others are functionally useless in comparison.
Anthropic has opportunities to optimize their models / prompts during reinforcement learning, so the advice from the article to stay close to what works in Claude code is valid and probably has more applicability for Anthropic models than applying the same techniques to others.
With a subscription plan, Anthropic is highly incentivized to be efficient in their loops beyond just making it a better experience for users.
But is it a game changer vs CoPilot in Agent mode with Claude 4 Sonnet?
Because it's twice the price and doesn't even have a trial.
I feel like if it were a game changer, like Cursor once was vs Ask mode with GPT, it would be worth it, but CoPilot has come a long way and the only up-to-date comparisons I've read point to it being marginally better or the same, but twice the price.
I read all the praise about Claude Code, tried it for a month and was very disappointed. For me it doesn't work any better than Cursor's sidebar and has worse UX on top. I wonder if I am doing something wrong because it just makes lots of stupid mistakes when coding for me, in two different code bases.
I'll suggest giving it another shot. It really is a game changer. (I can't tell what you're doing wrong, but with a few people I've seen, it has been about making a psychological switch. I wrote about it a bit here: https://mnvr.in/beginners-mind. Sharing in case it helps you see how you might approach it differently.)
Because it's embarrassing, and probably nobody understands why this works. Depending on heuristics like these, which can completely change with the next model, is really bad...
> Am I missing something here? Or is this just Anthropic shilling?
Can I shill my business on here too, or will it get canned because I'm a nobody?
Without these premises, one could state that the 1996 Yugo was so damn good. I mean, it was better than a horse.
> I find Claude Code objectively less annoying to use compared to Cursor, or Github Copilot agents even with the same underlying model! What makes it so damn good?
It's Coke vs. Pepsi.
> It dumps out a JSON file as well as a very nicely formatted HTML file that shows you every single tool and all the prompts that were used for a session.
You can see the system prompts too.
> It's all in how the base model has been trained to break tasks into discrete steps and work through them patiently, with some robustness to failure cases.
That repository does not contain the code. It's just used for the issue tracker and some example hooks.
Anywhere to check?
> A few takeaways for me from this: (1) Long prompts are good. (2) Tool calling needs more context (when to use, when not to use, etc.). (3) Using messages as the state of the memory for the system is OK.
For context: I want to build a Claude Code-like agent in a WYSIWYG markdown app. That's how I stumbled on your blog post :)
For the command-line tools (Claude Code vs Gemini CLI)? It isn't even close. Gemini CLI was useless. Claude Code was mostly just slow.
I think Claude is much more predictable and follows instructions better; the todo list it manages seems very helpful in this respect.
Try using Opus with Cline in VS Code. Then use Claude Code.
I don't know the best way to quantify the differences, but I know I get more done in CC.
Had similar problems until I saw the advice: "Don't say what it shouldn't do; focus on what it should."
I.e., make sure that when it reaches for the 'thing', it has the alternative in context.
Haven't had those problems since.
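E.g., for the Svelte/React case above, instead of "Do not write React", my CLAUDE.md-style instructions now read something like this (the wording is made up, but that's the idea):

    This project uses Svelte 5. Write all UI as Svelte components.
    For client-side state, use Svelte stores, not React hooks.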