Just today I had my first real success with Claude (and with coding agents generally). I’ve played with Cursor in the past but am now trying Claude and others.
As mentioned in the article, the big trick is having clear specs. In my case I sat down for 2 hours and wrote a 12-step document on how I would implement this (along with background information). Claude went through it step by step and wrote the code. I imagine this saved me 6-10 hours. I’m now reviewing, about to test, and will start adjusting and adding future functionality.
Its success was rooted in the fact I knew exactly how to do what it needed to do. I wrote out all the steps and it just followed my lead.
It makes it clear to me that mid and senior developers aren’t going anywhere.
That said, it was amazing to just see it go through the requirements and implement modules full of organised documented code that I didn’t have to write.
I get excellent results and don’t do anything like that. Basically I ask Claude to write code the way I do: a small step at a time. I literally prompt it to do the next step I’d do, and so on and so forth. I accept all changes immediately, commit after every change, and then review the diff. If Claude did something bad, I ask it to fix that. I typically also give references to existing code that I want it to model or functions to use.
This gives me excellent results with far less typing and time.
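The commit-after-every-change loop described above can be scripted. A minimal sketch in Python (the helper name and messages are invented, and it assumes `git` is on the PATH):

```python
import subprocess

def commit_step(message: str) -> str:
    """Stage and commit whatever the agent just changed, then
    return the diff stat of that commit for review."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
    result = subprocess.run(
        ["git", "show", "--stat", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

If the diff stat looks wrong, `git revert HEAD` (or a follow-up prompt) undoes just that one step, which is the save-point behaviour the workflow relies on.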
Sometimes I do OP’s approach, sometimes yours, but in all cases, writing down what you need done in detailed English gets me to a better understanding of what the hell I’m even doing.
Even if I wrote the same prompts and specs and then typed everything myself, it would have already been an improvement.
I've come to the conclusion that the best use for "AI" is typing faster than me. I work at a place with a very well-defined architecture, so implementation is usually very straightforward; Claude can follow it because, as you said, it's following a spec of sorts.
On the other hand, there have been quite a few moments in the last week where I'm actually starting to question whether it's really faster. Some of the random mistakes can be major depending on how early it gets something wrong. I feel like I'm in a computer game: I need to save every time I make progress (commit my work).
Yeah. Read “Programming as Theory Building” by Naur [1] to understand why you still need to develop a theory of the problem and how to model it yourself, lest the LLM concoct (an incorrect) one for you.
I don't know how many times now I've seen these things claim to have run the code, show me the hallucinated output, and then go on to develop an incorrect theory based on that hallucinated output.
We need to ask LLMs to generate documentation, diagrams, and FAQs in addition to code, then. We all know what this means: keeping them up to date.
Has anyone managed to set up a "reactive" way to interact with LLMs in a codebase, so that when an LLM extends or updates some part of the territory, it also extends or updates the map?
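One low-tech approximation, short of a truly reactive setup: a check (e.g. in a pre-commit hook) that flags code files newer than their summary docs, so the LLM can be prompted to refresh the map. A sketch, assuming the invented convention that `src/foo.py` is summarized by `docs/foo.md`:

```python
from pathlib import Path

def stale_summaries(src_dir: str, docs_dir: str) -> list[str]:
    """Return module names whose code changed after their summary
    doc was last written (or which have no summary doc at all)."""
    stale = []
    for src in Path(src_dir).glob("*.py"):
        doc = Path(docs_dir) / (src.stem + ".md")
        if not doc.exists() or src.stat().st_mtime > doc.stat().st_mtime:
            stale.append(src.stem)
    return stale
```

The returned list can be turned into a prompt like "the map is stale for these modules, update their summaries" at commit time.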
Amazing, the article is 40 years old and still totally relevant today. And even more amazing is that many of today's IT managers seem unaware of its points.
> As mentioned in the article, the big trick is having clear specs
I've been building a programming language using Claude, and these are my findings, too.
Which, after discovering this, makes sense. There are a LOT of small decisions that go into programming. Without detailed guidance, LLMs will end up making educated guesses for a lot of these decisions, many of which will be incorrect. This compounds until the net result is a wrong solution.
Can you (or anyone) share an example of such a specification document? As an amateur programmer experimenting with CC, it would be very helpful to understand the nature and depth of the information that is helpful.
I have multiple system prompts that I use before getting to the actual specification.
1. I use the Socratic Coder[1] system prompt to have a back and forth conversation about the idea, which helps me hone the idea and improve it. This conversation forces me to think about several aspects of the idea and how to implement it.
2. I use the Brainstorm Specification[2] user prompt to turn that conversation into a specification.
3. I use the Brainstorm Critique[3] user prompt to critique that specification and find flaws in it which I might have missed.
4. I use a modified version of the Brainstorm Specification user prompt to refine the specification based on the critique and have a final version of the document, which I can either use on my own or feed to something like Claude Code for context.
Doing those things improved the quality of the code and work spit out by the LLMs I use by a significant amount, but more importantly, it helped me write much better code on my own because I now have something to guide me, whereas before I used to go in blind.
As a bonus, it also helped me decide if an idea was worth it or not; there are times I'm talking with the LLM and it asks me questions I don't feel like answering, which tells me I'm probably not into that idea as much as I initially thought, it was just my ADHD hyper focusing on something.
Search for Claude Code's planning mode; you can use Claude to help you write specs. There are many YouTube videos as well. I think spec docs are pretty personal and project-specific....
Step 1: back and forth chat about the functionality we want. What do we want it to do? What are the inputs and outputs? Then generate a spec/requirements sheet.
Step 2: identify what language, technologies, frameworks to use to accomplish the goal. Generate a technical spec.
Step 3: architecture. Get a layout of the different files that need to be created and a general outline of what each will do.
Step 4: combine your docs and tell it to write the code.
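For a concrete (entirely invented) example, the combined hand-off doc from those four steps can be quite small:

```markdown
# Spec: CSV de-duplicator
Input: path to a CSV file. Output: the same CSV with exact-duplicate rows removed.

## Tech choices
Python 3.12, stdlib `csv` only, no frameworks.

## Architecture
- `cli.py` – argument parsing, entry point
- `dedupe.py` – core row-hashing logic
- `tests/` – one test file per module
```

The point is less the format than that each section answers one of the step questions before any code is written.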
> That said, it was amazing to just see it go through the requirements and implement modules full of organised documented code that I didn’t have to write
Small side remark, but what is the value added of AI-generated documentation for AI-generated code? It's just a burden that increases context size whenever the AI needs to re-analyse or change the existing code. It's not like any human is ever going to read the code docs when they can just ask the AI what it's about.
Leaving aside the value for humans, it's actually very valuable for the AI to provide indexed summary documents of what code goes where, what it does, what patterns it uses, and what its entry points and API conventions are.
This is useful because if you just have Claude Code read all the code every time, it'll run out of context very quickly, whereas if you have a dozen 50 line files that summarize the 200-2000 lines of code they represent, they can always be fresh in context. Context management is king.
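The budget math is what makes this work. A sketch of packing those summary files into a prompt without blowing the context window (the words-times-1.3 token estimate is a crude stand-in for a real tokenizer):

```python
def build_context(summaries: list[str], budget_tokens: int = 8000) -> str:
    """Pack summary docs into one context block, stopping
    before the (estimated) token budget is exceeded."""
    picked, used = [], 0
    for text in summaries:
        cost = int(len(text.split()) * 1.3)  # rough tokens-per-word estimate
        if used + cost > budget_tokens:
            break
        picked.append(text)
        used += cost
    return "\n\n".join(picked)
```

A dozen 50-line summaries fit comfortably where a dozen 2000-line source files would not.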
This is sort of like asking “why do pilots still perform manual takeoffs and landing even though full autopilot is possible?” It’s because autopilot is intended to help pilots, not replace them. Too much could go wrong in the real world. Having some skills that you practice daily is crucial to remaining a good pilot. Similarly, it’s probably good to write some code daily to keep skills sharp.
1) when your cloud LLM has an outage, your manager probably still expects you to be able to do your work for the most part. Not to go home because openai is down lol. You being productive as an engineer should not depend on the cloud working.
2) You may want to manually write code for certain parts of the project. Important functions, classes, modules, etc. Having good auto-generated docs is still useful when using a traditional IDE like IntelliJ, WebStorm, etc.
3) Code review. I’m assuming your team does code review as part of your SDLC??? Documentation can be helpful when reviewing code.
Examples. Examples are helpful for both humans and LLMs, especially if you have a custom framework or are using an unusual language. And I find I can generate ~10 good examples with LLMs in the time it would take me to generate ~3 good examples manually.
It's entirely possible that the parameters that get activated by comments in code are highly correlated with the parameters involved in producing good code.
I’m not sure I agree that I’ll never look at the code. I think it’s still important to know how the code is working for your own mental model of the app. So in this case I’ll be testing and reviewing everything to see how it’s implemented. With that in mind it’s useful for me as well as serving as context for the AI. That said, you may be right.
frequently your session/context may drop (e.g. claude crashes, or your internet dies, or your computer restarts, etc.). Claude does best when it can recover the context and understand the current situation from clear documentation, rather than trying to reverse engineer intent and structure from an existing code base. Also, the human frequently does read the code docs as there may be places where Claude gets stuck or doesn't do what you want, but a human can reason their way into success and unstick the obstacle.
I try to prompt-enforce no line-by-line documentation, but encourage function/class/module-level documentation that will help future developers and AI coding agents. Humans are generally better at this, but the AI sometimes needs the help; otherwise it fails to understand a piece of code's context and just writes its own new function that does the same thing.
Doc strings within the code can be helpful for both humans and AI. Sometimes spoken-word intent is easier to digest than code, and it helps identify side effects for both human and AI.
After someone mentioned that recently, I started writing really detailed specs with the help of ChatGPT Deep Research, editing them myself. Exporting the result as a Markdown document and passing it to Cursor worked really well.
It puts you in a different mind space to sit down and think about it instead of iterating too much and in the end feeling productive while actually not achieving much and going mostly in circles.
After 30+ years of engineering, writing lots of specs is mostly a waste of time. The problem is, more or less, that you don’t know enough. The trick is to write the smallest, simplest version of whatever you are trying to achieve and then iterate on that. Be prepared to throw it out. The nice thing with Claude (or Gemini) is that it lets you do this really, really quickly.
Completely agree. It’s a core skill of a good developer. What’s interesting is that in the past I’d have started this process but then jumped into coding prematurely. Now, when you know you are using an agent, the more you write, the better the results.
I too just yesterday had my first positive experience with Claude writing code in my project. I used plan mode for the first time and gave it the "think harder" shove. It was a straightforward improvement but not trivial. The spec wasn't even very detailed- I mentioned a couple specific classes and the behaviour to change, and it wrote the code I would have expected to write, with even a bit more safety checking than I would have done.
I write out a document that explains what I want. Then I write stubs for the functions and classes or whatever. For every stub I write a docstring for what it’s supposed to do. Then I have Claude write unit tests for each stub, one at a time. Then I have it write the functions, one at a time. At some point I should just start writing the code myself again. Haha.
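As a toy illustration of that flow (the function and its spec are invented): the stub's docstring carries the intent, the test pins it down, and only then does the agent, or you, fill in the body:

```python
import re

def slugify(title: str) -> str:
    """Lowercase the title, replace runs of non-alphanumerics
    with '-', and strip leading/trailing dashes.
    (Originally a stub: the docstring existed before the body.)"""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# The unit test written against the stub, before the body existed:
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"
```

Because the test is committed first, the generated body either passes or visibly fails; there is no ambiguity about what "done" means for each stub.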
> It makes it clear to me that mid and senior developers aren’t going anywhere.
I kinda feel like this is a self-placating statement that is not going to stay true for that long. We are so early in the process of developing AI good enough to do any of these things. Yes, right now you need senior level design skills and programming knowledge, but that doesn't mean that will stay true.
>I kinda feel like this is a self-placating statement that is not going to stay true for that long. We are so early in the process of developing AI good enough to do any of these things. Yes, right now you need senior level design skills and programming knowledge, but that doesn't mean that will stay true.
So you really think that in a few years some guy with no coding experience will ask the AI "Make me a GTA 6 clone that happens in Europe" and the AI will actually make it, the code will just work, and the performance will be excellent?
The LLMs can't do that; they are attracted to solutions they've seen in their training data. This means they sometimes overcomplicate things, miss clever solutions, or fail to apply theory, and sometimes they are just stupid and hallucinate variable names and functions: say, 50% of the time it would use speed and 50% of the time velocity, and the code will fail because of undefined references.
I am not afraid of LLMs taking my job, I am afraid of bullshit marketing that convinces the CEO/management that if they buy me Claude then I must work 10x faster.
I think it can already replace mid-level engineers, based on my experience. Also, you really don't need meticulously crafted specs for this - I've completed multiple projects with Claude with loose specs, iterating in case the direction is not looking good. You can always esc-out in case you see it doing something you didn't wish for.
That's the way I'd used it: I built a document with all the requirements and then gave it to CC. But it was not a final document; I had to go back and make some changes after experimenting with the code CC built.
I'm not gonna lie, that ~/.claude/CLAUDE.md is not going to work.
There are a lot of subjective, ambiguous instructions that really won't affect what Claude writes. Remember it's not a human; it's not performing careful reasoning over each individual LOC.
As of today, you cannot squeeze a true rule system out of a single file given as context. Many of us have made this mistake at some point – believing that you can specify arbitrarily many rules and that they'll be honored.
If you really care about every such rule, you'd have to create sub-agents, one per rule, and make the agents a required part of a deterministic (non-AI orchestrated) pipeline. Then costs would explode of course.
You can slash the costs by using cheap LLMs once your workflow is stable (but pricey to run!). Fine-tuning, prompt optimization, special distillation techniques, this is a well covered area.
In my experience as an early adopter of both Cursor and CC, nothing. I don't have a CLAUDE.md.
My expectations have shifted from "magic black box" to "fancy autocomplete". i.e. CC is for me an autocomplete for specific intents, in small steps, prompted in specific terms. I do the thinking.
> A key is writing a clear spec ahead of time, which provides context to the agent as it works in the codebase.
Yeah, people say that. I even sat next to an 'expert' (not his claim; others said so) who told me this, and we did a CC session with Opus 4 & Sonnet 4. He had this well-written, clear spec. It really didn't do even an inch better than my ad-hoc shooting in of features as they came to me, in /clear contexts without CLAUDE.md. His workflow kept forgetting vital things (even though they are in the context doc), making up things that are NOT in the context doc and are sometimes forbidden, etc. Meanwhile I just typed stuff like "now add a crud page for invoices, first study the codebase" and got far better results. It is anecdotal obviously, but I have now written 100+ projects with Claude and, outside hooks to prevent it from overstepping, I found no flow working better than another; people keep claiming theirs does, but when asked to 'show me', it turns out they are spending countless hours fixing completely wrong stuff EVEN when told explicitly NOT to do things like that in CLAUDE.md.
Thanks for sharing your experiences. That's about what I'd have expected.
I always found it very weird to see people declaring instructions to LLMs as if they were talking to a person. "Do this", "never do that", as if there was some kind of interpreter behind that built a ruleset to follow. There isn't. It's all just context. Theory would suggest that in the rare cases that these instructions actually achieve the desired effect, they do so more coincidentally than anything else. Easy to see how it could work for something like "Write tests according to pattern X" because there are going to be examples of that around in the training data; highly unlikely that instructions like "don't repeat yourself" or "study the codebase first" would do anything reasonably effective.
My most valuable specs, when coding sans agent, are either a UI sketch, some transition diagrams for logic flows, or a few bullet points of business rules. Then coding becomes just a Zen activity. I can get most of it done in one go, vibing to my favorite tunes. Then comes the tweaking phase, where everything left unspecified gets specified.
I still don't feel the need for an agent. The writing of the loose specs is done offline on paper, through rounds of discussions with stakeholders, and/or with a lot of reading. When I'm hit with an error while coding, that's usually a signal that I don't know something and should probably stop to learn about it.
When it comes to tweaking, fast feedback is king. I know where the knobs are and checking the adjustment should be quick. So it's mostly tests, linting, or live editing environment.
I've been working with Claude Code daily for a month or so. It is quite excellent and better than the other agents I have used (Cursor, Q). This article has some good tips that echo some of the things I have learned.
Some additional thoughts:
- I like to start with an ideation session with Claude in the web console. I explain the goals of the project, work through high level domain modeling, and break the project down into milestones with a target releasable goal in mind. For a small project, this might be a couple hours of back and forth. The output of this is the first version of CLAUDE.md.
- Then I start the project with Claude Code, have it read my global CLAUDE.md and the project CLAUDE.md and start going. Each session begins this way.
- I have Claude Code update the project CLAUDE.md as it goes. I have it mark its progress through the plan as it goes. Usually, at the end of the session, I will have it rewrite a special section that contains its summary of the project, how it works, and how to navigate the code. I treat this like Claude's long term memory basically. I have found it helps a lot.
- Even with good guidelines, Claude seems to have a tendency to get ahead of itself. I like to keep it focused and build little increments, as I would myself, if it is something I care about. If it's just some one-off or prototype, I let it go crazy and churn out whatever works.
> Does the $20 subscription hold a similar bang for your buck as cursor?
Not sure about cursor. But if you want to use Claude Code daily for more than 2-3hrs/day, the $20 plan will feel limiting
In my experience, the $100 plan is pretty good, although you still run into the rate limits if you use it for a long time every day (especially if you use Opus, which seems to run out in the first 30min of usage)
Using claude code feels like pairing with another programmer. Cursor feels like a polished extension of the IDE. They are both good tools and easily worth $20/mo. I think Anthropic has a 7 day free trial going on. Worth trying it out.
No. If you want to use Claude for anything serious, you will need the $200/month subscription. I have tried them all, and you will run out of Opus too quickly with the lesser ones on a daily basis.
Does anyone else find themselves starting projects that wouldn't otherwise be worth the time investment, while avoiding Claude Code for the tasks that actually have high priority?
Who has had success using Claude Code on features in older, bigger, messier projects?
Yes and yes. I find that you can really let it rip (vibe) on something greenfield, but you’ll have to take a more measured approach once something gets off the ground.
I use it daily on our 10yo production repo with success.
Absolutely. I only just started using Claude Code on Sunday and I tested it by taking a small project that I was toying with and extending it with lots of features that I had thought about adding but didn't have the time.
Then, I explored a product feature in an existing app of mine that I also had put off because I didn't feel it was worth spending several days exploring the idea. It's something that would've required me to look up tutorials and APIs on how to do some basic things and then write some golang code which I hadn't done in a while. With Claude Code, I was able to get a prototype of the idea from a client app and a golang service working within an hour!
Today I started prototyping yet another app idea I came up with yesterday. I started off doing the core of the prototype in a couple of hours by hand and then figured I'd pull Claude in to add features on top of it. I ended up spending several hours building this idea since I was making so much fantastic progress. It was genuinely addictive.
A few days ago I used it to help me explore how I should refactor a messy architecture I ended up with. I didn't initially consider it would even be useful at all but I was wowed by how it was able to understand the design I came up with and it gave me several starting points for a refactor. I ended up doing the refactor myself just because I really wanted to be sure I understood how it worked in case something went wrong. I suspect in a few weeks, I'll get used to just pairing with Claude on something like that.
That matches exactly my experience. Now there are a couple of prototypes to be finished, which still takes time. And higher priority tasks get delayed instead of sped up.
I highly recommend having fairly succinct project-level CLAUDE.md files, deferring more detail into sub-folders. Use the top level as a map. Then, during your planning of a feature, it can reach into each folder as it sees fit to find useful context to build out your phased implementation plan. I have it use thinking mode to figure out the right set of context.
At the end of each phase, I ask claude to update my implementation plan with new context for a new instance of claude to pick it up. This way it propagates context forward, and then I can clear the context window to start fresh on the next phase.
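A sketch of what that top-level map might look like (all folder and file names here are invented examples):

```markdown
# Project map
- `server/` – API handlers; see `server/CLAUDE.md` for routing conventions
- `db/` – schema and migrations; see `db/CLAUDE.md` before touching tables
- `ui/` – React components; see `ui/CLAUDE.md` for state-management rules

## Implementation plan
- [x] Phase 1: auth endpoints
- [ ] Phase 2: billing – context for the next instance lives in `docs/phase-2.md`
```

Each sub-folder CLAUDE.md holds the detail, so the top level stays cheap to keep in context on every session.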
I use Claude Code regularly and have been responsible for introducing colleagues to it. The consensus here seems to be that it’s the best coding agent out there. But since it’s the only coding agent I’ve used, when colleagues ask why it’s better than Cursor, Cline, GitHub Copilot, Gemini CLI, etc., I sometimes struggle to articulate reasons.
Claude Code power users, what would you say makes it superior to other agents?
Lots of signs point to a conclusion that the Opus and Sonnet models are fundamentally better at coding, tool usage, and general problem solving across long contexts. There is some kind of secret sauce in the way they train the models. Dario has mentioned in interviews that this strength is one of the company's closely guarded secrets.
And I don't think we have a great eval benchmark that exactly measures this capability yet. SWE Bench seems to be pretty good, but there's already a lot of anecdotal comments that Claude is still better at coding than GPT 5, despite having similar scores on SWE Bench.
I've been testing AI as a beta reader for >100k-token novels, and I can tell you with 100% certainty that Claude gets confused about things across long contexts much sooner than either O3/GPT5 or Gemini 2.5. In my experience Gemini 2.5 and O3/GPT5 run neck and neck until around 80-100k tokens, then Gemini 2.5 starts to pull ahead, and by 150k tokens it's absolutely dominant. Claude is respectable but clearly in third place.
Yeah, agree that the benchmarks don't really seem to reflect the community consensus. I wonder if part of it is the better symbiosis between the agent (Claude Code) and the Opus and Sonnet models it uses, which supposedly are fine-tuned on Claude Code tool calls? But agree, there is probably some additional secret sauce in the training, perhaps to do with RL on multi-step problems...
Not a power user, but most recently I tried it out against Gemini and Claude produced something that compiled and almost worked - it was off in some specifics that I could easily tweak. The next thing I asked it (with slightly more detailed prompting) it more or less just nailed.
Meanwhile Gemini got itself stuck in a loop of compile/fail/try to fix/compile/fail again. Eventually it just gave up and said "I'm not able to figure this out". It does seem to have a kind of self-esteem problem in these scenarios, whereas Claude is more bullish on itself (maybe not always a good thing).
Claude seems to be the best at getting something that actually works. I do think Gemini will end up being tough competition, if nothing else because of the price, but Google really need a bit of a quality push on it. A free AI agent is worthless if it can't solve anything for me.
I mentioned this in another comment, but for me one of the big positives is nothing to do with the model; it’s the UI of how it presents itself.
I hated at first that it wasn’t like Cursor, sitting in the IDE. Then I realised I was using Cursor completely differently, using it often for small tasks where it’s only moderately helpful (refactoring, adding small functions, autocompleting)
With Claude I have to stop, think and plan before engaging with it, meaning it delivers much more impactful changes.
Put another way, it demands more from me meaning I treat it with more respect and get more out of it
This is a good point, the CLI kind of forces you to engage with the coding process through the eyes of the agent, rather than just treating it as “advanced autocomplete” in the IDE.
However, there are a lot of Claude Code clones out there now that are basically the same (Gemini CLI, Codex, now Cursor CLI etc.). Claude still seems to lead the pack, I think? Perhaps it’s some combination of better coding performance due to the underlying LLM (usually Sonnet 4) being fine-tuned on the agent tool calls, plus Claude is just a little more mature in terms of configuration options etc.?
CC is great, but I prefer Roo as I find it much easier to keep an eye on Claude’s work and guide (or cancel) it as it goes. You also have greater control over modes and which models you use, but miss out on hooks and the secret sauce Anthropic has. Roo also has more bugs.
Claude the model is good but not amazing, O3/GPT5/Gemini 2.5 are better in most ways IMO. The Claude model does seem to have been trained on tool use and agentic behavior more than other models though, so even though the raw benchmarks are worse, it's more performant when used for agentic tasks, at least in terms of not getting confused and making a mess.
The big thing with Claude Code seems to be agentic process they've baked into it.
I'm playing with Claude Code to build an ASCII Factorio-like. I first had it write code without much supervision. It quickly added most of the core features you'd expect (save/load, options, debug, building, map generation, belts, crafting, smart belt placement, QoL). Then I started fixing minor bugs, and each time it would break something, e.g. tweaking movement broke belts. So I prompted it to add Playwright automation. But it wasn't able to write good-quality tests and have them all pass; the tests were full of sleep calls, etc...
So I looked at the code more closely, and it was using the React frontend and useEffect instead of a proper game engine. It's also not great at following hook rules or understanding their timing in advanced scenarios. So now I'm prompting it to use a proper tick-based game engine and rebuilding the game on top of it, doing code reviews. It's going 'slower', but it's going much better.
My goal is to make a Show HN post when I have a good demo.
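For reference, the core of a fixed-timestep engine of the kind being migrated to is small. A minimal Python sketch (not the commenter's actual code): simulation ticks are decoupled from rendering, so belt and movement logic can't be broken by UI re-renders.

```python
class Engine:
    """Fixed-timestep game loop: consume wall-clock time,
    run whole simulation ticks only, render separately."""

    TICK = 1 / 60  # seconds of simulated time per tick

    def __init__(self):
        self.ticks = 0
        self.entities = []  # belts, players, etc.; each has update(dt)

    def advance(self, elapsed: float) -> int:
        """Run as many whole ticks as fit in `elapsed`; return how many ran."""
        ran = 0
        while elapsed >= self.TICK:
            for e in self.entities:
                e.update(self.TICK)
            self.ticks += 1
            ran += 1
            elapsed -= self.TICK
        return ran
```

Because every entity sees the same fixed dt in a defined order, tweaking one system (movement) can no longer race another (belts) the way effect-driven UI updates can.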
Yep, human contribution is extremely valuable especially very early in before the AI has a skeleton it can work off of. You have to review those first few big refactors like a hawk. After that you can relax a bit.
It sounds like you implicitly delegated many important design decisions to Claude? In my experience it helps to first discuss architecture and core components of the problem with Claude, then either tell it what to do for the high-leverage decisions, or provide it with the relevant motivating context to allow it to make the right decisions itself.
I don't want it to replace me; I use it to replace reading the docs, googling, and repetitive tasks.
It's hit or miss sometimes, but I get to review every snippet.
If I generated a lot of code at once, like for a full project, my brain would melt reviewing it.
I keep my normal development flow and iterate; no waterfall.
I'm still on the fence about it honestly.
What’s the advantage here for you with a process like this?
In your flow you also have multiple review steps, and the corrections add even more friction.
I can see the advantage in what parent is describing however.
[1] https://gwern.net/doc/cs/algorithm/1985-naur.pdf
[1]: https://github.com/jamesponddotco/llm-prompts/blob/trunk/dat...
[2]: https://github.com/jamesponddotco/llm-prompts/blob/trunk/dat...
[3]: https://github.com/jamesponddotco/llm-prompts/blob/trunk/dat...
http://pchristensen.com/blog/articles/first-impressions-of-v...
Step 1: back and forth chat about the functionality we want. What do we want it to do? What are the inputs and outputs? Then generate a spec/requirements sheet.
Step 2: identify what language, technologies, frameworks to use to accomplish the goal. Generate a technical spec.
Step 3: architecture. Get a layout of the different files that need to be created and a general outline of what each will do.
Step 4: combine your docs and tell it to write the code.
"Review <codebase> and create a spec for <algorithm/pattern/etc.>"
It gives you a good starting point to jump off from.
It will make you much better at development to learn the way today's senior devs did.
Small side remark, but what is the added value of AI-generated documentation for AI-generated code? It's just a burden that increases context size whenever the AI needs to re-analyse or change the existing code. It's not like any human is ever going to read the code docs when they can just ask the AI what it's about.
This is useful because if you just have Claude Code read all the code every time, it'll run out of context very quickly, whereas if you have a dozen 50 line files that summarize the 200-2000 lines of code they represent, they can always be fresh in context. Context management is king.
1) when your cloud LLM has an outage, your manager probably still expects you to be able to do your work for the most part. Not to go home because openai is down lol. You being productive as an engineer should not depend on the cloud working.
2) You may want to manually write code for certain parts of the project. Important functions, classes, modules, etc. Having good auto-generated docs is still useful when using a traditional IDE like IntelliJ, WebStorm, etc.
3) Code review. I’m assuming your team does code review as part of your SDLC??? Documentation can be helpful when reviewing code.
I try to prompt-enforce no line-by-line documentation, but encourage function/class/module level documentation that will help future developers and AI coding agents. Humans are generally better at this, but the AI sometimes needs help understanding a piece of code's context, or it will just write its own new function that does the same thing.
It puts you in a different mind space to sit down and think about it instead of iterating too much and in the end feeling productive while actually not achieving much and going mostly in circles.
The parent wrote:
>I imagine this saved me probably 6-10 hours. I’m now reviewing and am going to test etc.
Guessing the time saved prior to reviewing and testing seems premature from my end.
I kinda feel like this is a self-placating statement that is not going to stay true for that long. We are so early in the process of developing AI good enough to do any of these things. Yes, right now you need senior level design skills and programming knowledge, but that doesn't mean that will stay true.
So you really think that in a few years some guy with no coding experience will ask the AI "Make me a GTA 6 clone that happens in Europe" and the AI will actually make it, the code will just work, and the performance will be excellent?
LLMs can't do that; they are attracted to solutions they've seen in their training data. This means they sometimes overcomplicate things, miss clever solutions, or fail to apply theory, and sometimes they are just stupid and hallucinate variable names and functions: say, 50% of the time they use `speed` and 50% of the time `velocity`, and the code fails because of undefined references.
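A toy illustration of that failure mode (this is hand-written, not real LLM output): the function's parameter is named `speed`, but the body references `velocity`, so it crashes the first time it runs.

```python
# Hand-written toy example of the speed/velocity naming drift described above.
def move(position, speed):
    return position + velocity  # NameError: `velocity` was never defined

try:
    move(0, 5)
except NameError as e:
    print(f"crashed: {e}")
```

A type checker or linter catches this class of bug instantly, which is one reason running static checks on agent output pays off.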
I am not afraid of LLMs taking my job, I am afraid of bullshit marketing that convinces the CEO/management that if they buy me Claude then I must work 10x faster.
That said, I think that the differing UIs of Cursor (in the IDE) and Claude (in the CLI) fundamentally change how you approach problems with them.
Cursor is “too available”. It’s right there and you can be lazy and just ask it anything.
Claude nudges you to think more deeply and construct longer prompts before engaging with it.
That's my experience, anyway.
There are a lot of subjective, ambiguous instructions that really won't affect what Claude writes. Remember it's not a human; it's not performing careful reasoning over each individual LOC.
Context rot is a thing (https://news.ycombinator.com/item?id=44564248 ).
As of today, you cannot squeeze a true rule system out of a single file given as context. Many of us have made this mistake at some point, believing that you can specify arbitrarily many rules and that they'll be honored.
If you really care about every such rule, you'd have to create sub-agents, one per rule, and make the agents a required part of a deterministic (non-AI orchestrated) pipeline. Then costs would explode of course.
You can slash the costs by using cheap LLMs once your workflow is stable (but pricey to run!). Fine-tuning, prompt optimization, special distillation techniques, this is a well covered area.
Sometimes I've wanted to implement it but I sense that someone else will sooner or later, putting in more resources than I could currently.
In the meantime I'm happy with vanilla CC usage.
My expectations have shifted from "magic black box" to "fancy autocomplete". i.e. CC is for me an autocomplete for specific intents, in small steps, prompted in specific terms. I do the thinking.
I do put effort in crafting good context though.
Yeah, people say that. I was even sitting next to some 'expert' (not him saying it; others saying it) who told me this, and we did a CC session with Opus 4 & Sonnet 4. He had this well-written, clear spec. It really didn't do even an inch better than my ad hoc shooting in of features as they came to me, in /clear contexts without CLAUDE.md. His workflow kept forgetting vital things (even though they are in the context doc), making up things that are NOT in the context doc and sometimes forbidden, etc., while I just typed stuff like "now add a crud page for invoices, first study the codebase" and got far better results.

It is anecdotal obviously, but I have now written 100+ projects with Claude and, outside hooks to prevent it from overstepping, I found no flow working better than another; people keep claiming theirs does, but when asked to 'show me', it turns out they are spending countless hours fixing completely wrong stuff EVEN when Claude was told explicitly NOT to do things like that in CLAUDE.md.
I always found it very weird to see people declaring instructions to LLMs as if they were talking to a person. "Do this", "never do that", as if there was some kind of interpreter behind that built a ruleset to follow. There isn't. It's all just context. Theory would suggest that in the rare cases that these instructions actually achieve the desired effect, they do so more coincidentally than anything else. Easy to see how it could work for something like "Write tests according to pattern X" because there are going to be examples of that around in the training data; highly unlikely that instructions like "don't repeat yourself" or "study the codebase first" would do anything reasonably effective.
I still don't feel the need for an agent. The writings of the loose specs is either done offline on paper, through rounds of discussions with stakeholders, and/or with a lot of reading. When I'm hit with an error while coding, that's usually a signal that I don't know something and should probably stop to learn about it.
When it comes to tweaking, fast feedback is king. I know where the knobs are and checking the adjustment should be quick. So it's mostly tests, linting, or live editing environment.
Some additional thoughts:
- I like to start with an ideation session with Claude in the web console. I explain the goals of the project, work through high level domain modeling, and break the project down into milestones with a target releasable goal in mind. For a small project, this might be a couple hours of back and forth. The output of this is the first version of CLAUDE.md.
- Then I start the project with Claude Code, have it read my global CLAUDE.md and the project CLAUDE.md and start going. Each session begins this way.
- I have Claude Code update the project CLAUDE.md as it goes. I have it mark its progress through the plan as it goes. Usually, at the end of the session, I will have it rewrite a special section that contains its summary of the project, how it works, and how to navigate the code. I treat this like Claude's long term memory basically. I have found it helps a lot.
- Even with good guidelines, Claude seems to have a tendency to get ahead of itself. I like to keep it focused and build little increments as I would myself if it is something I care about. If its just some one off or prototype, I let it go crazy and churn whatever works.
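For anyone curious what that "long term memory" section could look like, here is a hypothetical sketch of a Claude-maintained summary block in a project CLAUDE.md. Every heading, path, and detail below is invented for illustration, not a prescribed format:

```markdown
## Project summary (rewritten by Claude at end of each session)

- Purpose: CLI tool that syncs bookmarks between browsers.
- Architecture: `src/sync/` holds the diff engine; `src/adapters/` has one module per browser.
- Current milestone: 3 of 5 (conflict resolution); progress tracked in PLAN.md.
- Gotchas: the Firefox adapter assumes the profile lock is released; tests mock this.

## How to navigate

Start at `src/main.rs`, which wires adapters into the sync engine.
Tests live next to each module; run `make test` before committing.
```

Having Claude rewrite this at session end, rather than append to it, keeps the file small enough to load fresh into every new context.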
I’m curious about the tool but I wonder if it requires more significant investment to be a daily driver.
Not sure about cursor. But if you want to use Claude Code daily for more than 2-3hrs/day, the $20 plan will feel limiting
In my experience, the $100 plan is pretty good, although you still run into the rate limits if you use it for a long time everyday (especially if you use Opus, which seems to run out in the first 30min of usage)
Who has had success using Claude Code on features in older, bigger, messier projects?
I use it daily on our 10yo production repo with success.
Then, I explored a product feature in an existing app of mine that I also had put off because I didn't feel it was worth spending several days exploring the idea. It's something that would've required me to look up tutorials and APIs on how to do some basic things and then write some golang code which I hadn't done in a while. With Claude Code, I was able to get a prototype of the idea from a client app and a golang service working within an hour!
Today I started prototyping yet another app idea I came up with yesterday. I started off doing the core of the prototype in a couple of hours by hand and then figured I'd pull Claude in to add features on top of it. I ended up spending several hours building this idea since I was making so much fantastic progress. It was genuinely addictive.
A few days ago I used it to help me explore how I should refactor a messy architecture I ended up with. I didn't initially consider it would even be useful at all but I was wowed by how it was able to understand the design I came up with and it gave me several starting points for a refactor. I ended up doing the refactor myself just because I really wanted to be sure I understood how it worked in case something went wrong. I suspect in a few weeks, I'll get used to just pairing with Claude on something like that.
At the end of each phase, I ask claude to update my implementation plan with new context for a new instance of claude to pick it up. This way it propagates context forward, and then I can clear the context window to start fresh on the next phase.
Claude Code power users, what would you say makes it superior to other agents?
And I don't think we have a great eval benchmark that exactly measures this capability yet. SWE-bench seems to be pretty good, but there are already a lot of anecdotal comments that Claude is still better at coding than GPT-5, despite the two having similar scores on SWE-bench.
https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/o...
https://longbench2.github.io/
Meanwhile Gemini got itself stuck in a loop of compile/fail/try to fix/compile/fail again. Eventually it just gave up and said "I'm not able to figure this out". It does seem to have a kind of self-esteem problem in these scenarios, whereas Claude is more bullish on itself (maybe not always a good thing).
Claude seems to be the best at getting something that actually works. I do think Gemini will end up being tough competition, if nothing else because of the price, but Google really need a bit of a quality push on it. A free AI agent is worthless if it can't solve anything for me.
“I’m so stupid. I should be ashamed of myself. I’m such a loser. Idiot, idiot. Oh god I suck. I’m an embarrassment.”
The RL torture Google must put this model through, man.
I hated at first that it wasn’t like Cursor, sitting in the IDE. Then I realised I was using Cursor completely differently, using it often for small tasks where it’s only moderately helpful (refactoring, adding small functions, autocompleting)
With Claude I have to stop, think and plan before engaging with it, meaning it delivers much more impactful changes.
Put another way, it demands more from me meaning I treat it with more respect and get more out of it
However, there are a lot of Claude Code clones out there now that are basically the same (Gemini CLI, Codex, now Cursor CLI etc.). Claude still seems to lead the pack, I think? Perhaps it’s some combination of better coding performance due to the underlying LLM (usually Sonnet 4) being fine-tuned on the agent tool calls, plus Claude is just a little more mature in terms of configuration options etc.?
The big thing with Claude Code seems to be agentic process they've baked into it.
So I looked at the code more closely, and it was using the React frontend and useEffect instead of a proper game engine. It's also not great at following hook rules and understanding their timing in advanced scenarios. So now I'm prompting it to use a proper tick-based game engine and rebuilding the game up, doing code reviews. It's going 'slower' now, but it's going much better.
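For readers unfamiliar with the pattern: the core of a tick-based engine is a fixed-timestep loop that advances game state independently of rendering. Here is a minimal generic sketch of that idea in Python (names and numbers are mine, not the commenter's actual engine):

```python
# Minimal fixed-timestep tick loop: accumulate elapsed real time, then run
# zero or more fixed-size simulation ticks. Rendering would read `state`
# separately, so game logic never depends on frame timing (unlike useEffect).
def make_engine(update, tick_ms=50):
    state = {"ticks": 0}
    acc = 0  # unconsumed real time, in ms

    def advance(elapsed_ms):
        nonlocal acc, state
        acc += elapsed_ms
        while acc >= tick_ms:
            state = update(state)
            acc -= tick_ms
        return state

    return advance

advance = make_engine(lambda s: {"ticks": s["ticks"] + 1}, tick_ms=50)
advance(120)        # runs 2 ticks, leaves 20 ms in the accumulator
print(advance(30))  # 20 + 30 = 50 ms, so one more tick runs
```

Because every tick is the same size, the simulation is deterministic regardless of how irregularly the render loop calls `advance`, which is exactly what React effect timing can't guarantee.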
My goal is to make a Show HN post when I have a good demo.