Your LLM (CC) doesn't have your whole codebase in context, so it can run off and make changes without considering that some remote area of the codebase is (subtly?) depending on the part that Claude just changed. This can be mitigated to some degree depending on the language and the tests in place.
The LLM (CC) might identify a bug in the codebase, fix it, and then figure, "Well, my work here is done," and just leave it at that, without considering the ramifications or that the same sort of bug might exist elsewhere.
I could go on, but my point is simply to validate the issues people are having, while also acknowledging those who see the value of an LLM like CC. It does provide useful work (e.g. large tedious refactors, prototyping, tracking down a variety of bugs, and so on).
The recent Kimi-K2 supposedly works great.
I sympathize with both experiences and have had both. But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied by a careful summary of at least:
* what kind of codebase you were working on (language, tech stack, business domain, size, age, level of cleanliness, number of contributors)
* what exactly you were trying to do
* how much experience you have with the AI tool
* is your tool set up so it can get a feedback loop from changes, e.g. by running tests
* how much prompting did you give it; do you have CLAUDE.md files in your codebase
and so on.
As others pointed out, TFA also has the problem of not being specific about most of this.
We are still learning as an industry how to use these tools best. Yes, we know they work really well for some people and others have bad experiences. Let's try and move the discussion beyond that!
For context, I was using Claude Code on a large open source Ruby + TypeScript codebase, 50M+ tokens. It had specs and e2e tests, so yes, I did have feedback when I was done with a feature - I could run the specs and Claude Code could form a loop. I would usually advise it to fix failing specs one by one, using --fail-fast to surface errors quickly.
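To give a concrete idea of what that loop looked like, here's a minimal sketch of the kind of guidance I mean in a CLAUDE.md - the wording, paths, and the exact commands are illustrative, not a copy of the real file:

```markdown
# CLAUDE.md (illustrative sketch, not the actual file)

## Test feedback loop
- Run the spec file you touched first, stopping at the first failure:
  `bundle exec rspec path/to/the_spec.rb --fail-fast`
- Fix one failing spec at a time and re-run before moving on.
- Only run the full suite (and the e2e tests) once the targeted specs pass.
```

The point is just that the model gets a tight run-fix-rerun loop instead of dumping a hundred failures at once.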
Prior to Claude Code, I had been using Cursor for a year or so.
Sonnet is particularly good at Next.js and TypeScript stuff. I also ran this on a medium-sized Python codebase and some ML-related work too (ranging from LangChain to PyTorch lol).
I don't do a lot of prompting, just enough to describe my problem clearly. I try my best to identify the relevant context or direct the model to find it fast.
I made new CLAUDE.md files.