The key thing I learned was not to just point Claude at your repository and hope for the best. The raw approach burns tokens incredibly fast: Claude reads entire files when it only needs one function, retries failed edits five-plus times, and loses context halfway through.
What really works is giving Claude a CLAUDE.md file in the root of your repository with specific instructions for the workflow (which tools to prefer, when to summarize versus read raw, etc.). Claude Code reads it automatically at the start of each session. Think of it as an .editorconfig file, but for AI behavior.
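For illustration, a minimal CLAUDE.md might look something like this (the specific rules are made up; there's no required schema, it's just instructions in prose):

```markdown
# CLAUDE.md — illustrative example

## Workflow
- Prefer `rg` to find code; never read a whole file when a search will do.
- Run the test suite after every edit; stop after two failed retries and ask.

## Context budget
- Summarize files over 300 lines instead of reading them raw.
- When reviewing a PR, read only the changed files plus their direct callers.
```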
For the $25/PR review use case specifically, the bottleneck isn't Claude's intelligence but context-window management. A 500-file repository can exhaust the window before Claude finishes the review. You'd need some kind of indexing layer that feeds Claude only the snippets relevant to each PR diff, not the entire codebase.
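A rough sketch of what that indexing layer could look like, assuming standard unified diffs as input (the function names `changed_spans` and `snippets_for_review` are mine, not from any library):

```python
import re

def changed_spans(diff_text):
    """Parse a unified diff into {path: [(start_line, length), ...]}
    for the new-file hunk ranges — the only regions a reviewer needs."""
    spans = {}
    path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            path = line[len("+++ b/"):]
            spans[path] = []
        elif line.startswith("@@") and path:
            # hunk header looks like: @@ -10,3 +12,4 @@
            m = re.search(r"\+(\d+)(?:,(\d+))?", line)
            start = int(m.group(1))
            length = int(m.group(2) or 1)
            spans[path].append((start, length))
    return spans

def snippets_for_review(repo_files, diff_text, context=3):
    """Return only the changed regions (plus a little surrounding
    context) instead of feeding whole files to the model."""
    out = []
    for path, spans in changed_spans(diff_text).items():
        lines = repo_files[path].splitlines()
        for start, length in spans:
            lo = max(0, start - 1 - context)
            hi = min(len(lines), start - 1 + length + context)
            out.append((path, start, "\n".join(lines[lo:hi])))
    return out
```

In practice you'd also want to pull in files that reference the changed symbols, but even this naive slice keeps a 500-file repo from ever entering the window.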
What kind of repositories do you have in mind? The approach varies greatly depending on the size, but I'd like to hear your thoughts.
My original question was more along the lines of implementing things like PR review yourself. I was tinkering with an internal service that spins up ephemeral CC instances to analyze PRs, but realized this generalizes easily to arbitrary tasks. Was curious what sort of things folks could use that for.
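For anyone wanting to try the ephemeral-instance idea, here's a minimal sketch. It assumes Claude Code's non-interactive print mode (`claude -p`); the prompt wording and `review_pr` helper are illustrative, and the runner is injectable so you can swap in your own process management:

```python
import subprocess

def build_review_prompt(pr_title, diff_text):
    """Assemble one self-contained prompt so the throwaway
    instance needs no prior context."""
    return (
        f"Review this pull request: {pr_title}\n\n"
        "Focus on correctness and security. Reply with a verdict line "
        "(APPROVE or REQUEST_CHANGES) followed by your findings.\n\n"
        f"```diff\n{diff_text}\n```"
    )

def review_pr(repo_dir, pr_title, diff_text, runner=subprocess.run):
    """Spin up a one-shot Claude Code instance in the repo checkout,
    collect its review, and throw the instance away."""
    prompt = build_review_prompt(pr_title, diff_text)
    result = runner(
        ["claude", "-p", prompt],  # -p = headless print mode
        cwd=repo_dir, capture_output=True, text=True, timeout=600,
    )
    return result.stdout
```

The same shape works for any task: swap the prompt builder, keep the ephemeral-process harness.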
I consulted Claude chat and it admitted this is a major problem with Claude these days, and suggested I ask for the coordinates of the UI controls on the screenshot, thus forcing it to actually look. So I did that the next time, and it just gave me invented coordinates for objects on the screenshot.
I consulted Claude chat again: how else can I force it to actually look at the screenshot? It said to delegate to another “qa” agent that does only one thing - look at the screenshot and give the verdict.
I did that. The next time, the job was again reported done, but on the screenshot it wasn’t. It turns out the coder agent did everything as instructed: it spawned a QA agent, and the QA agent inspected the screenshot. But instead of taking that agent’s conclusion, the coder agent gave its own verdict that it’s done.
It will do anything - if you don’t spell out every possible situation, it will find a “technicality”, a loophole that lets it declare the job done no matter what.
And on top of that, if you develop for native macOS, there’s no official tooling for visual verification. Something like 95% of development is web, and LLM providers care only about that.