stillpointlab · 3 days ago
I'm still calibrating myself on the size of task that I can get Claude Code to do before I have to intervene.

I call this problem the "goldilocks" problem. The task has to be large enough that it outweighs the time necessary to write out a sufficiently detailed specification AND to review and fix the output. It has to be small enough that Claude doesn't get overwhelmed.

The issue with this is that writing a "sufficiently detailed specification" is task-dependent. Sometimes a single sentence is enough, other times a paragraph or two, and sometimes a couple of pages is necessary. And the "review and fix" phase is again totally task-dependent and completely unknown up front. I can usually estimate the spec time, but the review-and-fix phase is a dice roll that depends on the output of the agent.

And the "overwhelming" metric is again not clear. Sometimes Claude Code can crush significant tasks in one shot. Other times it can get stuck or lost. I haven't fully developed an intuition for this yet, how to differentiate these.

What I can say is that this is an entirely new skill. It isn't like architecting large systems for human development. It isn't like programming. It is its own thing.

scuff3d · 3 days ago
This is why I'm still dubious about the overall productivity increase we'll see from AI once all the dust settles.

I think it's undeniable that in narrow, well-controlled use cases the AI does give you a bump. Once you move beyond that, though, the time you have to spend on cleanup starts to seriously eat into any efficiency gains.

And if you're in a domain you know very little about, I think any use case beyond helping you learn a little quicker is a net negative.

jmvldz · 3 days ago
"It isn't like programming. It is its own thing."

You articulated what I was wrestling with in the post perfectly.

bdangubic · 3 days ago
> It isn't like programming. It is its own thing.

Absolutely. And what I find fascinating is that this experience is highly personal. I read probably 876 different “How I code with LLMs” posts and I can honestly say not a single thing I read and tried (and I tried A LOT) “worked” for me…

rootnod3 · 3 days ago
According to most enthusiasts of LLM/agentic coding you are just doing it wrong then.
kace91 · 3 days ago
>I haven't fully developed an intuition yet for how to differentiate these.

The big issue is that, even though there is a logical side to it, part of it is adapting to a closed system that can change under your feet. New model, new prompt, there goes your practice.

JeremyNT · 2 days ago
> What I can say is that this is an entirely new skill. It isn't like architecting large systems for human development. It isn't like programming. It is its own thing.

It's management!

I find myself asking questions very similar to yours: how much detail is too much? How likely is this to succeed without my assistance? If it does succeed, will I need to refactor? Am I wasting my time delegating, or should I just do it?

It's almost identical to when I delegate a task to a junior... only the feedback cycle of "did I guess correctly here" is a lot faster... and unlike a junior, the AI will never get better from the experience.

rcarr · 3 days ago
For the longer ones, are you using AI to help you write the specs?
stillpointlab · 3 days ago
My experience is: AI-written prompts are overly long and overly specific. I prefer to write the instructions myself and then direct the LLM to ask clarifying questions or provide an implementation plan. Depending on the size of the change, I go 1-3 rounds of clarifications until Claude indicates it is ready and provides a plan that I can review.

I do this in a task_description.md file and I include the clarifications in their own section (the files follow a task.template.md format).
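A minimal sketch of what such a template might look like (the section names here are illustrative, not the actual file):

```markdown
# Task: <one-line summary>

## Context
Why the change is needed and which parts of the codebase it touches.

## Requirements
- Concrete, verifiable requirements, one per bullet.

## Out of scope
- Things the agent should not touch.

## Clarifications
- Q: question raised by Claude
  A: my answer

## Implementation plan
Filled in by Claude after the clarification rounds, then reviewed before any code is written.
```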

tarruda · 3 days ago
This illustrates a fundamental truth of maintaining software with LLMs: While programmers can use LLMs to produce huge amounts of code in a short time, they still need to read and understand it. It is simply not possible to delegate understanding a huge codebase to an AI, at least not yet.

In my experience, the real "pain" of programming lies in forcing yourself to absorb a flood of information and connecting the dots. Writing code is, in many ways, like taking a walk: you engage in a cognitively light activity that lets ideas shuffle, settle, and mature in the background.

When LLMs write all the code for you, you lose that essential mental rest: the quiet moments where you internalize concepts, spot hidden bugs, and develop a mental map of the system.

danbmil99 · 3 days ago
Has anyone else had the experience of dreading a session with Claude? His personality is often chirpy and annoying; he's always got positive things to say; and working with him as the main code author actually takes away one of the joys of being a programmer -- the ability to interact with a system that is _not_ like people, that is rigid and deterministic, not all soft and mushy like human beings.

When I write a piece of code that is elegant, efficient, and -- "right" -- I get a dopamine rush, like I finished a difficult crossword puzzle. Seems like that joy is going to go away, replaced by something more akin to developing a good relationship with a slightly quirky colleague who happens to be real good (and fast) at some things -- especially things management likes, like N LOC per week -- but this colleague sucks up to everyone, always thinks they have the right answer, often seems to understand things on a superficial level, and oh -- works for $200 / month...

Shades of outsourcing to other continents...

qudat · 3 days ago
As a staff SWE I spend way more time reading and understanding code, and then QAing features.

Writing code is my favorite part of the job, why would I outsource it so I can spend even more time reading and QAing?

h3lp · 2 days ago
That's a great insight---the problem with LLMs is that they write code and elegant prose for us, so we have more time to do chores. I want it the other way around!!!
jmvldz · 3 days ago
100% yes. QA'ing a bunch of LLM-generated code feels like a mental flood. Losing that mental rest is a great way to put it.
CuriouslyC · 3 days ago
MCP up Playwright, have a detailed spec, and tell Claude to generate a detailed test plan for every story in the spec, then keep iterating on a test -> fix -> ... loop until every single component has been fully tested. If you get Claude to write all the components (usually by subfolder) out to todos, there's a good chance it'll go >1 hour before it tries to stop, and if you have an anti-stopping hook it can go quite a bit longer.
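For the anti-stopping part, a rough sketch of the idea as a Claude Code Stop hook, registered under the Stop event in the project's hook settings (the stdin/stdout protocol and the "decision": "block" response reflect my understanding of the hook interface; verify against the current docs). It assumes a hypothetical TODO.md checklist tracking the components:

```python
#!/usr/bin/env python3
"""Sketch of an "anti-stopping" Stop hook for Claude Code.

Assumptions (not from the original comment): the session keeps its todos in a
TODO.md checklist with "- [ ]" items, and a Stop hook can veto stopping by
printing {"decision": "block", "reason": ...} to stdout.
"""
import json
import sys
from pathlib import Path

json.load(sys.stdin)  # hook event payload; unused here but must be consumed

todo = Path("TODO.md")
open_items = [
    line for line in todo.read_text().splitlines()
    if line.lstrip().startswith("- [ ]")
] if todo.exists() else []

if open_items:
    # Tell Claude to keep iterating instead of ending the session.
    print(json.dumps({
        "decision": "block",
        "reason": f"{len(open_items)} checklist items in TODO.md are still open; "
                  "keep running the test -> fix loop until they are all checked off.",
    }))
else:
    sys.exit(0)  # nothing left to do; allow the stop
```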
rstuart4133 · 3 days ago
Another way of saying this: only an AI reviewer could cope with the flood of code an AI can produce.

But AI reviewers can do little beyond checking coding standards.

imiric · 3 days ago
Programming and vibe coding are two entirely separate disciplines. The process of producing software and the end result are wildly different between them.

People who vibe code don't care about the code, but about producing something that delivers value, whatever that may be. Code is just an intermediate artifact to achieve that goal. ML tools are great for this.

People who program care about the code. They want to understand how it works, what it does, in addition to whether it achieves what they need. They may also care about its quality, efficiency, maintainability, and other criteria. ML tools can be helpful for programming, but they're not a panacea. There is no shortcut for building robust, high quality software. A human still needs to understand whatever the tool produces, and ensure that it meets their quality criteria. Maybe this will change, and future generations of this tech will produce high quality software without hand-holding, but frankly, I wouldn't bet on the current approaches to get us there.

doctoboggan · 3 days ago
When building a project from scratch using AI, it can be tempting to give in to the vibe, ignore the structure/architecture, and let it evolve naturally. This is a bad idea when humans do it, and it's also a bad idea when LLM agents do it. You have to be considering architecture, dataflow, etc. from the beginning, and always stay on top of it without letting it drift.

I have tried READMEs scattered through the codebase but I still have trouble keeping the agent aware of the overall architecture we built.

nchmy · 3 days ago
This should be called the eternal, unbearable slowness of code review, because the author writes that the AI actually churns out code extremely rapidly. The (hopefully capable, attentive, careful) human is the bottleneck here, as it should be.
JohnMakin · 3 days ago
If only code and application quality could be measured in LoC - middle managers everywhere would rejoice
jmvldz · 3 days ago
Ooh, that's a good title for another post! And yes, I agree with you.

Initially I would barely read any of the generated code, and as my project has grown in size, I have hit the limits of that approach.

Often because Claude Code makes very poor architectural choices.

nchmy · 3 days ago
Welcome to vibe/agentic engineering
falcor84 · 3 days ago
> ... I’ll keep pulling PRs locally, adding more git hooks to enforce code quality, and zooming through coding tasks—only to realize ChatGPT and Claude hallucinated library features and I now have to rip out Clerk and implement GitHub OAuth from scratch.

I don't get this: how many git hooks do you need to identify that Claude has hallucinated a library feature? Wouldn't a single hook running your tests identify that?
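Even a bare-bones hook along these lines would surface a call to a nonexistent library API the first time the test suite imports it (a sketch, assuming a Python project using pytest; substitute your own build/test command):

```python
#!/usr/bin/env python3
# Sketch of a .git/hooks/pre-commit script (git hooks can be any executable).
# Assumes pytest; replace the command with your own build/test runner.
import subprocess
import sys

result = subprocess.run(["pytest", "-q"])
if result.returncode != 0:
    print("pre-commit: tests failed; refusing the commit.", file=sys.stderr)
    sys.exit(1)
```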

sc68cal · 3 days ago
They probably don't have any tests, or the tests that the LLM creates are flawed and not detecting these problems
AstroBen · 3 days ago
Just tell the AI "and make sure you don't add bugs or break anything"

Works every time

manmal · 3 days ago
Yesterday Claude Code assured me the following:

• Good news! The code is compiling successfully (the errors shown are related to an existing macro issue, not our new code).

When in fact it had managed to insert 10 compilation errors that were not at all related to any macros.

deegles · 3 days ago
I tried using agents in Cursor and when it runs into issues it will just rip out the offending code :)
jmvldz · 3 days ago
I don't have a ton of tests. From what I've seen, Claude will often just update the tests to no-op, so tests passing isn't trustworthy.

My workflow is often to plan with ChatGPT and what I was getting at here is ChatGPT can often hallucinate features of 3rd party libraries. I usually dump the plan from ChatGPT straight into Claude Code and only look at the details when I'm testing.

That said, I've become more careful in auditing the plans so I don't run into issues like this.

CuriouslyC · 3 days ago
Tell Claude to use a code-review sub-agent after every significant change set: tell it to run the tests and evaluate the change set, don't tell it that Claude wrote the code, and give it strict review instructions. Works like a charm.
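As a sketch, the reviewer could live as a Claude Code sub-agent definition along these lines (the .claude/agents/ location and frontmatter fields reflect my understanding of how sub-agents are defined; the review instructions themselves are just an example):

```markdown
---
name: code-reviewer
description: Reviews a change set as an independent reviewer; use after every significant change.
---

You are reviewing a change set written by another engineer (do not assume you wrote it).
- Run the test suite and report any failures verbatim.
- Flag tests that were weakened, no-op'd, or deleted in this change set.
- Flag calls to library APIs you cannot verify exist.
- Reject the change set unless every issue above is resolved.
```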
pluto_modadic · 3 days ago
AI agents have been known to rip out mocks so that the tests pass.
thrown-0825 · 3 days ago
I have had human devs do that too
loandbehold · 3 days ago
"hallucinated" library features are identified even earlier, when claude builds your project. i also don't get what author is talking about.
danbmil99 · 3 days ago
What bothers me is this: Claude & I work hard on a subtle issue; eventually (often after wiping Claude's memory clean and trying again) we collectively come to a solution that works.

But the insights gleaned from that battle are (for Claude) lost forever as soon as I start on a new task.

The way LLMs (fail to) handle memory and in-situ learning (beyond prompt engineering and working within the context window) is just clearly deficient compared to how human minds work.

chipsrafferty · 2 hours ago
The reason these tools haven't achieved greatness yet is that 99% of us are struggling at work with domain knowledge - how does this specific project work within the frame of this company? If an AI tool is unable to "learn the ropes" at a specific company over time, it will never be better than a mid-senior developer on day 1 at the company. They NEED to be able to learn. They NEED to be able to have long-term memory and to read entire codebases.
cruffle_duffle · 3 days ago
And the thing is, all these “memory features” don't help either, because the “memory” is either too specific to the task at hand and not generalizable, or it is time-bound and therefore won't be useful later (e.g.: “user is searching for a new waterbed with flow master manifolds”). And rarely can you directly edit the memory, so you are stuck with a bunch of potential nonsense polluting your context (with little control over when or why the memory is presented).

I dunno.

abrookewood · 3 days ago
Yes, it's a common problem. There are 'memory' plugins that you can use to collect insights and feed them back to the LLM, but I tend just to update an AGENTS.md file (or equivalent).
dwringer · 3 days ago
Slow is smooth, smooth is fast.
pipes · 3 days ago
I've no idea why, but the phrase "it's addicting" is really annoying; I'm pretty certain it should be "it's addictive". I've started seeing it everywhere. (Note, I haven't completely lost my mind, it's in that article).
chipsrafferty · 2 hours ago
I would never say "it's addictive" in any context.
jmvldz · 3 days ago
Haha fair enough. Fixed!
pipes · 2 days ago
Ha! Thanks :)