Benjammer commented on A staff engineer's journey with Claude Code   sanity.io/blog/first-atte... · Posted by u/kmelve
lkjdsklf · 5 hours ago
The problem is, by the time you’ve gone through the process of making a granular plan and all that, you’ve lost all productivity gains of using the agent.

As an engineer, especially as you get more experience, you can kind of visualize the plan for a change very quickly and flesh out the next step while implementing the current step.

All you have really accomplished with the kind of process described is to create the world's least precise, most verbose programming language.

Benjammer · 3 hours ago
I'm not sure how much experience you have, and I'm not trying to make assumptions, but I've been working in software for over 15 years. The exact skill you mentioned - being able to visualize the plan for a change quickly - is what makes my LLM usage so powerful, imo.

I can use the right, precise wording in my prompt to guide it to a good plan very quickly. As the other commenter mentioned, the entire process above only takes something like 30-120 minutes depending on scope, and then I can generate code in a few minutes that would take me 2-6 weeks to write myself, working 8-hour days. Then it takes something like 0.5-1.5 days to work out all the bugs, clean up the weird AI quirks, and maybe have the LLM write some Playwright tests (or whatever framework you use for integration tests) to verify its own work.
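
For a concrete picture, here's a minimal sketch of the kind of Playwright integration test I mean; the dev-server URL, form labels, and success text are hypothetical placeholders, not from any real project:

```typescript
// Hypothetical integration test the LLM writes to verify its own work.
// The route, field labels, and success message are placeholders.
import { test, expect } from '@playwright/test';

test('new settings page saves and persists', async ({ page }) => {
  await page.goto('http://localhost:3000/settings'); // assumed dev server

  // Exercise the feature the agent just built.
  await page.getByLabel('Display name').fill('Test User');
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByText('Saved')).toBeVisible();

  // Reload to confirm the change actually persisted.
  await page.reload();
  await expect(page.getByLabel('Display name')).toHaveValue('Test User');
});
```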

So yes, it takes significant time to plan things well for good results, and yes, the results are often sloppy in parts, with weird quirks no human engineer would produce on purpose. But if you keep working on prompt/context engineering and get better and faster at the above process, the key unlock is not that it does the same coding you would have done, just with the LLM generating the code. It's that you can work as a solo developer at the abstraction level of a small startup company.

I can design and implement an enterprise-grade SSO auth system over a weekend that integrates with Okta and passes security testing. I can take a library written in one language and fully re-implement it in another in a matter of hours. I recently took the native Android and iOS libraries for a fairly large, non-trivial SDK and had Claude build me a React Native wrapper library with native modules that integrates both native libraries and presents a clean, unified interface and TypeScript types to the React Native layer. That took me about two days, plus one more for validation testing.

I had never done this before. I have no idea how "Nitro Modules" works, or how to configure a React Native library from scratch. But given the immense scaffolding abilities of LLMs, plus my debugging/hacking skills, I can get to a really confident place really quickly, and I regularly ship production code at work with this process.
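
To give a feel for what that wrapper layer looks like, here's a minimal sketch of the unified TypeScript surface; the "AcmeSdk" name and method signatures are hypothetical, but the pattern (both platforms register one native module, TypeScript exposes one typed API) is the general shape:

```typescript
// Unified TypeScript surface over Android and iOS native modules.
// "AcmeSdk" and its methods are hypothetical placeholders.
import { NativeModules, Platform } from 'react-native';

// Shape of the module that both the Android and iOS implementations
// register under the same name, so the JS layer never branches on platform.
interface AcmeSdkNative {
  initialize(apiKey: string): Promise<void>;
  startSession(userId: string): Promise<string>; // resolves to a session id
}

const native: AcmeSdkNative | undefined = NativeModules.AcmeSdk;

function requireNative(): AcmeSdkNative {
  if (!native) {
    throw new Error(`AcmeSdk native module not linked on ${Platform.OS}`);
  }
  return native;
}

// The clean, unified interface the rest of the app imports.
export const AcmeSdk = {
  initialize: (apiKey: string) => requireNative().initialize(apiKey),
  startSession: (userId: string) => requireNative().startSession(userId),
};
```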

Benjammer commented on A staff engineer's journey with Claude Code   sanity.io/blog/first-atte... · Posted by u/kmelve
MikeTheGreat · 6 hours ago
Genuine question: What do you mean by "ask it to implement the plan in small steps"?

One option is to write "Please implement this change in small steps?" more-or-less exactly.

Another option is to figure out the steps and then ask it "Please figure this out in small steps. The first step is to add code to the parser so that it handles the first new XML element I'm interested in, please do this by making the change X, we'll get to Y and Z later"

I'm sure there's other options, too.

Benjammer · 6 hours ago
My method is that I work together with the LLM to figure out the step-by-step plan.

I give an outline of what I want to do, with some breadcrumbs to any relevant existing files. I ask it to gather context for my change and write up a summary of the full scope of the change we're making, including an index of file paths for every relevant file with a very concise blurb about what each one does or contains, and then to produce a step-by-step plan at the end.

I almost always have to tell it NOT to think about this like a traditional engineering team plan; this is a senior engineer and an LLM code agent working together, so it should think only about technical architecture. Otherwise you get nonsense timelines in your plan, like "phase 1 (1-2 weeks), phase 2 (2-4 weeks), step a (4-8 hours)".

Then I review the steps myself to make sure they're coherent and make sense, and I poke and prod the LLM to fix anything that seems weird, correcting context or directions or whatever. Next I feed the entire document to another clean context window (or two or three) and ask it to "evaluate this plan for cohesiveness and coherency, tell me if it's ready for engineering or if there's anything underspecified or unclear", and I iterate on that one to three times until a fresh context window says "This plan looks great, it's well crafted, organized, etc." and gives no feedback.

Finally, I go to a fresh context window, tell it "Review the document @MY_PLAN.md thoroughly and begin implementation of step 1, stop after step 1 before doing step 2", and start working through the steps with it.
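
The finished plan document ends up shaped roughly like this; the skeleton below is a hypothetical example (real ones are far more detailed), reusing the XML parser change from upthread, with made-up file paths:

```markdown
# MY_PLAN.md: Support new XML elements in the parser

## Scope
One paragraph on what we're changing and what is explicitly out of scope.

## File index
- src/parser/lexer.ts: tokenizer; needs a token type for the new element
- src/parser/ast.ts: node definitions; add the new element node
- test/parser.spec.ts: existing parser tests; extend, don't rewrite

## Steps
1. Add the new token type to the lexer. No behavior change elsewhere.
2. Add the AST node and wire it into the parse loop.
3. Extend tests to cover the new element, including malformed input.
```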

Benjammer commented on LLMs and coding agents are a security nightmare   garymarcus.substack.com/p... · Posted by u/flail
bpt3 · 16 days ago
It sounds like you can create and release high quality software with or without an agent.

What would have happened if someone without your domain expertise wasn't reviewing every line and making the changes you mentioned?

People aren't concerned about you using agents, they're concerned about the second case I described.

Benjammer · 15 days ago
Are you unaware of the concept of a junior engineer working at a company? You realize that not all human code is written by someone with domain expertise, right?

Do you see that your wording implies this is an issue unique to AI code, one that isn't present in human code?

> What would have happened if someone without your domain expertise wasn't reviewing every line and making the changes you mentioned?

We're talking about two variables, which gives four states: human-reviewed, human-not-reviewed, ai-reviewed, ai-not-reviewed.

[non ai]

*human-reviewed*: Humans write code, sometimes humans make mistakes, so we have other humans review the code for things like critical security issues

*human-not-reviewed*: Maybe this is a project with a solo developer and automated testing, but otherwise this seems like a pretty bad idea, right? This is the classic "YOLO to production".

[with ai]

*ai-reviewed*: AI generates code, sometimes AI hallucinates or gets things very wrong or over-engineers things, so we have humans review all the code for things like critical security issues

*ai-not-reviewed*: AI generates code, YOLO to prod, no human reads it - obviously this is terrible and barely works even for hobby projects with a solo developer and no stakes involved

I'm wondering if the disconnect here is that actual professional programmers are just implicitly talking about going from [human-reviewed] to [ai-reviewed], assuming nobody in their right mind would just _skip code reviews_. The median professional software team would never build software without code reviews, imo.

But are you thinking about this as going from [human-reviewed] straight to [ai-not-reviewed]? Or about [human-not-reviewed] code for some reason? It's not clear why you immediately latch onto the problems of [ai-not-reviewed] while refusing to acknowledge that [ai-reviewed] is even a possible state.

It's just really unclear why you jump straight to concerns like this with no nuance for how the industry has always handled the same problems, long before AI was involved at all.

Benjammer commented on LLMs and coding agents are a security nightmare   garymarcus.substack.com/p... · Posted by u/flail
bpt3 · 16 days ago
It's not bullshit. LLMs lower the bar for developers, and increase velocity.

Increasing the quantity of something that is already an issue without automation involved will cause more issues.

That's not moving the goalposts, it's pointing out something that should be obvious to someone with domain experience.

Benjammer · 16 days ago
Why is the "threshold" argument never the first thing mentioned? Do you not understand what I'm saying here? Can you explain why the "code slop" argument is _always_ the first thing that people mention, without discussing this threshold?

Every post like this reads as if it's describing a new phenomenon caused by AI, but it's just a normal professional code-quality problem that has always existed.

Consider the difference between these two:

1. AI allows programmers to write sloppy code and commit things without fully checking/testing their code

2. AI greatly increases the speed at which code can be generated, but doesn't improve the speed of reviewing code nearly as much, so we're making software harder to verify

The second is the more accurate picture of what's happening, but it comes off as much less sensational in a social media post. When people post the first framing, I discount them immediately for fear-mongering and engagement baiting rather than discussing the real problems with AI programming and how to prevent or solve them.

Benjammer commented on LLMs and coding agents are a security nightmare   garymarcus.substack.com/p... · Posted by u/flail
diggan · 16 days ago
> might ok a code change they shouldn’t have

Is the argument that developers who are less experienced or in a hurry will just accept whatever they're handed? In that case, this would be just as true for random people submitting malicious PRs that someone accepts without reading, even without an LLM involved at all. Seems like an odd thing to call a "security nightmare".

Benjammer · 16 days ago
This is the common refrain from the anti-AI crowd: they start by talking about an entire class of problems that already exists in humans-only software engineering, without any context or caveats. Then, when someone points out that these problems exist with humans too, they move the goalposts and make it about the "volume" of code and how AI is taking us across some threshold where everything will fall apart.

The telling thing is that they never mention this "threshold" up front; it only comes up in response to being called on the bullshit.

Benjammer commented on Search all text in New York City   alltext.nyc/... · Posted by u/Kortaggio
daemonologist · 21 days ago
This is exceedingly fun.

A game: find an English word with the fewest hits. (It must have at least one hit that is not an OCR error, but such errors do still count towards your score. Only spend a couple of minutes.) My best is "scintillating" : 3.

Benjammer · 21 days ago
I found "intertwining" with a score of 3 also. Two instances of the word on the same sign and then a false positive third pic.
Benjammer commented on Nobody knows how to build with AI yet   worksonmymachine.substack... · Posted by u/Stwerner
vishvananda · a month ago
I'm really baffled why the coding interfaces have not implemented a locking feature for some code. It seems like an obvious feature to be able to select a section of your code and tell the agent not to modify it. This could remove a whole class of problems where the agent tries to change tests to match the code or removes key functionality.

One could even imagine going a step further and having a confidence level associated with different parts of the code; that would help the LLM concentrate changes on the areas you're less sure about.

Benjammer · a month ago
Why are engineers so obstinate about this stuff? Do you really need a GUI built for you in order to do this? You can't take the time to just type the instruction to the LLM? Do you realize that's possible? You can simply write "Don't modify XYZ.ts under any circumstances" in your instructions, as sketched below.

Not to mention, all the tools have simple hotkeys to dismiss changes to an entire file with the press of a button, if you really want to ignore edits to it. In Cursor you can literally select a block of code and press a hotkey to "highlight" it to the LLM in the chat, and you can absolutely tell it "READ BUT DON'T TOUCH THIS CODE", directly tied to specific lines of code: literally the feature you are describing. BUT you have to work with the LLM and the tooling; it's not just going to be a button for you.
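
For example, here's a hypothetical snippet of a project instruction file (CLAUDE.md, Cursor rules, or similar); XYZ.ts is just the placeholder from above:

```markdown
## Do not modify
- XYZ.ts: read for context only, never edit
- src/payments/: propose changes in chat instead of editing directly
```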

You can also literally do exactly what you said with "going a step further".

Open Claude Code and run `/init`. Download Superwhisper, create a new file at the project root called BRAIN_DUMP.md, put your cursor in the file, activate Superwhisper, and talk stream-of-consciousness style about all the parts of the code and your confidence level in each, with any details you want to include. Then go to your LLM chat, tell it to "Read file @BRAIN_DUMP.md" and organize all the contents into a new file, CODE_CONFIDENCE.md: have it list the parts of the codebase and give its best assessment of the developer's confidence in each part, based on the details and tone of the brain dump. Delete the brain dump file if you want.

Now you literally have what you asked for: an "index" of sorts that tells your LLM the parts of the codebase and the developer's confidence/stability in each. You can just refer to that file in your project prompting.
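
The generated file might look something like this (hypothetical paths and assessments):

```markdown
# CODE_CONFIDENCE.md

## src/auth/ (confidence: high)
Battle-tested, covered by integration tests. Prefer reading over rewriting.

## src/sync/queue.ts (confidence: low)
Written in a hurry; suspected race condition on retry. Safe to restructure.

## src/export/ (confidence: medium)
Works, but poorly understood. Touch only with explicit instruction.
```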

Please, everyone, for the love of god, just start prompting. Instead of posting on Hacker News or Reddit about your skepticism, literally talk to the LLM about it and ask it questions; it can help you work through almost any of this stuff people rant about.

Benjammer commented on Nobody knows how to build with AI yet   worksonmymachine.substack... · Posted by u/Stwerner
bloppe · a month ago
Am I the only one who has to constantly tell Claude and Gemini to stop making edits to my codebase because they keep messing things up and breaking the build like ten times in a row, duplicating logic everywhere, etc.? I keep hearing about how impressive agents are. I wish they could automate me out of my job faster.

Benjammer · a month ago
Are you paying for the higher-end models? Do you have proper system prompts and guidance in place, with solid prompt engineering? Have you started practicing any auxiliary forms of context engineering?

This isn't a magic code genie; it's a very complicated and very powerful new tool that you need to practice using over time in order to get good results from.
