Readit News

Benjammer commented on LLMs and coding agents are a security nightmare   garymarcus.substack.com/p... · Posted by u/flail
bpt3 · 15 days ago
It sounds like you can create and release high quality software with or without an agent.

What would have happened if someone without your domain expertise wasn't reviewing every line and making the changes you mentioned?

People aren't concerned about you using agents, they're concerned about the second case I described.

Benjammer · 14 days ago
Are you unaware of the concept of a junior engineer working in a company? You realize that not all human code is written by someone with domain expertise, right?

Are you aware that your wording here implies you're describing an issue unique to AI code, one that isn't present in human-written code?

>What would have happened if someone without your domain expertise wasn't reviewing every line and making the changes you mentioned?

We're talking about two variables, so four states: human-reviewed, human-not-reviewed, ai-reviewed, ai-not-reviewed.

[non ai]

*human-reviewed*: Humans write code, sometimes humans make mistakes, so we have other humans review the code for things like critical security issues

*human-not-reviewed*: Maybe this is a project with a solo developer and automated testing, but otherwise this seems like a pretty bad idea, right? This is the classic version of "YOLO to production", right?

[with ai]

*ai-reviewed*: AI generates code, sometimes AI hallucinates or gets things very wrong or over-engineers things, so we have humans review all the code for things like critical security issues

*ai-not-reviewed*: AI generates code, YOLO to prod, no human reads it - obviously this is terrible and barely works even for hobby projects with a solo developer and no stakes involved

I'm wondering if the disconnect here is that actual professional programmers are just implicitly talking about going from [human-reviewed] to [ai-reviewed], assuming nobody in their right mind would just _skip code reviews_. The median professional software team would never build software without code reviews, imo.

But are you thinking about this as going from [human-reviewed] straight to [ai-not-reviewed]? Or are you thinking about [human-not-reviewed] code for some reason? It's not clear why you immediately latch onto the problems of [ai-not-reviewed] while refusing to acknowledge that [ai-reviewed] is even a possible state.

It's just really unclear why you are jumping straight to concerns like this without any nuance for how the existing industry works regarding similar problems before we used AI at all.

Benjammer commented on LLMs and coding agents are a security nightmare   garymarcus.substack.com/p... · Posted by u/flail
bpt3 · 15 days ago
It's not bullshit. LLMs lower the bar for developers, and increase velocity.

Increasing the quantity of something that is already an issue without automation involved will cause more issues.

That's not moving the goalposts, it's pointing out something that should be obvious to someone with domain experience.

Benjammer · 15 days ago
Why is the "threshold" argument never the first thing mentioned? Do you not understand what I'm saying here? Can you explain why the "code slop" argument is _always_ the first thing that people mention, without discussing this threshold?

Every post like this has a tone like they are describing a new phenomenon caused by AI, but it's just a normal professional code quality problem that has always existed.

Consider the difference between these two:

1. AI allows programmers to write sloppy code and commit things without fully checking/testing their code

2. AI greatly increases the speed at which code can be generated, but it doesn't improve the speed of reviewing code nearly as much, so we're making software harder to verify

The second is a more accurate picture of what's happening, but it comes off as much less sensational in a social media post. When people post the first framing, I discount them immediately for fear-mongering and engagement bait rather than discussing the real problems with AI programming and how to prevent or solve them.

Benjammer commented on LLMs and coding agents are a security nightmare   garymarcus.substack.com/p... · Posted by u/flail
diggan · 15 days ago
> might ok a code change they shouldn’t have

Is the argument that developers who are less experienced or in a hurry will just accept whatever they're handed? In that case, this would be just as true for random people submitting malicious PRs that someone accepts without reading, even with no LLM involved at all. Seems like an odd thing to call a "security nightmare".

Benjammer · 15 days ago
This is the common refrain from the anti-AI crowd: they start by talking about an entire class of problems that already exists in humans-only software engineering, without any context or caveats. Then, when someone points out that these problems exist with humans too, they move the goalposts and make it about the "volume" of code and how AI is taking us across some threshold where everything will fall apart.

The telling thing is that they never mention this "threshold" up front; it only comes up as a response to being called on the bullshit.

Benjammer commented on Search all text in New York City   alltext.nyc/... · Posted by u/Kortaggio
daemonologist · 20 days ago
This is exceedingly fun.

A game: find an English word with the fewest hits. (It must have at least one hit that is not an OCR error, but such errors do still count towards your score. Only spend a couple of minutes.) My best is "scintillating" : 3.

Benjammer · 20 days ago
I found "intertwining" with a score of 3 also. Two instances of the word on the same sign, and then a false-positive third pic.

Benjammer commented on Nobody knows how to build with AI yet   worksonmymachine.substack... · Posted by u/Stwerner
vishvananda · a month ago
I'm really baffled why the coding interfaces have not implemented a locking feature for some code. It seems like an obvious feature to be able to select a section of your code and tell the agent not to modify it. This could remove a whole class of problems where the agent tries to change tests to match the code or removes key functionality.

One could even imagine going a step further and having a confidence level associated with different parts of the code, that would help the LLM concentrate changes on the areas that you're less sure about.

Benjammer · a month ago
Why are engineers so obstinate about this stuff? Do you really need a GUI built for you in order to do this? You can't take the time to just type the instruction to the LLM? You can just write "Don't modify XYZ.ts under any circumstances." Not to mention, all the tools have simple hotkeys to dismiss changes to an entire file with the press of a button if you really want to ignore changes to a file. In Cursor you can literally select a block of code and press a hotkey to "highlight" it to the LLM in the chat, and you could absolutely tell it "READ BUT DON'T TOUCH THIS CODE", tied directly to specific lines of code: literally the feature you are describing. But you have to work with the LLM and the tooling; it's not just going to be a single button for you.
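A minimal sketch of what that standing instruction could look like as a project rules file, assuming Cursor-style rules (the file name, paths, and wording below are hypothetical examples, not from either tool's docs):

```markdown
<!-- .cursorrules (or CLAUDE.md) at the project root; hypothetical example -->
## Protected code
- Never modify `XYZ.ts` or anything under `src/legacy/`. Treat them as read-only context.
- Never edit files in `tests/` just to make failing code pass; fix the implementation instead.
- If a task seems to require touching a protected file, stop and ask first.
```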

You can also literally do exactly what you said with "going a step further".

Open Claude Code and run `/init`. Download Superwhisper, open a new file at the project root called BRAIN_DUMP.md, put your cursor in the file, activate Superwhisper, and talk stream-of-consciousness about all the parts of the code and your own confidence level in each, with any details you want to include. Go to your LLM chat, tell it to "Read file @BRAIN_DUMP.md" and organize all the contents into a new file, CODE_CONFIDENCE.md: for each part of the codebase, its best assessment of the developer's confidence, given the details and tone in the brain dump. Delete the brain dump file if you want. Now you literally have what you asked for: an "index" for your LLM that maps each part of the codebase to developer confidence/stability/etc. From here you can just refer to that file in your project prompting.
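To make the end state concrete, the resulting CODE_CONFIDENCE.md might look something like this (the section names and wording are invented for illustration; the real file is whatever the LLM distills from your brain dump):

```markdown
<!-- CODE_CONFIDENCE.md; hypothetical output -->
## src/billing/
Confidence: high. Stable for a year, well covered by tests. Prefer minimal diffs here.

## src/sync/
Confidence: low. Known flaky retry logic; the developer is unsure of the design.
Concentrate changes and refactors in this area.
```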

Please, everyone, for the love of god, just start prompting. Instead of posting on hacker news or reddit about your skepticism, literally talk to the LLM about it and ask it questions, it can help you work through almost any of this stuff people rant about.

Benjammer commented on Nobody knows how to build with AI yet   worksonmymachine.substack... · Posted by u/Stwerner
bloppe · a month ago
Am I the only one who has to constantly tell Claude and Gemini to stop making edits to my codebase because they keep messing things up and breaking the build like ten times in a row, duplicating logic everywhere, etc.? I keep hearing about how impressive agents are. I wish they could automate me out of my job faster.
Benjammer · a month ago
Are you paying for the higher-end models? Do you have proper system prompts and guidance in place for prompt engineering? Have you started to practice any auxiliary forms of context engineering?

This isn't a magic code genie, it's a very complicated and very powerful new tool that you need to practice using over time in order to get good results from.

Benjammer commented on Claude for Financial Services   anthropic.com/news/claude... · Posted by u/mildlyhostileux
injidup · 2 months ago
As my father always told me. Anyone selling you a system to win at the casino/racetrack/stock exchange is a scammer. If the system actually worked then the system would not be for sale.
Benjammer · 2 months ago
This isn't a financial model, and they aren't selling the system itself; it's all tooling for data access and financial modeling. It's like they're setting up an OTB (off-track betting), not selling you a system to pick winning horses at the track.

Benjammer commented on Curtis Yarvin's Plot Against America   newyorker.com/magazine/20... · Posted by u/bitsavers
sitkack · 3 months ago
> He's foppish and playful more than he is scary.

This is a very unwise stance to take. Peter Thiel has teamed up with the Heritage Foundation to implement this plan. This is why A16Z and Musk put Trump in power, it is precisely to implement this plan.

Benjammer · 3 months ago
The fact that Thiel backs him so hard is what worries me more than anything. Thiel has a way of making things happen when he's really committed to something on a personal level... (see the Gawker Media case)

u/Benjammer

Karma: 2652 · Cake day: April 21, 2014