In the same way a LLM shouldn't have access to resources that shouldn't be directly accessible to the user. If the agent works on the user's data on the user's behalf (ex: vibe coding), then I don't consider jailbreaking to be a big problem. It could help write malware or things like that, but then again, it is not as if script kiddies couldn't work without AI.
Tricking it into writing malware isn't the big problem that I see.
It's things like prompt injections from fetching external URLs, it's going to be a major route for RCE attacks.
https://blog.trailofbits.com/2025/10/22/prompt-injection-to-...
There's plenty of things we should be doing to help mitigate these threats, but not all companies follow best practices when it comes to technology and security...
That LLMs don't give harmful information unsolicited, sure, but if you are jailbreaking, you are already dead set in getting that information and you will get it, there are so many ways: open uncensored models, search engines, Wikipedia, etc... LLM refusals are just a small bump.
For me they are just a fun hack more than anything else, I don't need a LLM to find how to hide a body. In fact I wouldn't trust the answer of a LLM, as I might get a completely wrong answer based on crime fiction, which I expect makes up most of its sources on these subjects. May be good for writing poetry about it though.
I think the risks are overstated by AI companies, the subtext being "our products are so powerful and effective that we need to protect them from misuse". Guess what, Wikipedia is full of "harmful" information and we don't see articles every day saying how terrible it is.
You have a customer facing LLM that has access to sensitive information.
You have an AI agent that can write and execute code.
Just image what you could do if you can bypass their safety mechanisms! Protecting LLMs from "social engineering" is going to be an important part of cybersecurity.
! [remote rejected] feature/ui-improvements -> feature/ui-improvements (Internal Server Error)
Edit: One minute after posting this it's working again.And yes, they're separate from the NTP Pool Project, which runs the actual servers, but the Network Time Foundation supports the software that billions of devices run on.
<script><a href="/honeypot">Click Here!</a></script>
It would fool the dumber web crawlers.