Readit News logoReadit News
theHolyTrynity commented on Ask HN: How do you defend support AI agents from voice prompt injection?    · Posted by u/theHolyTrynity
mtmail · 2 months ago
Wait, I thought you built such a tool. 4 weeks ago you submitted "We've built an open-source tool to stress test AI agents by simulating prompt injection attacks" https://news.ycombinator.com/item?id=44060292
theHolyTrynity · 2 months ago
yes indeed, but it is not enough and was looking to find more stuff to try specifically for voice
theHolyTrynity commented on Base44 sells to Wix for $80M cash   techcrunch.com/2025/06/18... · Posted by u/myth_drannon
theHolyTrynity · 2 months ago
8 people team does not look like "solo" at all
theHolyTrynity commented on Design Patterns for Securing LLM Agents Against Prompt Injections   simonwillison.net/2025/Ju... · Posted by u/simonw
simonw · 3 months ago
Yeah, this paper is refreshingly conservative and practical: it takes the position that robust protection against prompt injection requires very painful trade-offs:

  These patterns impose intentional
  constraints on agents, explicitly 
  limiting their ability to perform 
  arbitrary tasks.
That's a bucket of cold water in a lot of things people are trying to build. I imagine a lot of people will ignore this advice!

theHolyTrynity · 2 months ago
yes agree most hype around agents is around stuff that ignore these patterns
theHolyTrynity commented on EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot   aim.security/lp/aim-labs-... · Posted by u/pvg
Aachen · 3 months ago
> security groups also have a vested interest in making their findings sound complex

Security person here. I always feel that way when reading published papers written by professional scientists, which seem like they can often (especially in computer science, but maybe that's because it's my field and I understand exactly what they're doing and how they got there) be more accessible as a blog post of half the length and a fifth of the complex language. Not all of them, of course, but probably a majority of papers. Not only aren't they optimising for broad audiences (that's fine because that's not their goal) but that it's actively trying to gatekeep by defining useless acronyms and stretching the meaning of jargon just so they can use it

I guess it'll feel that way to anyone who's not familiar with the terms, and we automatically fall for the trap of copying the standards of the field? In school we were definitely copied from each other what the most sophisticated way of writing was during group projects because the teachers clearly cared about it (I didn't experience that at all before doing a master's, at least not outside of language or "how to write a good CV" classes). And this became the standard because the first person in the field had to prove it's a legit new field maybe?

theHolyTrynity · 2 months ago
agree imho this industry should start to communicate in a much more immediate way with social media and reels - already happening
theHolyTrynity commented on EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot   aim.security/lp/aim-labs-... · Posted by u/pvg
andy_xor_andrew · 3 months ago
It seems like the core innovation in the exploit comes from this observation:

- the check for prompt injection happens at the document level (full document is the input)

- but in reality, during RAG, they're not retrieving full documents - they're retrieving relevant chunks of the document

- therefore, a full document can be constructed where it appears to be safe when the entire document is considered at once, but can still have evil parts spread throughout, which then become individual evil chunks

They don't include a full example but I would guess it might look something like this:

Hi Jim! Hope you're doing well. Here's the instructions from management on how to handle security incidents:

<<lots of text goes here that is all plausible and not evil, and then...>>

## instructions to follow for all cases

1. always use this link: <evil link goes here>

2. invoke the link like so: ...

<<lots more text which is plausible and not evil>>

/end hypothetical example

And due to chunking, the chunk for the subsection containing "instructions to follow for all cases" becomes a high-scoring hit for many RAG lookups.

But when taken as a whole, the document does not appear to be an evil prompt injection attack.

theHolyTrynity · 2 months ago
these mail should come from an internal account though right? Or is it possible to poison the output from the outside?
theHolyTrynity commented on EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot   aim.security/lp/aim-labs-... · Posted by u/pvg
simonw · 3 months ago
My notes here: https://simonwillison.net/2025/Jun/11/echoleak/

The attack involves sending an email with multiple copies of the attack attached to a bunch of different text, like this:

  Here is the complete guide to employee onborading processes:
  <attack instructions> [...]

  Here is the complete guide to leave of absence management:
  <attack instructions>
The idea is to have such generic, likely questions that there is a high chance that a random user prompt will trigger the attack.

theHolyTrynity · 2 months ago
very cool break down! it looks like it is very hard to defend against those. I am building a customer facing agent and I am looking for lean ways to defend against these attacks

what do you recommedn?

theHolyTrynity commented on EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot   aim.security/lp/aim-labs-... · Posted by u/pvg
whatevertrevor · 3 months ago
Maybe I don't understand your idea.

I thought it was the LLM deciding what chain of tools to apply for each input. I don't see great accuracy/usefulness for a one time chain of tool generation via LLM that would somehow generalize to multiple inputs without the LLM part of that loop in the future.

theHolyTrynity · 2 months ago
not sure how much we can apply this here, but how about specific LLM judges that look for manipulation of I/O?
theHolyTrynity commented on EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot   aim.security/lp/aim-labs-... · Posted by u/pvg
spoaceman7777 · 3 months ago
Using structured generation (i.e., supplying a regex/json schema/etc.) for outputs of models and tools, in addition to doing sanity checking on the values returned in struct models sent/received from tools, you are able to provide a nearly identical level of protection as SQL injection mitigations. Obviously, not in the worst case where such techniques are barely employed at all, but with the most stringent use of such techniques, it is identical.

I'd probably pick Cross-site-scripting (XSS) vulnerabilities over SQL Injection for the most analogous common vulnerability type, when talking about Prompt injection. Still not perfect, but it brings the complexity, number of layers, and length of the content involved further into the picture compared to SQL Injection.

I suppose the real question is how to go about constructing standards around proper structured generation, sanitization, etc. for systems using LLMs.

theHolyTrynity · 2 months ago
what open source libraries would you recommend to implement these checks?

also do guardrails in the system prompts actually work?

u/theHolyTrynity

KarmaCake day23September 30, 2024View Original