> Instead they believe that model alignment, trying to understand when a user is doing a dangerous task, etc., will be enough.
No, we never claimed or believed that those would be enough. Those are just easy things that browser vendors should be doing anyway, and they would have prevented this simple attack. They are necessary, not sufficient.
It kind of seems like the only way to make sure a model doesn’t get exploited into emptying somebody’s bank account is “We’re not building that feature at all. Agentic AI stuff is fundamentally incompatible with sensible security policies and practices, so we are not putting it in our software in any way.”
Defense in depth is about preventing a failure in one layer from reaching the next (you know, exploit chains, etc.). A failure in a layer is still a failure, not statistically expected behavior; we fix bugs. What we actually need to do is treat LLM output as COMPLETELY UNTRUSTED user input, as has been pointed out here and elsewhere time and again.
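To make that concrete, here is a minimal sketch (Python; the action names, allowlist, and helpers are hypothetical, not any vendor’s real API) of what treating model output as untrusted input looks like: parse it, validate it against an explicit allowlist, and gate anything sensitive behind confirmation the model can’t forge.

```python
# Sketch only: handle a model-proposed action the same way we would handle
# untrusted user input -- validate against an allowlist, confirm sensitive
# actions out-of-band, and reject everything else outright.

from dataclasses import dataclass, field
from typing import Callable

# Actions the agent may ever request, and whether each needs user confirmation.
ALLOWED_ACTIONS = {
    "read_page": False,       # low risk: no confirmation
    "fill_form": True,        # touches user data: confirm first
    "submit_payment": True,   # high risk: always confirm
}

@dataclass
class ProposedAction:
    name: str
    args: dict = field(default_factory=dict)

def handle_model_output(
    action: ProposedAction,
    confirm_with_user: Callable[[ProposedAction], bool],
    execute: Callable[[ProposedAction], None],
) -> bool:
    """Gate a model-proposed action exactly as we would gate untrusted input."""
    # Unknown actions are rejected outright, never "best-effort" executed.
    if action.name not in ALLOWED_ACTIONS:
        return False
    # Sensitive actions need explicit confirmation the model cannot generate itself.
    if ALLOWED_ACTIONS[action.name] and not confirm_with_user(action):
        return False
    execute(action)  # caller-supplied executor, running with least privilege
    return True
```

The point isn’t this particular gate; it’s that the model’s output is never executed directly, it only ever selects from actions the user has already authorized.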
You reply to me like I need to be lectured, so consider me a dumb student in your security class: what am I missing here?
Yeah, the tone of that response seems unnecessarily smug.
“I’m working on removing your front door, and I’m designing a really good ‘no trespassing’ sign. Only a simpleton would question my reasoning on this issue.”