Bad actors will now be working to scam users' LLMs rather than the users themselves. You can use more LLMs to monitor the LLMs and try to protect them, but it's turtles all the way down.
The difference: when someone loses their $$$, they're not a fool for falling for some Nigerian Prince wire scam themselves; they're just a fool for using your browser.
Or am I missing something?
The most credible pattern I've seen for that comes from the DeepMind CaMeL paper - I would love to see a browser agent that robustly implemented those ideas: https://simonwillison.net/2025/Apr/11/camel/
We're exploring a setup where a reasoning model (which sees both trusted and untrusted text) produces an action plan, and a second model, which never sees the untrusted text, then evaluates that plan.
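Roughly something like this (a sketch only; `call_model`, the prompts, and the function names are placeholders, not the actual implementation):

```python
# Sketch of the two-model split: the planner sees untrusted page text,
# the evaluator never does. `call_model` is a placeholder for whatever
# inference API is actually used.

def call_model(system: str, prompt: str) -> str:
    raise NotImplementedError("placeholder for a real model call")

def plan_actions(user_instruction: str, page_text: str) -> str:
    # Reasoning model: sees both trusted (user) and untrusted (page) text,
    # so its output has to be treated as tainted.
    return call_model(
        system="Produce a step-by-step action plan for the browser agent.",
        prompt=(
            f"USER INSTRUCTION:\n{user_instruction}\n\n"
            f"PAGE CONTENT (untrusted):\n{page_text}"
        ),
    )

def evaluate_plan(user_instruction: str, plan: str) -> bool:
    # Second model: sees only the trusted instruction and the proposed plan,
    # never the raw page text, and judges whether the plan matches intent.
    verdict = call_model(
        system="Answer APPROVE or REJECT: does this plan serve only the user's instruction?",
        prompt=f"USER INSTRUCTION:\n{user_instruction}\n\nPROPOSED PLAN:\n{plan}",
    )
    return verdict.strip().upper().startswith("APPROVE")

def run(user_instruction: str, page_text: str) -> str | None:
    plan = plan_actions(user_instruction, page_text)
    return plan if evaluate_plan(user_instruction, plan) else None
```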
> The most credible pattern I've seen for that comes from the DeepMind CaMeL paper
Yeah we're aware of the CaMeL paper and are looking into it, but it's definitely challenging from an implementation pov.
Also, I see that we said "The browser should clearly separate the user’s instructions from the website’s contents when sending them as context to the model" in the blog post. That should have been "backend", not "model". Agreed that once you feed both trusted and untrusted tokens into the LLM, the output must be considered unsafe.
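Concretely, the separation we meant is something along these lines (field names are illustrative, not the actual request shape):

```python
import json

def build_backend_request(user_instruction: str, page_text: str) -> str:
    # Keep the trusted user instruction and the untrusted website content in
    # separate fields on the way to the backend, so the provenance of each
    # piece of context is preserved rather than concatenated into one string.
    payload = {
        "user_instruction": {"text": user_instruction, "trusted": True},
        "page_content": {"text": page_text, "trusted": False},
    }
    return json.dumps(payload)
```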