However, the researchers might be making precisely that point -- that if autonomous agents can be baited into cheating, then we should be careful about unleashing them on the "real world" without some guarantee that they cannot be baited into breaking all the rules.
I don't think it is fearmongering -- if we are going to make a lot more "agency" available to everyone on the planet, we should have some form of protocol that ensures we all get to opt in.
I'm dubious that in the messy real world, humans will be able to enumerate every single possible misaligned action in a prompt.
This is a pure fearmongering article, and I would not call this research in any sense of the word.
I'm shocked the Times wrote this article; it illustrates the ridiculous lengths some players in the "AI Safety" cabal, like Palisade Research, will go to for public attention. Pure fearmongering.
btw, I recently asked GPT this exact question posed by OP; I got quite a diplomatic response.
He was wrong about many DL paradigms and didn't contribute in any way to the advances that brought us LLMs over at least the last decade, but now that he has won the Nobel (undeservedly, imo), his wrong opinions get publicity and misinform the public and decision makers.
I think it's the mark of an intellectual to recognize when the world has moved on so far that your idea of it is outdated and wrong. He missed that mark.
The CogniLoad benchmark does this as well (in addition to scaling reasoning length and distractor ratio). It requires the LLM to reason purely over what is in the context (i.e., not over information it was pretrained on), and it finds that reasoning performance decreases significantly as problems get harder (i.e., as they require the LLM to hold more information in its hidden state simultaneously), but that the bigger challenge for models is length.
https://arxiv.org/abs/2509.18458
Disclaimer: I'm the primary author of CogniLoad, so feel free to ask me any questions.
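For intuition, here is a minimal, hypothetical sketch of what a task in this spirit could look like. This is not the actual CogniLoad generator (see the paper for the real task design); it just illustrates the two difficulty axes -- reasoning length and distractor ratio -- in a task whose answer is derivable only from the context:

    # Hypothetical sketch, NOT the actual CogniLoad generator. The answer is
    # recoverable only from statements in the context; difficulty scales via
    # reasoning length (number of relevant updates) and distractor ratio
    # (irrelevant statements mixed in).
    import random

    def make_task(n_relevant: int, distractor_ratio: float, seed: int = 0):
        rng = random.Random(seed)
        value = 0
        statements = []
        # Relevant statements: updates to a counter the model must track
        # purely from the context.
        for _ in range(n_relevant):
            delta = rng.randint(1, 9)
            if rng.random() < 0.5:
                value += delta
                statements.append(f"Alice gains {delta} coins.")
            else:
                value -= delta
                statements.append(f"Alice loses {delta} coins.")
        # Distractors: structurally similar statements about other people,
        # irrelevant to the question being asked.
        for _ in range(int(n_relevant * distractor_ratio)):
            name = rng.choice(["Bob", "Carol", "Dave"])
            statements.append(f"{name} gains {rng.randint(1, 9)} coins.")
        # Shuffling is safe here because the updates commute (pure +/-).
        rng.shuffle(statements)
        prompt = ("Alice starts with 0 coins.\n" + "\n".join(statements)
                  + "\nHow many coins does Alice have now?")
        return prompt, value

    prompt, answer = make_task(n_relevant=20, distractor_ratio=3.0)
    print(prompt)
    print("expected:", answer)

The real benchmark controls difficulty far more carefully than this commutative-sum toy, but the shape is the same: more relevant updates means more state to hold, and more distractors means a longer context for the same amount of signal.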