Do we know if thinking was on high effort? I've found it sometimes overthinks on high, so I tend to run on medium.
It was on "max".
We see neither the conversation nor any of the accompanying files the LLM is reading.
It's pretty trivial to fill an agents file, or any other such context/pre-prompt, with footguns to the point of unusability.