Put another way, LLMs are good at talking like they are thinking. That can get you pretty far, but it is not reasoning.
It's true that when it isn't producing text, no thinking is happening, but it is absolutely NOT clear that the attention blocks aren't holding state and modeling something as they work to produce text predictions. In fact, I can't think of a definition that would make that untrue... unless you mean that there is no system in which something like attention keeps updating and computing while the model itself decides when to emit text. That's true by design, but it doesn't support what you're arguing.
Now, whether what the model is "thinking about" inside those attention blocks matches exactly or completely with the text it produces as generated context is at least a little dubious, and that text is unlikely to be a complete representation either way.
For anyone saying “just do server side,” no, it’s physically impossible to stop all cheating that way until we have internet faster than human perception.
I've seen videos showing that cheats are particularly easy to spot if you are also cheating: when you have all the information, you can see players reacting to other players before they should be able to detect them. So it should be possible to use high-level players to build a repertoire of labeled cheating and clean examples and catch a fair amount of cheating behavior. I understand that there are ways to mitigate this and that it's an arms race, but the less obvious the cheats are, the less effective they are, almost by definition.
If someone is consistently reacting faster than the normal range of human reaction times, they're cheating. If they randomize their timing enough to fall within human range, well, mission accomplished, kind of.
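A rough sketch of the reaction-time rule, assuming you already have per-encounter reaction times (time from an enemy becoming visible to the player's first aim or movement response). The 100 ms floor and the 20% "consistently" cutoff are hypothetical numbers I made up for illustration; real values would have to be tuned against clean games.

```python
# Hypothetical threshold: reactions faster than this are treated as
# implausible for a human. Real values would need tuning on clean data.
HUMAN_FLOOR_MS = 100.0


def flag_reaction_times(reaction_times_ms, floor_ms=HUMAN_FLOOR_MS, max_fraction=0.2):
    """Return True if too many reactions are faster than the human floor.

    reaction_times_ms: one entry per encounter, in milliseconds
    (hypothetical telemetry). A single fast reaction can be luck, so we
    only flag when the fraction of too-fast reactions exceeds max_fraction.
    """
    if not reaction_times_ms:
        return False
    too_fast = sum(1 for t in reaction_times_ms if t < floor_ms)
    return too_fast / len(reaction_times_ms) > max_fraction


print(flag_reaction_times([230, 260, 210, 300, 190]))  # clean-looking player
print(flag_reaction_times([40, 55, 60, 250, 35]))      # mostly superhuman reactions
```

Looking at a fraction rather than any single event is the "consistently" part; it is also exactly what randomized delays are designed to defeat, which is the trade-off described above.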
If they're reacting to other players in impossible ways, avoiding them or aiming toward them with unusual precision or frequency before those players can be seen, they're cheating.
A lot of complex game dynamics can be simplified to 2D vectors, and processing them shouldn't be that computationally intensive.
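To illustrate the 2D-vector point, here's a minimal sketch of one such check: how often a player's aim points at an enemy while that enemy is hidden. It assumes hypothetical per-tick telemetry (2D positions, an aim direction, and a line-of-sight flag computed elsewhere); the `Sample` type and the 5-degree tolerance are made up for the example. It's just a dot product and an arccos per tick.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One tick of hypothetical telemetry, all in 2D map coordinates."""
    player: tuple[float, float]  # player position (x, y)
    aim: tuple[float, float]     # aim direction (need not be unit length)
    enemy: tuple[float, float]   # nearest enemy position (x, y)
    enemy_visible: bool          # whether the enemy is in line of sight this tick


def aim_error_deg(s: Sample) -> float:
    """Angle in degrees between the aim vector and the vector to the enemy."""
    dx, dy = s.enemy[0] - s.player[0], s.enemy[1] - s.player[1]
    ax, ay = s.aim
    norm = math.hypot(ax, ay) * math.hypot(dx, dy)
    if norm == 0:
        return 180.0  # degenerate vector: treat as "not aiming at" the enemy
    cos_theta = max(-1.0, min(1.0, (ax * dx + ay * dy) / norm))
    return math.degrees(math.acos(cos_theta))


def hidden_tracking_rate(samples, tolerance_deg=5.0):
    """Fraction of enemy-hidden ticks where the aim still points at the enemy."""
    hidden = [s for s in samples if not s.enemy_visible]
    if not hidden:
        return 0.0
    on_target = sum(1 for s in hidden if aim_error_deg(s) <= tolerance_deg)
    return on_target / len(hidden)


samples = [
    Sample((0, 0), (1, 0), (10, 0), False),  # hidden, aimed dead on
    Sample((0, 0), (0, 1), (10, 0), False),  # hidden, aimed 90 degrees away
    Sample((0, 0), (1, 0), (10, 0), True),   # visible, so not counted
]
print(hidden_tracking_rate(samples))
```

On its own the rate means little, since good players pre-aim common angles; the idea would be to compare it against the baseline from the clean high-level examples mentioned earlier.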