Regardless of whether you believe LLMs are probabilistic or not, I think what we are both saying is that context is king: what the LLM says is dictated by the context, whether baked in through training or introduced by the user.
My impression is that LLMs predict the next token based on the prior context. They do that by having learned a probability distribution from tokens -> next-token.
Then, as I understand it, the models are never reasoning about the problem itself, only about what the next token should be given the context.
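To make that concrete, here's a toy sketch of what "predicting the next token" means - the vocabulary, logits, and prompt are invented for illustration, not taken from any real model: score every candidate token, turn the scores into a probability distribution with softmax, then sample from it.

```python
import math
import random

def softmax(logits):
    # Turn raw scores into a probability distribution over the vocabulary.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and made-up scores for the context "The capital of France is".
# A real model produces logits over a vocabulary of ~100k tokens via a neural network.
vocab = ["Paris", "London", "banana", "the"]
logits = [6.0, 2.5, -1.0, 0.5]

probs = softmax(logits)
next_token = random.choices(vocab, weights=probs, k=1)[0]

print(dict(zip(vocab, [round(p, 3) for p in probs])))
print("sampled next token:", next_token)
```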
Chain of thought just rewards them so that the next tokens aren't a prediction of the final answer directly, but a prediction of the reasoning that leads to the solution.
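As an illustration (the prompt and both completions below are made up, not from any particular model), the difference is only in which token sequence the training rewards - the answer tokens alone, or the reasoning tokens followed by the answer:

```python
# Prompt: "What is 17 * 24?"

# Completion a plain next-token predictor might be rewarded for:
direct_answer = "408"

# Completion a chain-of-thought-trained model might be rewarded for - still
# just a token sequence, but one that walks through the steps first:
chain_of_thought = "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408. The answer is 408."
```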
Since human language in the dataset describes many concepts and offers many solutions to problems, it turns out that predicting the text that describes the solution to a problem often ends up producing the correct solution. That this works at all was kind of a lucky accident, and it's where all the "intelligence" comes from.
If I'm writing code for production systems using LLMs, I still review every single line - my personal rule is that I need to be able to explain how it works to someone else before I'm willing to commit it.
I wrote a whole lot more about my approach to using LLMs to help write "real" code here: https://simonwillison.net/2025/Mar/11/using-llms-for-code/