I don't think this is true, in the general case: Rust has shown that languages can be safe in ways that improve runtime performance.
In particular, languages like Rust let programmers express stronger compile-time constraints on runtime behavior, so the compiler can safely omit bounds checks and other checks that an ordinary C program would need for safety. Similarly, Rust's prohibition of mutable aliasing opens up entire classes of optimizations that are extremely difficult to perform on C programs (to the extent that Rust regularly exposes bugs in LLVM's alias analysis, which C/C++ inputs had never exercised that hard).
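A minimal sketch of both points (illustrative function names, not from any particular codebase): iterating a slice with its iterator lets the compiler prove every access is in bounds, and an exclusive `&mut` reference is guaranteed not to alias its inputs, which is information a C compiler only gets from `restrict` annotations it can rarely verify.

```rust
// Bounds-check elision: the slice iterator carries the length
// invariant, so no per-element bounds check is emitted. The
// index-based C equivalent needs a check (or is unsafe).
fn sum(xs: &[u32]) -> u32 {
    xs.iter().sum()
}

// No mutable aliasing: `acc` is an exclusive reference, so it cannot
// point into `xs`. The compiler may keep `*acc` in a register across
// the loop; in C, `int *acc` could alias the array and force a
// reload on every iteration.
fn accumulate(xs: &[u32], acc: &mut u32) {
    for &x in xs {
        *acc += x;
    }
}

fn main() {
    let xs = [1, 2, 3, 4];
    assert_eq!(sum(&xs), 10);
    let mut total = 0;
    accumulate(&xs, &mut total);
    assert_eq!(total, 10);
}
```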
Edit: Other examples include ergonomic static dispatch (Rust makes things like `foo: impl Trait` look dynamic, but they're really static under the hood) and the entire notion of a "zero-cost abstraction" (a Rust abstraction compiles down to code no worse than its hand-written "as if" equivalent, so the programmer can't easily build a slower implementation by using the abstraction).
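To illustrate the static-dispatch point (toy trait and names, purely for the example): `impl Trait` in argument position reads like it takes "some dynamic thing", but it desugars to a generic parameter and is monomorphized into a separate, statically dispatched function per concrete type, with no vtable call.

```rust
trait Shout {
    fn shout(&self) -> String;
}

struct Dog;

impl Shout for Dog {
    fn shout(&self) -> String {
        "woof".to_string()
    }
}

// Looks dynamic, but desugars to `fn loud<T: Shout>(x: T)`:
// the compiler emits one specialized copy per concrete type,
// and the call to `shout` is resolved at compile time.
fn loud(x: impl Shout) -> String {
    x.shout().to_uppercase()
}

fn main() {
    assert_eq!(loud(Dog), "WOOF");
}
```

Dynamic dispatch is still available when you want it, but you have to ask for it explicitly with `dyn Trait`.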
CoT improves results, sure. And part of that is probably because you are telling the LLM to add more things to the context window, which increases the chance of resolving some syllogism latent in the training data: one inference cycle tells you that "man" has something to do with "mortal" and "Socrates" has something to do with "man", but two cycles emit both of those into the context window and let you get statistically closer to "Socrates" having something to do with "mortal". But given that the training/RLHF for CoT revolves around generating long chains of human-readable "steps", those steps can't really be explanatory of a process that is essentially statistical.
This is false: reasoning models are rewarded or penalized based on their performance at verifiable tasks, not on human feedback or next-token prediction.