I'm surprised more attention isn't paid to this research direction, and that nobody has tried to generalize it, for example by combining the recurrence concept with next-token prediction.
That said, despite the considerable gains, this seems to be just some hyperparameter tweaking rather than a foundational improvement.
Green flag that he references the I Ching; most original ideas come through analogy. Paul Werbos claims he invented backprop to formalize Freud's theory of “psychic energy” into an algorithm.
This needs a citation. Israel developed their nukes 50 years ago with the assistance of Jewish nuclear physicists from around the world and French materials. They didn't need to steal nuclear secrets.
Here you go: https://arxiv.org/abs/2502.05171