I had fun writing them and I'm glad that they are making a positive impact, but since then I've been consumed by my work on Math Academy, which I find even more fun/impactful. (We do have a Methods of Proof course out, which is far more scaffolded, refined, and comprehensive than any textbook I could write independently, and generally instructionally superior, not to mention it's adaptive.)
So, long story short, I enjoyed writing those textbooks and am glad they're seeing the light of day, but I've moved on to a new chapter of life and don't plan on writing any more math books in the future (with the possible exception of something super niche like the math behind maximizing learning efficiency in hierarchical knowledge structures).
I'm absolutely not versed in RL, but I wanted to understand GRPO, the RL algorithm behind DeepSeek's latest model.
I started from a very simple LLM, inspired by Andrej Karpathy's "GPT from scratch" video (https://www.youtube.com/watch?v=kCc8FmEb1nY). Then I added the GRPO algorithm on top, which is itself very simple.
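The core idea really is small: sample a group of completions per prompt, score each one, and use the reward's deviation from the group mean (normalized by the group's standard deviation) as the advantage, with no value network needed. A minimal sketch of that advantage computation, assuming one scalar reward per sampled completion (the function name and example rewards are mine, not from the repo):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as used in GRPO:
    normalize each completion's reward within its sampling group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Example: 4 completions sampled for one prompt, scored by some reward function.
adv = grpo_advantages([1.0, 0.0, 0.5, 0.5])
# Above-average completions get positive advantage, below-average negative,
# and the advantages sum to ~0 within the group.
```

These advantages then weight a PPO-style clipped policy-gradient objective, so the whole thing drops into a standard training loop without a learned critic.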
I made a GitHub repo if you want to try it out: https://github.com/Al-th/grpo_experiment