Scene_Cast2 · a year ago
I wonder how much of the improvement is owed to which changes. I've also never heard of "Muon - Momentum Orthogonalized by Newton-Schulz" being used before.

EDIT: there's a bit more info on his Twitter - https://x.com/kellerjordan0

It looks like he created this optimizer. Works on 2D matrices only.
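
For anyone wondering what "orthogonalized by Newton-Schulz" could mean in practice, here is a rough sketch of the idea, not the author's code. It uses the textbook cubic Newton-Schulz iteration, which is my assumption and not necessarily the variant Muon uses. The function name, step count, and `eps` are all made up for illustration. It also shows why the 2D restriction comes up: the iteration is defined on matrices.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=30, eps=1e-7):
    """Approximate the nearest semi-orthogonal matrix to a 2D array G.

    Hypothetical helper: textbook cubic Newton-Schulz, not Muon's actual code.
    """
    G = np.asarray(G, dtype=np.float64)
    if G.ndim != 2:
        raise ValueError("Newton-Schulz orthogonalization needs a 2D matrix")

    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    X = G / (np.linalg.norm(G) + eps)

    # Work on the tall orientation so X.T @ X is the smaller square matrix.
    transposed = X.shape[0] < X.shape[1]
    if transposed:
        X = X.T

    for _ in range(steps):
        # Each singular value s is mapped to 1.5*s - 0.5*s**3,
        # which pushes nonzero singular values toward 1.
        X = 1.5 * X - 0.5 * X @ (X.T @ X)

    return X.T if transposed else X
```

In an optimizer you would presumably apply something like this to the momentum buffer of each weight matrix before taking the step, but that part is my guess at the shape of it, so check the repo for the real thing.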

molticrystal · a year ago
Just needs a Zero To Hero series episode offering line-by-line commentary, so viewers can follow along on why each choice was made over the alternatives.
whiplash451 · a year ago
Cool work. No license?
byyoung3 · a year ago
Do you have a baseline of the regular implementation with 3x learning rate?
m3kw9 · a year ago
So it compresses info better.
pyinstallwoes · a year ago
That is literally intelligence.
parineum · a year ago
It's not.
gavindean90 · a year ago
Seems like this is a modded NanoGPT, not the original.
munchler · a year ago
Yes. It’s literally called “Modded-NanoGPT”.