warkanlock · 2 years ago
This is an excellent tool for seeing how an LLM actually works, from the ground up!

For those reading it and going through each step: if, by chance, you get stuck on why there are 48 elements in the first array, refer to model.py in minGPT [1]

It's an architectural decision that would be great to mention in the article, since people without much context might get lost.

[1] https://github.com/karpathy/minGPT/blob/master/mingpt/model....
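
If it helps, the relevant piece is just the per-model config table in model.py: each named model is only a choice of depth, head count, and embedding width. Quoting from memory, so double-check against the repo:

    # minGPT model.py, paraphrased from memory; verify against the source
    model_configs = {
        'gpt2':      dict(n_layer=12, n_head=12, n_embd=768),
        'gpt-mini':  dict(n_layer=6,  n_head=6,  n_embd=192),
        'gpt-micro': dict(n_layer=4,  n_head=4,  n_embd=128),
        'gpt-nano':  dict(n_layer=3,  n_head=3,  n_embd=48),  # the 48 in question
    }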

taliesinb · 2 years ago
Wow, I love the interactive whizzing around and the animation, very neat! Way more explanations should work like this.

I've recently finished an unorthodox kind of visualization / explanation of transformers. It's sadly not interactive, but it does have some strengths that may be unique.

First, it gives array axes semantic names, represented in the diagrams as colors (which this post also uses). So the sequence axis is red, the key feature dimension is green, the multihead axis is orange, etc. This helps you show quite complicated array circuits and get an immediate feeling for what is going on and how different arrays are being combined with each other. Here's a pic of the full multihead self-attention step, for example:

https://math.tali.link/raster/052n01bav6yvz_1smxhkus2qrik_07...

It also uses a kind of generalized tensor network diagrammatic notation -- if anyone remembers Penrose's tensor notation, it's like that, but enriched with colors and some other ideas. Underneath, these diagrams are string diagrams in a particular category, though you don't need to know that (nor do I even explain it!).

Here's the main blog post introducing the formalism: https://math.tali.link/rainbow-array-algebra

Here's the section on perceptrons: https://math.tali.link/rainbow-array-algebra/#neural-network...

Here's the section on transformers: https://math.tali.link/rainbow-array-algebra/#transformers
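
And if anyone wants the same axis bookkeeping in code rather than colors, here's a minimal NumPy sketch of one multihead self-attention step. Each einsum letter stands in for one of the colored axes; the shapes and names are my own, and I've left out causal masking:

    import numpy as np

    # axes: s/t = sequence (red), k = key feature (green), h = heads (orange), d = head feature
    S, K, H, D = 8, 16, 4, 16           # illustrative sizes
    x  = np.random.randn(S, K)          # input: one feature vector per position
    Wq = np.random.randn(H, K, D)       # per-head query projection
    Wk = np.random.randn(H, K, D)       # per-head key projection
    Wv = np.random.randn(H, K, D)       # per-head value projection

    q = np.einsum('sk,hkd->hsd', x, Wq)                    # queries per head
    k = np.einsum('sk,hkd->hsd', x, Wk)                    # keys per head
    v = np.einsum('sk,hkd->hsd', x, Wv)                    # values per head

    scores = np.einsum('hsd,htd->hst', q, k) / np.sqrt(D)  # dot products over the feature axis
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)                    # softmax over the key positions

    out = np.einsum('hst,htd->hsd', attn, v)               # weighted sum of values, per head
    out = out.transpose(1, 0, 2).reshape(S, H * D)         # merge heads back into one feature axis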

jimmySixDOF · 2 years ago
You might also like this interactive 3D walk-through explainer from PyTorch:

https://pytorch.org/blog/inside-the-matrix/

riemannzeta · 2 years ago
Are you referring specifically to line 141, which sets the number of embedding elements for gpt-nano to 48? That also seems to correspond to the Channel size C referenced in the explanation text?

https://github.com/karpathy/minGPT/blob/master/mingpt/model....
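
If so: n_embd is the width of every token's vector, i.e. the C in the explainer. A minimal PyTorch sketch (sizes and names here are illustrative, not copied from the repo):

    import torch
    import torch.nn as nn

    n_embd = 48                                   # gpt-nano's embedding width, the "C"
    vocab_size = 65                               # illustrative; depends on the tokenizer

    wte = nn.Embedding(vocab_size, n_embd)        # token embedding table
    idx = torch.randint(0, vocab_size, (1, 11))   # (B, T) batch of token ids
    tok_emb = wte(idx)                            # (B, T, C)
    print(tok_emb.shape)                          # torch.Size([1, 11, 48])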

tomnipotent · 2 years ago
That matches the name of the default model selected in the right pane, "nano-gpt". I missed the "bigger picture" at first, before noticing the other models in the right-pane header.
namocat · 2 years ago
Yes, thank you - it was unexplained, so I got stuck on "Why 48?", thinking I'd missed something right out of the gate.
zombiwoof · 2 years ago
I was thinking 42 ;-)
jayveeone · 2 years ago
Yes yes it was the 48 elements thing that got me stuck. Definitely not everything from the second the page loaded.
holtkam2 · 2 years ago
The visualization I've been looking for for months. I would have happily paid serious money for this... the fact that it's free is such a gift and I don't take it for granted.
terminous · 2 years ago
Same... this is like a textbook, but worth it
wills_forward · 2 years ago
My jaw dropped seeing algorithmic complexity laid out so clearly in a 3D space like that. I wish I was smart enough to know if it's accurate or not.
block_dagger · 2 years ago
To know, you must perform intellectual work, not merely be smart. I bet you are smart enough.
nocoder · 2 years ago
What a nice comment!! This has been a big failing of my mental model. I always believed that if I was smart enough, I should understand things without effort. Still trying to unlearn this....
SubiculumCode · 2 years ago
99% perspiration, 1% inspiration, as the adage goes... and I completely agree.

The frustration for the curious is that there is more than you can ever learn. You encounter something new and exciting, but then you realize that really getting to the spot where you can contribute will take at least a year or six, and that will require dropping other priorities.

gryfft · 2 years ago
Damn, this looks phenomenal. I've been wanting to do a deep dive like this for a while -- the 3D model is a spectacular pedagogic device.
quickthrower2 · 2 years ago
Andrej Karpathy twisting his hands as he explains it is also a great device. Not being sarcastic: when he explains it, I understand it for a good minute or two. Then I need to rewatch as I forget (but that is just me)!
hodanli · 2 years ago
Which video, specifically?
baq · 2 years ago
Could just as well be titled 'dissecting magic into matmuls and dot products for dummies'. Great stuff. I went away even more amazed that LLMs work as well as they do.
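
To make that literal: even the final "prediction" is just one more matmul plus a softmax. A toy sketch with made-up sizes:

    import numpy as np

    C, V = 48, 65                       # channel width and vocab size (toy values)
    x = np.random.randn(C)              # residual-stream vector for the last token
    W_lm = np.random.randn(C, V)        # language-model head: a single matrix

    logits = x @ W_lm                   # one dot product per vocabulary entry
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                # softmax: distribution over the next token
    print(probs.argmax())               # greedy choice of the next token id
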
mark_l_watson · 2 years ago
I am looking at Brendan's GitHub repo https://github.com/bbycroft/llm-viz

Really nice stuff.

flockonus · 2 years ago
Twitter thread by the author sharing some extra context on this work: https://twitter.com/BrendanBycroft/status/173104295714982714...
itslennysfault · 2 years ago
Thanks for sharing. This is a great thread.

Since X now hides replies for non-logged-in users, here is a Nitter link for those without an account (like me) who might want to see the full thread.

https://nitter.net/BrendanBycroft/status/1731042957149827140

3abiton · 2 years ago
I wish it could integrate other open-source LLMs in the backend, but this is already an amazing viz.
tysam_and · 2 years ago
Another visualization I would really love would be a clickable, circular set of possible prediction branches, projected onto a Poincaré disk (to handle the exponential branching component of it all). It would take forever to calculate except on smaller models, but being able to visualize branch probabilities angularly for the top-n values, and to go forwards and backwards up and down different branches, would likely yield some important insights into how they work (rough sketch of the branching data at the end of this comment).

Good visualization precedes good discoveries in many branches of science, I think.

(see my profile for a longer, potentially more silly description ;) )
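
A rough sketch of the data side of that idea: recursively expand the top-n continuations into a tree, tracking each branch's probability. The model call here is a stub, not a real API; any LLM's softmaxed logits would slot in:

    # Hypothetical helper: returns {token: probability} for the next position.
    def next_token_probs(tokens):
        return {'a': 0.5, 'b': 0.3, 'c': 0.2}   # stand-in; replace with a model call

    def expand(tokens, depth, top_n=2):
        """Build the top-n prediction tree to a given depth."""
        if depth == 0:
            return []
        ranked = sorted(next_token_probs(tokens).items(), key=lambda kv: -kv[1])[:top_n]
        return [{'token': t, 'prob': p, 'children': expand(tokens + [t], depth - 1, top_n)}
                for t, p in ranked]

    tree = expand(['<start>'], depth=3)
    # The cumulative probability along each path (product of 'prob' values) is what
    # you'd map to angular position; top_n-way branching grows exponentially, which
    # is exactly what the hyperbolic disk has room for.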