Andrej's series is excellent, Sebastian's book + this video are excellent. There's a lot of overlap but they go into more detail on different topics or focus on different things. Andrej's entire series is absolutely worth watching, his upcoming Eureka Labs stuff is looking extremely good too. Sebastian's blog and book are definitely worth the time and money IMO.
Nice write-up, Sebastian; looking forward to the book. There are lots of details on the LLM and how it's composed. It would be great if you could expand on how the Llama and OpenAI teams might be cleaning and structuring their training data, given that this seems to be where the battle is heading in the long run.
But isn't it the beauty of LLMs that they need comparatively little preparation (unstructured text as input) and pick out the features on their own, so to speak?
I really like Sebastian's content, but I do agree with you. I didn't get into deep learning until starting with Karpathy's series, which begins by creating an autograd engine from scratch. Before that I tried learning with fast.ai, which dives immediately into building networks with PyTorch, but I noped out of there quickly. It felt about as fun as learning Java in high school. I need to understand what I'm working with!
Maybe it's just different learning styles. Some people, myself included, like to get some immediate real-world results to keep it relevant and form some kind of intuition, then start peeling back the layers to understand the underlying principles. With fast.ai you are already doing this by the third lecture.
It's like driving a car: you don't need to understand what's under the hood to start driving, but eventually understanding it makes you a better driver.
Bach (Johann Sebastian .. there were many musical Bachs in the family) owned and wrote for harpsichords, lute-harpsichords, violin, viola, cellos, a viola da gamba, lute, and spinet.
He never had a piano, not even a fortepiano .. though reportedly he played one once.
Considering I seem to be in the minority here, based on all the other responses to the message you replied to, the answer I'd give is "by mine, I guess".
At least when I saw "Building LLMs from the Ground Up", what I expected was someone opening vim, emacs, or their favorite text editor and writing some C code (or something around that level) to implement, well, everything from the "ground" (the operating system's user space, which in most OSes sits at roughly the level of C) and "up".
No, it is not. "From scratch" has a meaning. To me it means: in a way that lets you understand the important details, e.g. using a programming language without major dependencies.
Calling that from scratch is like saying "Just go to the store and tell them what you want" in a series called: "How to make sausage from scratch".
When I want to know how to do X from scratch, I am not interested in "how to get X the fastest way possible"; to be frank, I am not even interested in "how others typically get X". What I am interested in is learning how to do all the stuff that is normally hidden away in dependencies or frameworks myself, or, you know, from scratch. And judging by the comments here, I am not alone in that reading.
You could always go deeper and from some points of view, it's not "from the ground up" enough unless you build your own autograd and tensors from plain numpy arrays.
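To make that concrete: a scalar autograd engine in the micrograd style really does fit in a few dozen lines. This is an illustrative sketch, not code from the course or from any particular library:

```python
# Minimal scalar autograd sketch (micrograd-style); supports + and *
# and backpropagates gradients via the chain rule.
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._children = children

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then apply the chain rule from the output back
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x = Value(2.0)
y = Value(3.0)
z = x * y + x          # z = x*y + x, so dz/dx = y + 1, dz/dy = x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```

Swapping the scalars for numpy arrays (and adding matmul, broadcasting, and a few more ops) is essentially the jump from this toy to a real tensor library.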
Your comment is one of the most pompous that I've ever read.
NVIDIA's value lies only in PyTorch and CUDA optimizations relative to a pure C implementation, so saying that you need to go lower level than CUDA or PyTorch simply means reinventing NVIDIA. Good luck with that.
1. I only said the meaning of the title is wrong, and I praised the content.
2. I didn't say CUDA wouldn't be ground-up or low level (please re-read). (I do mention a no-code guide with CUDA in another comment, but that's obviously a joke.)
3. Finally, I think your comment comes across as holier-than-thou and finger-pointing, making a huge deal out of a minor semantic observation.
Wanted to say the same thing. As an educator who once gave a course on a similar topic for non-programmers, I can say you need to start way, way earlier.
E.g.
1. Programming basics
2. How to manipulate text using programs (reading, writing, tokenization, counting words, randomization, case conversion, ...)
3. How to extract statistical properties from texts (ngrams, etc, ...)
4. How to generate crude text using markov chains
5. Improving on markov chains and thinking about/trying out different topologies
Etc.
Sure, Markov chains are not exactly LLMs, but they are a good starting point to build an intuition for how programs can extract statistical properties from text and generate new text based on them. It also gives you a feeling for how programs can work on text.
If you start directly with a framework, some essential understanding is missing.
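Steps 2 through 4 of the outline above can be sketched as a toy word-level Markov generator. The corpus, function names, and order are illustrative assumptions, not material from the course:

```python
import random
from collections import defaultdict

# Toy word-level Markov chain text generator.
def build_model(text, order=1):
    """Map each state (tuple of `order` words) to the words that follow it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        model[state].append(words[i + order])
    return model

def generate(model, length=10, seed=0):
    """Walk the chain: pick a random start state, then sample successors."""
    rng = random.Random(seed)
    state = rng.choice(list(model))
    out = list(state)
    for _ in range(length):
        followers = model.get(state)
        if not followers:
            break  # dead end: state never appeared mid-corpus
        word = rng.choice(followers)
        out.append(word)
        state = state[1:] + (word,)
    return " ".join(out)

corpus = "the cat sat on the mat the dog sat on the rug"
model = build_model(corpus)
print(generate(model, length=8))
```

Raising `order` (step 5's "different topologies") makes the output more coherent but more memorized, which is a nice hands-on way to feel the trade-off LLMs navigate at vastly larger scale.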
Beyond learning how it all works and running a demo, there is not much practical use. You can train it on current events if you feed that corpus during training instead of just OpenWebText. It shouldn't be hard.
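A rough sketch of that corpus swap, assuming nanoGPT's `prepare.py` convention (`train.bin`/`val.bin` files of uint16 token ids read back via `np.memmap`). It is character-level here to avoid the BPE tokenizer dependency, and the corpus string is a stand-in for your scraped text:

```python
import numpy as np

# nanoGPT-style data prep with a custom corpus (sketch only).
# The string below is a stand-in; in practice you'd read your
# scraped current-events text from a file.
text = "Breaking: stand-in headline text. Replace with your scraped corpus."

chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> integer id
ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)

n = int(0.9 * len(ids))                        # 90/10 train/val split
ids[:n].tofile("train.bin")                    # training loop memmaps these
ids[n:].tofile("val.bin")
print(len(ids), len(chars))
```

From there it's the same training loop; only the data pipeline changes.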
Quite a cry, on a submission page from one of the most language-"obsessed" people in this community.
Now: "code" is something you establish - as the content of the codex medium (see https://en.wikipedia.org/wiki/Codex for its history); from the field of law, a set of rules, exported in use to other domains since at least the mid XVI century in English.
"Program" is something you publish, with the implied content of a set of intentions ("first we play Bach then Mozart" - the use postdates "code"-as-"set of rules" by centuries).
"Develop" is something you unfold - good, but it does not imply "rules" or "[sequential] process" like the other two terms.
I am from Brazil, and I find this funny because in my circle of friends/co-workers we mostly use "coding" when speaking English, or "codar" (code as a Portuguese verb) when speaking with other Brazilians.
I am not sure why, but I think it is because "program" has a strong association with prostitution in Brazilian Portuguese.
I am from Europe and I am not completely sure about that to be honest. I also prefer programming.
I also dislike "software development" as it reminds me of developing a photographic negative, like "oh, let's check out how the software we developed came out".
It should be software engineering, and it should be held to a similar standard as other engineering fields unless it's done in a non-professional context.
This is the exact level of detail I was looking for. I'm fairly experienced with deep learning and PyTorch and don't want to see them built from scratch. I found Andrej's materials too low level, and I tend to get lost in the weeds. This is not a criticism, just a comment for someone in a similar situation as I am.
This is great. Just yesterday I was wondering how exactly transformers/attention and LLMs work. I'd worked through how back-propagation works in a deep RNN a long while ago and thought it would be interesting to see the rest.
This is great! Hope it works on a Windows 11 machine too (I often find that when Windows isn't explicitly mentioned, the code isn't tested on it and usually fails to work due to random issues).
Anyway I will watch it tonight before bed. Thank you for sharing.
https://ai.meta.com/research/publications/the-llama-3-herd-o...
It's a fine PyTorch tutorial but let's not pretend it's something low level.
But it doesn’t start there. It uses top-down pedagogy instead of bottom-up.
It’s definitely out there and in productive use.
Unfortunately, your argument is a well-known fallacy and carries no weight.
https://16x.engineer/2023/12/29/nanoGPT-azure-T4-ubuntu-guid...
What sort of things could you do with it? How do you train it on current events?
ChatGPT doesn't know either.
https://developer.nvidia.com/cuda-downloads?target_os=Linux&...