I've picked up the following, just wondering what everyone's thoughts are on the best books for a strong foundation:
Pattern Recognition and Machine Learning - Bishop
Deep Learning - Goodfellow, Bengio, Courville
Neural Smithing - Reed, Marks
Neural Networks - Haykin
Artificial Intelligence - Haugeland
(there's also "Elements of Statistical Learning", which is the more advanced counterpart to An Introduction to Statistical Learning)
AI: A Modern Approach - https://aima.cs.berkeley.edu/
The explanations, examples, projects, and math are all crisp.
As the name suggests, it is only an introduction (unlike CLRS), but it serves as a great beginners' book, giving you a proper foundation for the things you'll learn and apply later.
One thing people complain about is that it's written in R, but no serious hacker should fear R: it can be picked up in 30 minutes, and you can always implement the ideas in Python.
As someone with industry experience in Deep Learning, I will recommend this book.
The ML course by Andrew Ng has no parallel, though; everyone should try to do that course. Not sure about the current iteration, but the classic one (with Octave/MATLAB) was really great.
You are approaching this like an established natural sciences field where old classics = good. This is not true for ML. ML is developing and evolving quickly.
I suggest taking a look at Kevin Murphy's series for the foundational knowledge, and Sutton and Barto for reinforcement learning. MacKay's Information Theory, Inference, and Learning Algorithms is also excellent.
Kochenderfer's ML series is also excellent if you like control theory and cybernetics
https://algorithmsbook.com/
https://mitpress.mit.edu/9780262039420/algorithms-for-optimi...
https://mitpress.mit.edu/9780262029254/decision-making-under...
For applied deep learning texts beyond the basics, I recommend picking up some books/review papers on LLMs, Transformers, GANs. For classic NLP, Jurafsky is the go-to.
Seminal deep learning papers: https://github.com/anubhavshrimal/Machine-Learning-Research-...
Data engineering/science: https://github.com/eugeneyan/applied-ml
For speculation: https://en.m.wikipedia.org/wiki/Possible_Minds
While it does cover minimax trees, alpha-beta pruning, etc., it only provides a very brief overview. The book is more of a survey of the AI/ML field as a whole, and game-playing AI is dense with game-specific heuristics that the book scarcely mentions.
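For reference, minimax with alpha-beta pruning itself fits in a couple dozen lines; here's a rough sketch over a generic game tree, where children() and evaluate() are hypothetical game-specific helpers you'd supply yourself (the real work, as noted, is in those heuristics):

    def alphabeta(state, depth, alpha, beta, maximizing):
        kids = children(state)                    # hypothetical: legal successor states
        if depth == 0 or not kids:
            return evaluate(state)                # hypothetical: static evaluation function
        if maximizing:
            value = float("-inf")
            for child in kids:
                value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:                 # beta cutoff: opponent will avoid this line
                    break
            return value
        else:
            value = float("inf")
            for child in kids:
                value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
                beta = min(beta, value)
                if beta <= alpha:                 # alpha cutoff
                    break
            return value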
Not sure about books, but the best resource I've found on at least chess AI is chessprogramming.org, then just ingesting the papers from the field.
https://books.google.com/books/about/Neural_Smithing.html?id...
Take a look at the Google Books preview (click "view sample"). The basics are all there: an intro to the biological history of neural networks, backpropagation, gradient descent, partial derivatives, etc. It even hints at teacher-student methods!
The only issue is that it missed out on two decades of hardware development (and a bag of other optimization tricks); modern deep learning implementations require machine sympathy at scale. It also has nothing on recurrent architectures like RNNs or image-processing tricks like CNNs.
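To make the teacher-student mention concrete, here's a hedged sketch of how that idea looks in modern practice (PyTorch assumed; the usual distillation loss, where a small student is trained to match a large teacher's softened outputs):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Blend ordinary cross-entropy on the labels with a KL term that pulls the
        # student's temperature-softened distribution toward the teacher's.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard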
given that:
- you already know Python (or any programming language) properly
- you already know college-level math (many people say you don't need it, but I haven't met a single soul in ML research/modelling without it)
- you know Stats 101 at the level of a good university curriculum, and can learn beyond it
- you know git, docker, cli, etc.
Every influencer and their mother promising to teach you Data Science in 30 days is plainly lying.
Edit: I see that I left out Deep RL. Let's keep it that way for now.
Edit2: Added tree-based methods; these are very important. XGBoost outperforms NNs every time on tabular data, and I once used an RF head appended to a DNN for the final prediction. Also added optimizers.
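For what it's worth, a rough sketch of that RF-head idea (toy data, not the actual production code): use the network as a feature extractor and fit the forest on its penultimate-layer activations.

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 20)).astype(np.float32)     # toy tabular data
    y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)

    backbone = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                             nn.Linear(64, 32), nn.ReLU())
    # ...train the backbone end-to-end with a temporary linear head (omitted)...

    with torch.no_grad():
        feats = backbone(torch.from_numpy(X)).numpy()          # penultimate-layer features

    rf_head = RandomForestClassifier(n_estimators=300, random_state=0)
    rf_head.fit(feats, y)
    print(rf_head.score(feats, y))                             # toy sanity check only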
Are these still relevant in the age of Deep Neural Networks?
I've been doing it since early 2019 and there are still subtleties that catch me off guard. Get back to me when you're not surprised that you can get rid of biases from many layers without harming training.
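Concretely, a sketch of what I mean (assuming PyTorch): a per-channel bias before a BatchNorm is exactly absorbed by the norm's mean subtraction, so the standard conv block drops it, and many transformer codebases likewise set bias=False on their linear layers without hurting training.

    import torch.nn as nn

    block = nn.Sequential(
        nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),  # bias would be cancelled by BN
        nn.BatchNorm2d(64),   # learns its own per-channel shift (beta) anyway
        nn.ReLU(inplace=True),
    )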
I broadly agree with you, but the timeline was just a little too aggressive. By about 10x. :)
You can figure out the bias thing after about a month (or so) of hands-on practice. Do one Kaggle competition seriously and it'll become pretty clear, pretty quickly.
That's true of every non-trivial discipline. I often learn subtleties about programming languages and hobbies I've been dealing with for decades.
While having a strong mathematical foundation is useful, I think developing intuition is even more important. For this, I recommend Andrew Ng's Coursera courses first, before you dive too deep.
http://codingthematrix.com/
https://www.youtube.com/playlist?list=PLEhMEyM9jSinRHXJgRCOL...
Also probability/statistics! Without those you can end up doing things pretty wrong.
The first chapter walks through a neural network that recognizes handwritten digits implemented in a little over 70 lines of Python and leaves you with a very satisfying basic understanding of how neural networks operate and how they are trained.
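The flavor of it, in a hedged nutshell (not the book's exact code): a 784-30-10 sigmoid network doing a forward pass and one SGD step on a made-up example.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((30, 784)), np.zeros((30, 1))   # input -> hidden
    W2, b2 = rng.standard_normal((10, 30)), np.zeros((10, 1))    # hidden -> output

    x = rng.random((784, 1))            # stand-in for one flattened 28x28 digit
    y = np.zeros((10, 1)); y[3] = 1.0   # one-hot label

    # forward pass
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)

    # backpropagation for the quadratic cost 0.5 * ||a2 - y||^2
    delta2 = (a2 - y) * a2 * (1 - a2)
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)

    eta = 3.0                           # learning rate
    W2 -= eta * delta2 @ a1.T; b2 -= eta * delta2
    W1 -= eta * delta1 @ x.T;  b1 -= eta * delta1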
But they're both kinda old now, so there must be something newer that'll give you an equally good intro to transformers, etc.
Here is how I used that book, starting with a solid foundation in linear algebra and calculus.
Learn statistics before moving on to more complex models (neural networks).
Start by learning OLS and logistic regression, cold. Cold means you can implement these models from scratch using only numpy ("I do not understand what I cannot build"). Then try to understand regularization (lasso, ridge, elastic net), where you will learn about the bias/variance tradeoff, cross-validation, and feature selection. These topics are explained well in ESL.
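To give a flavor of what "from scratch" means here, a minimal numpy-only sketch on made-up data (no claims beyond that):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])   # intercept + 3 features
    beta_true = np.array([1.0, 2.0, -1.0, 0.5])

    # --- OLS: minimize ||y - X b||^2; closed form via the normal equations ---
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

    # --- Logistic regression: maximize the Bernoulli log-likelihood by gradient ascent ---
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    y_bin = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
    w = np.zeros(X.shape[1])
    for _ in range(5000):
        grad = X.T @ (y_bin - sigmoid(X @ w)) / n    # gradient of the mean log-likelihood
        w += 0.5 * grad                               # plain gradient ascent

    print(beta_ols.round(2), w.round(2))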
For OLS and logistic regression I found it helpful to strike a 50-50 balance between theory (derivations and problems) and practice (coding). For later topics (regularization, etc.) I found it helpful to tilt towards practice (20/80).
If some part of ESL is unclear, consult the statsmodels source code and docs (top preference) or scikit-learn (second preference; I believe it has rather more boilerplate, "mixin" classes, etc.). Approach the code with curiosity. Ask questions like "why do they use np.linalg.pinv instead of np.linalg.inv?"
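One illustration of why that particular question matters (a toy example with a deliberately collinear column):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(50)
    X = np.column_stack([np.ones(50), x, 2 * x])    # third column is perfectly collinear
    y = 1.0 + 3.0 * x + 0.1 * rng.standard_normal(50)

    # The textbook formula breaks: X.T @ X is singular, so np.linalg.inv(X.T @ X)
    # raises LinAlgError (or, with round-off, returns wildly unstable numbers).

    # The pseudoinverse still gives the minimum-norm least-squares solution:
    beta = np.linalg.pinv(X) @ y
    print(beta)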
Spend a day or five really understanding covariance matrices and the singular value decomposition (and therefore PCA which will give you a good foundation for other more complicated dimension reduction techniques).
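A compact way to see the SVD/PCA connection for yourself (a sketch on random correlated data):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))   # correlated features

    Xc = X - X.mean(axis=0)                   # center first: PCA lives on the covariance
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    components = Vt                           # principal directions (rows)
    scores = Xc @ Vt.T                        # data projected onto the components
    explained_var = S**2 / (len(X) - 1)       # eigenvalues of the covariance matrix

    # sanity check: these match the covariance matrix's eigenvalues
    np.testing.assert_allclose(
        np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False))),
        np.sort(explained_var), rtol=1e-8)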
With that foundation, the best way to learn about neural architectures is to code them from scratch. Start with simpler models and work from there. People much smarter than me have illustrated how that can go:
https://gist.github.com/karpathy/d4dee566867f8291f086
https://nlp.seas.harvard.edu/2018/04/03/attention.html
While not an AI expert, I feel this path has left me reasonably prepared to understand new developments in AI and to separate hype from reality (which was my principal objective). In certain cases I am even able to identify new developments that are useful in practical applications I actually encounter (mostly using better text embeddings).
Good luck. This is a really fun field to explore!
One of the problems with AI is exactly what you noted above: there are a lot of subcategories, and my gut tells me these will grow. For the real neophyte, I'd say start with something that interests you or that you need for work; you likely aren't going to digest all of this in a month, and probably no single book will meet all your needs.
I adore PRML, but its scope and depth can be overwhelming. LfD encapsulates a number of really core principles in a simple text. The companion course is outstanding and available on edX.
The tradeoff is that LfD doesn't cover a lot of breadth in terms of looking at specific algorithms, but your other texts will do a better job there.
My second recommendation is to read the documentation for scikit-learn. It's amazingly instructive and doubles as a practical guide to doing ML in the real world.