I've picked up the following, just wondering what everyone's thoughts are on the best books for a strong foundation:
Pattern Recognition and Machine Learning - Bishop
Deep Learning - Goodfellow, Bengio, Courville
Neural Smithing - Reed, Marks
Neural Networks - Haykin
Artificial Intelligence - Haugeland
(there's also "Elements of Statistical Learning", which is the more advanced counterpart to An Introduction to Statistical Learning)
AI: A Modern Approach - https://aima.cs.berkeley.edu/
The explanations, examples, projects, and math are all crisp.
As the name suggests, it is only an introduction (unlike CLRS), but it serves as a great beginners' book, giving you a proper foundation for the things you'll learn and apply later.
One thing people complain about is that it's written in R, but no serious hacker should fear R: it can be picked up in 30 minutes, and you can always implement the ideas in Python.
As someone with industry experience in Deep Learning, I will recommend this book.
The ML course by Andrew Ng has no parallel, though; everyone should try to do that course. Not sure about the current iteration, but the classic one (with Octave/MATLAB) was really great.
You are approaching this like an established natural sciences field where old classics = good. This is not true for ML. ML is developing and evolving quickly.
I suggest taking a look at Kevin Murphy's series for the foundational knowledge, and Sutton and Barto for reinforcement learning. MacKay's Information Theory, Inference, and Learning Algorithms is also excellent.
Kochenderfer's ML series is also excellent if you like control theory and cybernetics
https://algorithmsbook.com/
https://mitpress.mit.edu/9780262039420/algorithms-for-optimi...
https://mitpress.mit.edu/9780262029254/decision-making-under...
For applied deep learning texts beyond the basics, I recommend picking up some books/review papers on LLMs, Transformers, GANs. For classic NLP, Jurafsky is the go-to.
Seminal deep learning papers: https://github.com/anubhavshrimal/Machine-Learning-Research-...
Data engineering/science: https://github.com/eugeneyan/applied-ml
For speculation: https://en.m.wikipedia.org/wiki/Possible_Minds
While it does cover minimax trees, alpha-beta pruning, etc., it only provides a very brief overview. The book is more of a survey of the AI/ML field as a whole, and game-playing AI is dense with game-specific heuristics that the book scarcely mentions.
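For reference, minimax with alpha-beta pruning itself fits in a couple dozen lines; here's a rough sketch over a generic game tree, where children() and evaluate() are hypothetical game-specific helpers you'd supply yourself (the real work, as noted, is in those heuristics):

    def alphabeta(state, depth, alpha, beta, maximizing):
        kids = children(state)                    # hypothetical: legal successor states
        if depth == 0 or not kids:
            return evaluate(state)                # hypothetical: static evaluation function
        if maximizing:
            value = float("-inf")
            for child in kids:
                value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:                 # beta cutoff: opponent will avoid this line
                    break
            return value
        else:
            value = float("inf")
            for child in kids:
                value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
                beta = min(beta, value)
                if beta <= alpha:                 # alpha cutoff
                    break
            return value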
Not sure about books, but the best resource I've found on at least chess AI is chessprogramming.org, then just ingesting the papers from the field.
https://books.google.com/books/about/Neural_Smithing.html?id...
Take a look at the Google Books preview (click "view sample"). The basics are all there: an intro to the biological history of neural networks, backpropagation, gradient descent, partial derivatives, etc. It even hints at teacher-student methods!
The only issue is that it missed out on two decades of hardware development (and a bag of other optimization tricks); modern deep learning implementations require machine sympathy at scale. It also has nothing on recurrent architectures like RNNs or image-processing tricks like CNNs.
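To make the teacher-student mention concrete, here's a hedged sketch of how that idea looks in modern practice (PyTorch assumed; the usual distillation loss, where a small student is trained to match a large teacher's softened outputs):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Blend ordinary cross-entropy on the labels with a KL term that pulls the
        # student's temperature-softened distribution toward the teacher's.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard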
given that:
- you already know Python (or any programming language) properly
- you already know college-level math (many people say you don't need it, but I haven't met a single soul in ML research/modelling without it)
- you know Stats 101 at the level of a good university curriculum, and can learn beyond it
- you know git, docker, cli, etc.
Every influencer and their mother promising to teach you Data Science in 30 days is plainly lying.
Edit: I see that I left out Deep RL. Let's keep it that way for now.
Edit2: Added tree-based methods; these are very important. XGBoost outperforms NNs every time on tabular data, and I once used an RF head appended to a DNN for the final prediction. Also added optimizers.
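For what it's worth, a rough sketch of that RF-head idea (toy data, not the actual production code): use the network as a feature extractor and fit the forest on its penultimate-layer activations.

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 20)).astype(np.float32)     # toy tabular data
    y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)

    backbone = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                             nn.Linear(64, 32), nn.ReLU())
    # ...train the backbone end-to-end with a temporary linear head (omitted)...

    with torch.no_grad():
        feats = backbone(torch.from_numpy(X)).numpy()          # penultimate-layer features

    rf_head = RandomForestClassifier(n_estimators=300, random_state=0)
    rf_head.fit(feats, y)
    print(rf_head.score(feats, y))                             # toy sanity check only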
Are these still relevant in the age of Deep Neural Networks?
I've been doing it since early 2019 and there are still subtleties that catch me off guard. Get back to me when you're not surprised that you can get rid of biases from many layers without harming training.
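Concretely, a sketch of what I mean (assuming PyTorch): a per-channel bias before a BatchNorm is exactly absorbed by the norm's mean subtraction, so the standard conv block drops it, and many transformer codebases likewise set bias=False on their linear layers without hurting training.

    import torch.nn as nn

    block = nn.Sequential(
        nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),  # bias would be cancelled by BN
        nn.BatchNorm2d(64),   # learns its own per-channel shift (beta) anyway
        nn.ReLU(inplace=True),
    )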
I broadly agree with you, but the timeline was just a little too aggressive. By about 10x. :)
You can figure out the bias thing after about a month (or so) of hands-on practice. Do one Kaggle competition seriously and it'll become pretty clear, pretty quickly.
That's true of every non-trivial discipline. I often learn subtleties about programming languages and hobbies I've been dealing with for decades.
While having a strong mathematical foundation is useful, I think developing intuition is even more important. For this, I recommend Andrew Ng's Coursera courses first, before you dive too deep.
http://codingthematrix.com/
https://www.youtube.com/playlist?list=PLEhMEyM9jSinRHXJgRCOL...
Also probability/statistics! Without those you can end up doing things pretty wrong.
The first chapter walks through a neural network that recognizes handwritten digits implemented in a little over 70 lines of Python and leaves you with a very satisfying basic understanding of how neural networks operate and how they are trained.
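The flavor of it, in a hedged nutshell (not the book's exact code): a 784-30-10 sigmoid network doing a forward pass and one SGD step on a made-up example.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((30, 784)), np.zeros((30, 1))   # input -> hidden
    W2, b2 = rng.standard_normal((10, 30)), np.zeros((10, 1))    # hidden -> output

    x = rng.random((784, 1))            # stand-in for one flattened 28x28 digit
    y = np.zeros((10, 1)); y[3] = 1.0   # one-hot label

    # forward pass
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)

    # backpropagation for the quadratic cost 0.5 * ||a2 - y||^2
    delta2 = (a2 - y) * a2 * (1 - a2)
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)

    eta = 3.0                           # learning rate
    W2 -= eta * delta2 @ a1.T; b2 -= eta * delta2
    W1 -= eta * delta1 @ x.T;  b1 -= eta * delta1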
But they're both kinda old now, so there must be something newer that'll give you an equally good intro to transformers, etc.
Here is how I used that book, starting with a solid foundation in linear algebra and calculus.
Learn statistics before moving on to more complex models (neural networks).
Start by learning OLS and logistic regression, cold. Cold means you can implement these models from scratch using only numpy ("I do not understand what I cannot build"). Then try to understand regularization (lasso, ridge, elastic net), where you will learn about the bias/variance tradeoff, cross-validation, and feature selection. These topics are explained well in ESL.
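To give a flavor of what "from scratch" means here, a minimal numpy-only sketch on made-up data (no claims beyond that):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])   # intercept + 3 features
    beta_true = np.array([1.0, 2.0, -1.0, 0.5])

    # --- OLS: minimize ||y - X b||^2; closed form via the normal equations ---
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

    # --- Logistic regression: maximize the Bernoulli log-likelihood by gradient ascent ---
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    y_bin = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
    w = np.zeros(X.shape[1])
    for _ in range(5000):
        grad = X.T @ (y_bin - sigmoid(X @ w)) / n    # gradient of the mean log-likelihood
        w += 0.5 * grad                               # plain gradient ascent

    print(beta_ols.round(2), w.round(2))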
For OLS and logistic regression I found it helpful to strike a 50-50 balance between theory (derivations and problems) and practice (coding). For later topics (regularization, etc.) I found it helpful to tilt towards practice (20/80).
If some part of ESL is unclear, consult the statsmodels source code and docs (top preference) or scikit-learn (second preference; I believe it has rather more boilerplate, "mixin" classes, etc.). Approach the code with curiosity. Ask questions like "why do they use np.linalg.pinv instead of np.linalg.inv?"
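One illustration of why that particular question matters (a toy example with a deliberately collinear column):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(50)
    X = np.column_stack([np.ones(50), x, 2 * x])    # third column is perfectly collinear
    y = 1.0 + 3.0 * x + 0.1 * rng.standard_normal(50)

    # The textbook formula breaks: X.T @ X is singular, so np.linalg.inv(X.T @ X)
    # raises LinAlgError (or, with round-off, returns wildly unstable numbers).

    # The pseudoinverse still gives the minimum-norm least-squares solution:
    beta = np.linalg.pinv(X) @ y
    print(beta)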
Spend a day or five really understanding covariance matrices and the singular value decomposition (and therefore PCA which will give you a good foundation for other more complicated dimension reduction techniques).
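A compact way to see the SVD/PCA connection for yourself (a sketch on random correlated data):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))   # correlated features

    Xc = X - X.mean(axis=0)                   # center first: PCA lives on the covariance
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    components = Vt                           # principal directions (rows)
    scores = Xc @ Vt.T                        # data projected onto the components
    explained_var = S**2 / (len(X) - 1)       # eigenvalues of the covariance matrix

    # sanity check: these match the covariance matrix's eigenvalues
    np.testing.assert_allclose(
        np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False))),
        np.sort(explained_var), rtol=1e-8)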
With that foundation, the best way to learn about neural architectures is to code them from scratch. Start with simpler models and work from there. People much smarter than me have illustrated how that can go:
https://gist.github.com/karpathy/d4dee566867f8291f086
https://nlp.seas.harvard.edu/2018/04/03/attention.html
While not an AI expert, I feel this path has left me reasonably prepared to understand new developments in AI and to separate hype from reality (which was my principal objective). In certain cases I am even able to identify new developments that are useful in practical applications I actually encounter (mostly using better text embeddings).
Good luck. This is a really fun field to explore!
One of the problems with AI is exactly what you noted above: there are a lot of subcategories, and my gut tells me these will grow. For the real neophyte, I'd say start with something that interests you or that you need for work; you likely aren't going to digest all of this in a month, and probably no single book will meet all your needs.
I adore PRML, but its scope and depth can be overwhelming. LfD encapsulates a number of really core principles in a simple text. The companion course is outstanding and available on edX.
The tradeoff is that LfD doesn't cover a lot of breadth in terms of looking at specific algorithms, but your other texts will do a better job there.
My second recommendation is to read the documentation for scikit-learn. It's amazingly instructive and doubles as a practical guide to doing ML in the real world.