The new edition has been split into two parts. The PDF draft (921 pages) and Python code [1] of the first part are now available. The table of contents of the second part is here [2].
From the preface:
"By Spring 2020, my draft of the second edition had swollen to about 1600 pages, and I was still not done. At this point, 3 major events happened. First, the COVID-19 pandemic struck, so I decided to “pivot” so I could spend most of my time on COVID-19 modeling. Second, MIT Press told me they could not publish a 1600 page book, and that I would need to split it into two volumes. Third, I decided to recruit several colleagues to help me finish the last ∼ 15% of “missing content”. (See acknowledgements below.)
The result is two new books, “Probabilistic Machine Learning: An Introduction”, which you are currently reading, and “Probabilistic Machine Learning: Advanced Topics”, which is the sequel to this book [Mur22].
Together these two books attempt to present a fairly broad coverage of the field of ML c. 2020, using the same unifying lens of probabilistic modeling and Bayesian decision theory that I used in the first book. Most of the content from the first book has been reused, but it is now split fairly evenly between the two new books. In addition, each book has lots of new material, covering some topics from deep learning, but also advances in other parts of the field, such as generative models, variational inference and reinforcement learning. To make the book more self-contained and useful for students, I have also added some more background content, on topics such as optimization and linear algebra, that was omitted from the first book due to lack of space.
Another major change is that nearly all of the software now uses Python instead of Matlab."
[1] https://github.com/probml/pyprobml
[2] https://probml.github.io/pml-book/book2.html
It's very encouraging to see Matlab losing ground in the educational space. I don't know why so many engineers let their foundational skills be locked behind a proprietary ecosystem like that.
>I don't know why so many engineers let their foundational skills be locked behind a proprietary ecosystem like that.
Because no open source toolkit can do what Matlab can do.
The same is true of a lot of high end software: Photoshop, pretty much any serious parametric CAD modeling system (say, SolidWorks), DaVinci Resolve, Ableton Live, etc. When a professional costs $100K+ to employ, paying a few grand to make them vastly more productive is a no brainer. If open source truly offered a replacement, then these costly programs would die. But there just isn't anything close for most work.
Matlab is used for massive amounts of precise numerical engineering design, modeling, and running systems. So while Python is good for some tasks, for the places where Matlab shines Python is nowhere near usable. And before Python catches up in this space, I'd expect Julia to get there first.
It's a regression as far as code readability goes, for a fairly straightforward reason: almost everything in Matlab is a matrix. Matrices are not first-class citizens in Python, and it matters. I use Python a hell of a lot more than Matlab, but for examining how an algorithm works (say, for implementing it in another language or modifying it to do tricks), Matlab wins. Go look at these PRML collections in Python and Matlab and see if you disagree:
https://github.com/ctgk/PRML
https://github.com/PRML/PRMLT
This is probably my favorite introductory machine learning book. The fact that he places almost everything in the language of graphical models is such a good common ground to build off.
This really sets you up to realize that there is (and should be) a lot more to doing a good job in machine learning than simply minimizing an objective function. The answers you get depend on the model you create as do the questions you can hope to answer.
I don't see a clear list of differences between this new edition and the first. Does anyone know what's new?
Agree with you. But none of this is useful for practical (applied) machine learning. I don't want to disappoint you, but you can read it as machine learning porn; otherwise, don't waste time on it.
I mean, as a graduate student, it was definitely incredibly useful. As a practicing data scientist, I’d have to say that it’s also incredibly useful.
I’ve used this stuff, and more often, the ideas taught, to break down a problem into a tackle-able set of pieces more times than I can count.
Never underestimate the fundamentals. Too many of my colleagues use models without actually understanding any of it. I’ve debugged so many problems by looking at the technical details in original papers and textbooks.
Graphical models are just a way to encode relationships between different variables in a probabilistic model. Directed acyclic graphs (DAGs) allow you to specify (most of) the conditional independence structures that you can have between things like parameters and random variables.
This is really useful information because it can help you identify what information is truly relevant for the estimation of certain parameters (so sufficient statistics) or help you crystallize your understanding of the implications of the model you’ve created. In other words, it helps show you the ways in which your model says different aspects of your data should influence others.
This creates testable implications of the model. If your model says that two variables should be conditionally independent given a third, but they’re not, you have an avenue for refinement. You can also clearly identify your assumptions or the implications of your assumptions.
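To make that concrete, here is a minimal NumPy sketch (my own toy example, not anything from the book) of checking one such testable implication: in a chain X → Y → Z, the graph says X and Z should be independent given Y, which for roughly Gaussian data you can probe with a partial correlation.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50_000

    # Hypothetical chain-structured model X -> Y -> Z:
    # the DAG implies X and Z are conditionally independent given Y.
    x = rng.normal(size=n)
    y = 0.8 * x + rng.normal(size=n)
    z = 0.5 * y + rng.normal(size=n)

    def partial_corr(a, b, given):
        # Correlation of a and b after linearly regressing out `given`.
        def residual(v):
            beta = np.polyfit(given, v, 1)
            return v - np.polyval(beta, given)
        return np.corrcoef(residual(a), residual(b))[0, 1]

    print("corr(X, Z)     =", round(np.corrcoef(x, z)[0, 1], 3))  # clearly nonzero
    print("corr(X, Z | Y) =", round(partial_corr(x, z, y), 3))    # close to zero
If the partial correlation stayed far from zero on real data, that would be evidence against the assumed structure.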
Another great thing about them is that, even though exact inference for certain (in fact, most) structures is known to be computationally infeasible, there are a lot of different inference schemes available that give you different approximations with various drawbacks/advantages, heuristics that sort of work, or even ways of drawing samples from the true distribution if you can identify the right structures. See belief propagation, loopy belief propagation, sequential Monte Carlo, and Markov chain Monte Carlo methods.
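For a flavour of the sampling end of that list, here is a bare-bones random-walk Metropolis-Hastings sketch in NumPy (a generic MCMC toy on a target of my own choosing, not structure-aware belief propagation and not code from the book); real problems would use a tuned sampler or a probabilistic programming library, but the accept/reject mechanics are the same.
    import numpy as np

    rng = np.random.default_rng(1)

    def log_target(theta):
        # Unnormalized log density to sample from (toy: mixture of two Gaussians at -2 and +2).
        return np.logaddexp(-0.5 * (theta - 2.0) ** 2, -0.5 * (theta + 2.0) ** 2)

    def metropolis_hastings(n_samples=20_000, step=1.0):
        theta = 0.0
        samples = np.empty(n_samples)
        for i in range(n_samples):
            proposal = theta + step * rng.normal()            # symmetric random-walk proposal
            if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
                theta = proposal                              # accept
            samples[i] = theta                                # otherwise keep the old state
        return samples

    samples = metropolis_hastings()
    print("sample mean:", round(samples.mean(), 2))           # roughly 0 by symmetry of the modes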
On top of this it helps you see everything in a general framework. Lots of the fundamental pieces of ML models are really just slight tweaks to other things. For instance, SVMs are linear models on kernel spaces with a specific structural prior. Same with splines; it’s just a different basis function. All of this helps you see the pieces of different methods that are actually identical. This helps you make connections and learn more effectively, in my opinion.
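A quick illustration of that "same linear machinery, different basis" point, as a NumPy sketch (my own toy data, nothing from the book): the fit below is ordinary least squares both times; only the design matrix changes.
    import numpy as np

    rng = np.random.default_rng(2)

    # Toy 1-D regression data with a nonlinear trend.
    x = np.linspace(-3, 3, 200)
    y = np.sin(x) + 0.1 * rng.normal(size=x.size)

    def ols_fit_predict(design, y):
        # Ordinary least squares on whatever design matrix you hand it.
        w, *_ = np.linalg.lstsq(design, y, rcond=None)
        return design @ w

    # Same linear model, two different basis expansions of x.
    raw_basis = np.column_stack([np.ones_like(x), x])            # intercept + raw feature
    poly_basis = np.column_stack([x ** k for k in range(6)])     # degree-5 polynomial basis

    for name, design in [("raw", raw_basis), ("poly", poly_basis)]:
        pred = ols_fit_predict(design, y)
        rmse = np.sqrt(np.mean((pred - y) ** 2))
        print(f"{name:>4} basis: RMSE = {rmse:.3f}")             # the richer basis fits the curve
Swap the polynomial columns for spline or kernel features and the fitting code does not change, which is exactly the point.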
For anybody truly serious about this field, I recommend the below book. It has some poor reviews on Amazon, which I was shocked to see, but it is my favourite book and taught me the core of probability theory and statistics, in a way most books don’t. Your understanding of Machine Learning will be better than 90% of those out there, if you can get through the principles in this book.
I topped statistics at the most prestigious university in my country at both the undergrad and postgrad level, and had no problem discussing advanced concepts with senior PhDs in quantitative fields, and I thank this book the most for starting me on that journey. But, and this is important, make sure to do all the exercises!
https://www.amazon.com/John-Freunds-Mathematical-Statistics-...
Is it your favourite book because of how much your personal history is tied to it, and the time you devoted to it, or are you comparing it against other books based on an analytic review and comparison of several books that you did at some point?
Nothing wrong with the former case, I also have favourites that I recommend, but if it's the latter the recommendation is more helpful; in that case it would be awesome to detail why this one over others.
This idea of comparative review is useful here:
https://fivebooks.com/
And here:
https://www.lesswrong.com/posts/xg3hXCYQPJkwHyik2/the-best-t...
This is a very good post and I agree with your comments.
The book for me was partly great because of its contents and partly because I worked through every problem and realized how much it taught me. I should do a more factual write up on why it’s a great book, I’ll try to when I get some time.
Looking at the table of contents (for someone who is not familiar with the term 'Probabilistic Machine Learning'), is this just covering typical ML methods through the lens of probability?
The answer is not so black and white, since everything in ML has to use probability. You can ignore this unless you are among the top 20 researchers working on the frontier of ML. Bayesian probabilistic techniques do not work, or are very slow, for any practical purpose.
Why should ML books be so big? In many cases the books are various sub-books pasted together as one, or extensive bibliography reviews that just list progress without any pedagogy. I would suggest splitting them into 200-250 page parts that can serve independently.