This is my favorite book on statistics. Full stop. The author, Andrew Gelman, created a whole new branch of Bayesian statistics, both through his theoretical work on hierarchical modeling and by publishing Stan to enable practical applications of hierarchical models.
It took me about a year to work through this book on the side (including the exercises) and it provided the foundation for years of fruitful research into hierarchical Bayesian models. It’s definitely not an introductory read, but for anyone looking to advance their statistical toolkit, I cannot recommend this book highly enough.
As a starting point, I’d strongly suggest the first 5 chapters for an excellent introduction to Gelman’s modeling philosophy, and then jumping around the table of contents to any topics that look interesting.
His book on hierarchical modeling with Hill has 20398 cites on Google Scholar https://scholar.google.com/scholar?cluster=94492350364273118... and Wikipedia calls him "a major contributor to statistical philosophy and methods especially in Bayesian statistics[6] and hierarchical models.[7]", which sounds like the claim is more true than false.
I don’t mean for the bar to sound too high. I think working through Khan Academy’s full probability, calculus, and linear algebra courses would give you a strong foundation. I worked through this book having just completed the equivalent courses in college.
It’s just a relatively dense book. There are some other really good suggestions in this thread, most of which I’ve heard good things about. If you have a background in programming, I’d suggest Bayesian Methods for Hackers as a really good starting point. But you can also definitely tackle this book head on, and it will be very rewarding.
Bayesian Statistics the Fun Way is probably the best place to start if you're coming at this from 0. It covers the basics of most of the foundational math you'll need along the way and assumes basically no prerequisites.
After that, Statistical Rethinking will take you much deeper into more complex experiment design using linear models and beyond, as well as deepening your understanding of the other areas of math required.
Regression and Other Stories. It’s also co-authored by Gelman and it reads like an updated version of his previous book, Data Analysis Using Regression and Multilevel/Hierarchical Models.
If you are near Columbia, the visiting students post-baccalaureate program (run by the SPS, last I recall) allows you to take for-credit courses in the Social Sciences department. Professor Ben Goodrich has an excellent course on Bayesian Statistics in Social Sciences which teaches it using R (now it might be in Stan).
That course is a good balance between theory and practice. It gave me a practical, intuitive understanding of why the posterior distributions of parameters and data are important and how to compute them.
I took the course in 2016 so a lot could have changed.
I found David MacKay's book, Information Theory, Inference, and Learning Algorithms, to be well written and easy to follow. Plus it is freely available from his website: https://www.inference.org.uk/itprnn/book.pdf
It goes through fundamentals of Bayesian ideas in the context of applications in communication and machine learning problems. I find his explanations uncluttered.
For effectively and efficiently learning the calculus, linear algebra, and probability underpinning these fields, Math Academy is going to be your best resource.
Can you explain to me in simple terms how your fruitful research benefited you in a concrete way? Is this simply an enlightening hobby or do you have significant everyday applications? What kind of cool job has you employ Bayesian Data Analysis day to day, and for what benefit? How do the suits relate to such knowledge and its beneficial application, which may be well beyond their ken?
My applications have focused on noisy, high dimensional small datasets in which it is either very expensive or impossible to get more data.
One example is rare class prediction on long-form text data, e.g. phone calls, podcasts, transcripts. Other approaches, including neural networks and LLMs, are either not flexible enough or require far too much data to achieve the necessary performance. Structured hierarchical modeling is the balance between those two extremes.
Another example is in genomic analysis. Similarly high dimensional, noisy, low data. Additionally, you don’t actually care about the predictions, you want to understand what genes or sets of genes are driving phenotypic behaviors.
I’d be happy to go into more depth via email or chat if this is something you are interested in (on my profile).
The key insight to recognize is that within the Bayesian framework hypothesis testing is parameter estimation. Your certainty in the outcome of the test is your posterior probability over the test-relevant parameters.
Once you realize this you can easily develop very sophisticated testing models (if necessary) that are also easy to understand and reason about. This dramatically simplifies hypothesis testing.
If you're looking for a specific book recommendation, Statistical Rethinking does a good job covering this at length, and Bayesian Statistics the Fun Way is a more beginner-friendly book that covers the basics of Bayesian hypothesis testing.
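To make the "hypothesis testing is parameter estimation" point concrete, here is a minimal sketch, not taken from either book. It assumes a hypothetical two-group test with binary outcomes, a flat Beta(1, 1) prior on each success rate, and made-up counts. The function name `prob_b_beats_a` is my own. The "test result" is simply the posterior probability that one rate exceeds the other, estimated by sampling.

```python
import random


def prob_b_beats_a(succ_a, n_a, succ_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A | data) under independent Beta(1, 1) priors.

    With a Beta(1, 1) prior and binomial data, each posterior is
    Beta(1 + successes, 1 + failures), so we can sample it directly.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(1 + succ_a, 1 + n_a - succ_a)
        p_b = rng.betavariate(1 + succ_b, 1 + n_b - succ_b)
        wins += p_b > p_a
    return wins / draws


# Hypothetical data: 40/200 successes in group A, 60/200 in group B.
posterior_prob = prob_b_beats_a(40, 200, 60, 200)
print(f"P(rate_B > rate_A | data) ~ {posterior_prob:.3f}")
```

The same pattern extends to richer models: fit whatever model describes the data, then read the answer to the "test" off the posterior of the parameters you care about.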
This book is very relevant to those fields. There is a common choice in statistics to either stratify or aggregate your dataset.
There is an example in his book discussing efficacy trials across seven hospitals. If you stratify the data, you lose a lot of confidence; if you aggregate the data, you end up just modeling the difference between hospitals.
Hierarchical modeling allows you to split your dataset under a single unified model. This is really powerful for extracting signal from noise because you can split your dataset according to potential confounding variables, e.g. the hospital from which the data was collected.
I am writing this on my phone so apologies for the lack of links, but in short the approach in this book is extremely relevant to medical testing.
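A rough sketch of the middle ground between stratifying and aggregating, under loud simplifying assumptions: a normal-normal model where each hospital's estimate has a known standard error and the between-hospital standard deviation `tau` is fixed by hand rather than inferred. A full hierarchical model (e.g. in Stan) would put a prior on `tau` and estimate it. The numbers and the function name `partial_pool` are hypothetical, not from the book.

```python
def partial_pool(means, ses, tau):
    """Shrink per-group estimates toward a shared mean (normal-normal model).

    means: per-group raw estimates (e.g. treatment effect per hospital)
    ses:   per-group standard errors, assumed known
    tau:   between-group standard deviation, assumed known here
    """
    # Precision-weighted grand mean; each group's marginal variance is se^2 + tau^2.
    weights = [1.0 / (se**2 + tau**2) for se in ses]
    mu = sum(w * m for w, m in zip(weights, means)) / sum(weights)

    pooled = []
    for m, se in zip(means, ses):
        # Fraction of weight kept on the group's own data: near 1 for precise
        # groups, near 0 for noisy groups.
        keep = tau**2 / (tau**2 + se**2)
        pooled.append(mu + keep * (m - mu))
    return mu, pooled


# Hypothetical effect estimates and standard errors for four hospitals.
raw_means = [0.10, 0.25, 0.05, 0.30]
std_errs = [0.02, 0.10, 0.03, 0.15]
grand_mean, pooled_means = partial_pool(raw_means, std_errs, tau=0.05)
for raw, se, est in zip(raw_means, std_errs, pooled_means):
    print(f"raw={raw:.2f} se={se:.2f} pooled={est:.3f}")
```

Setting `tau` very large recovers the stratified estimates, and setting it near zero recovers the fully aggregated one. The hierarchical model sits in between and lets the noisy hospitals borrow strength from the precise ones.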
I can attest to how useful Bayesian analysis is. My team recently needed to sample from many millions of items to test their qualities. The question was: given a certain budget and expectation, what's the minimum or maximum number of items that we need to sample? There was an elegant solution to this problem.
What was surprising, though, was how reluctant the engineers were to learn such basic techniques. It's not like the math was hard. They all went through first-year college math and I'm sure they did reasonably well.
What were they reluctant to learn? Why do they need to learn it?
Plenty of engineers have to take an introductory stats course, but it's not clear why you'd want your engineers to learn bayesian statistics? I would be surprised if they could correctly interpret a p-value or regression coefficient, let alone one with interaction effects. (It'd be wholly useless if they could, fwiw).
It'd be nice if the statisticians/'data scientists' on my team learned their way around the CI/CD pipelines, understood kubernetes pods, and could write their own distributed training versions of their pytorch models, but division-of-labor is a thing for a reason, and I don't expect them to nor need them to.
I guess I have a different philosophy: whoever owns the problem should learn everything necessary to solve the problem. In my case, the engineers showed no interest in learning the algorithm and the math behind it. For instance, when they built the dashboard for the testing, they omitted a few important columns and got the column names wrong. When I tested them on their understanding of the method, there was none. To say the least, my team should know enough to challenge me in case I make a mistake, or so I assume.
On a side note, I believe it is an individual's responsibility to find the coolness in their project. What's the fun of building a dashboard that I have done a thousand times? What's the fun of carrying out a routine that does not challenge me? But solving a problem in a most rigorous and generalized way? That is something in which an engineer can find some fun. Or maybe it's just me.
BDA is THE book to learn Bayesian modeling rigorously and in depth. For different approaches, a number have been shared here, like Statistical Rethinking from Richard McElreath or Regression and Other Stories, which Gelman and Aki wrote as well.
I also wrote a book on the topic, which takes a code- and example-focused approach. It's available for open access here: https://bayesiancomputationbook.com
While we're here - I've gained a lot from "Data Analysis: A Bayesian Tutorial" by DS Sivia and J Skilling. It's a graduate level text, and I found the chapters very concise and the subject well-laid out. It was one of those books that gave me continuous insight and fresh inspiration - even though it's more than 10 years old.
First learn some basic probability theory: Peter K. Dunn (2024). The theory of distributions. https://bookdown.org/pkaldunn/DistTheory
Then frequentist statistics: Chester Ismay, Albert Y. Kim, and Arturo Valdivia - https://moderndive.com/v2/ Mine Çetinkaya-Rundel and Johanna Hardin - https://openintrostat.github.io/ims/
Finally Bayesian: Johnson, Ott, Dogucu - https://www.bayesrulesbook.com/ This is a great book; it will teach you everything from the very basics to advanced hierarchical Bayesian modeling, all using reproducible code and stan/rstanarm.
Once you master this, the next level may be using brms, and Solomon Kurz has worked through the full Regression and Other Stories book using tidyverse/brms. His knowledge of tidyverse and brms is impressive and demonstrated in his code. https://github.com/ASKurz/Working-through-Regression-and-oth...
Statistical Rethinking is a good option too.
Some useful reads
[1] https://sturdystatistics.com/articles/text-classification
[2] https://pmc.ncbi.nlm.nih.gov/articles/PMC5028368/
You need 16 times the sample size to estimate an interaction than to estimate a main effect https://statmodeling.stat.columbia.edu/2018/03/15/need16/
Debate over effect of reduced prosecutions on urban homicides; also larger questions about synthetic control methods in causal inference. https://statmodeling.stat.columbia.edu/2023/10/12/debate-ove...
Bayesians moving from defense to offense: “I really think it’s kind of irresponsible now not to use the information from all those thousands of medical trials that came before. Is that very radical?” https://statmodeling.stat.columbia.edu/2023/12/23/bayesians-...
https://hn.algolia.com/?q=statmodeling.stat.columbia.edu
- https://statmodeling.stat.columbia.edu/2025/08/25/what-writi...
- https://statmodeling.stat.columbia.edu/2025/09/04/assembling...
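For the "16 times the sample size" post linked above, the arithmetic can be sketched as follows. This is my own reconstruction under stated assumptions: a balanced design with equal-sized cells, a common outcome standard deviation `sigma`, and an interaction that is half the size of the main effect. The function names are hypothetical.

```python
def n_for_main_effect(effect, sigma, z):
    """Sample size so that effect / SE = z for a two-group comparison.

    Two groups of n/2 each give SE = sqrt(sigma^2/(n/2) + sigma^2/(n/2)) = 2*sigma/sqrt(n).
    """
    return (2.0 * sigma * z / effect) ** 2


def n_for_interaction(effect, sigma, z):
    """Sample size so that effect / SE = z for a 2x2 interaction.

    A difference of differences over four cells of n/4 each gives
    SE = sqrt(4 * sigma^2/(n/4)) = 4*sigma/sqrt(n), twice the main-effect SE.
    """
    return (4.0 * sigma * z / effect) ** 2


# Assumed values for illustration only.
sigma, z, main_effect = 1.0, 2.8, 0.5
interaction = main_effect / 2  # assumption: interaction is half the main effect

ratio = n_for_interaction(interaction, sigma, z) / n_for_main_effect(main_effect, sigma, z)
print(f"sample size ratio: {ratio:.1f}")
```

The factor of 16 splits into two factors of 4: one because the interaction's standard error is twice as large at the same sample size, and one because the assumed effect is half as big.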
https://mlu-explain.github.io/linear-regression/
It cited Regression and Other Stories (though not the Bayesian chapters, which I'm now inspired to dig into before checking this out).