There is an analogue of the CLT for extreme values: the Fisher–Tippett–Gnedenko theorem. If the properly normalized maximum of an i.i.d. sample converges at all, the limit must be a Gumbel, Fréchet, or Weibull distribution, the three families unified as the Generalized Extreme Value distribution. Unlike the CLT, whose assumptions (in my experience) rarely hold in practice, this result is extremely general, and it underpins methods like wavelet thresholding and signal denoising. It is also easy to demonstrate with a quick simulation.
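For example, a quick sketch in Python with numpy (my own illustration, not from the article): maxima of blocks of n Exp(1) draws, centered by log n, settle onto the standard Gumbel CDF exp(-exp(-x)).

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 1000, 10_000

    # Maximum of n iid Exp(1) draws, centered by log(n); the limit law is Gumbel.
    maxima = np.array([rng.exponential(1.0, size=n).max() for _ in range(reps)])
    centered = maxima - np.log(n)

    for x in (-1.0, 0.0, 1.0, 2.0):
        empirical = np.mean(centered <= x)
        gumbel = np.exp(-np.exp(-x))          # standard Gumbel CDF
        print(f"x={x:+.1f}  empirical={empirical:.3f}  Gumbel={gumbel:.3f}")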
There's also a more conservative rule similar to the CLT that works directly off the definition of variance, and thus rests on no assumption other than the existence of a variance. Chebyshev's inequality tells us that the probability that any sample lands more than k standard deviations away from the mean is at most 1/k².
In other words, it is possible (given sufficiently weird distributions) that not a single sample lands inside one standard deviation, but 75% of them must be inside two standard deviations, 88% inside three standard deviations, and so on.
There's also a one-sided version of it (Cantelli's inequality), which bounds the probability that a sample is more than k standard deviations above the mean by 1/(1+k²), meaning at least 50% of samples must be less than one standard deviation above the mean, 80% less than two standard deviations, and so on.
Think of this during the next financial crisis when bank people no doubt will say they encountered "six sigma daily movements which should happen only once every hundred million years!!" or whatever. According to the CLT, sure, but for sufficiently odd distributions the Cantelli bound might be a more useful guide, and it says six sigma daily movements could happen as often as every 37 days.
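Both bounds are easy to sanity-check (a minimal sketch in Python; the two-point distribution is just a deliberately nasty example):

    import numpy as np

    rng = np.random.default_rng(0)

    # A 50/50 distribution on {-1, +1} has mean 0 and standard deviation 1,
    # yet not a single draw lands strictly inside one standard deviation.
    x = rng.choice([-1.0, 1.0], size=100_000)
    print("strictly inside 1 sd:", np.mean(np.abs(x - 0.0) < 1.0))   # 0.0

    # Chebyshev (two-sided) and Cantelli (one-sided) bounds for k = 1..6.
    for k in range(1, 7):
        print(k, f"P(|X-mu| >= k sd) <= {1 / k**2:.4f}",
              f"P(X-mu >= k sd) <= {1 / (1 + k**2):.4f}")
    # At k = 6 the one-sided bound is 1/37 ~ 0.027: a "six sigma" day could
    # happen as often as roughly once every 37 days for a nasty enough distribution.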
I highly doubt the finance bros pretend distributions are normal or don't know Chebyshev; more likely they don't have enough data on rare events to estimate the covariance structure well enough to get a useful bound even with Chebyshev.
> It’s very subjective, but I think the uniform starts looking reasonably good at a sample size of 8. The exponential, however, takes much longer to converge to a normal.
That's a good observation. The main idea behind the proof of the Central Limit Theorem is to take the Fourier transform, operate on it, and then transform back. After normalization, the result is that the distribution of the (standardized) sum of N variables is something like

    Normal(x) × (1 + Skewness · (x³ − 3x) / (6·√N) + smaller terms)

where "Skewness" is the number defined in https://en.wikipedia.org/wiki/Skewness. The uniform distribution is symmetric, so Skewness = 0 and the leading correction decreases like 1/N. The exponential distribution is very asymmetrical, so Skewness ≠ 0 and the main correction is of order 1/√N, which takes longer to disappear.
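You can see those rates in a quick simulation (Python with numpy/scipy; just a sketch of my own): the skewness of the sample mean of N draws decays like 1/√N and the excess kurtosis like 1/N, which is why the symmetric uniform looks normal much sooner than the exponential.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    reps = 50_000

    # Sample skewness and excess kurtosis of the mean of N draws:
    # skewness shrinks like 1/sqrt(N), excess kurtosis like 1/N.
    for N in (8, 32, 128):
        u = rng.uniform(0.0, 1.0, size=(reps, N)).mean(axis=1)
        e = rng.exponential(1.0, size=(reps, N)).mean(axis=1)
        print(f"N={N:4d}  uniform: skew={stats.skew(u):+.3f} exkurt={stats.kurtosis(u):+.3f}"
              f"  exponential: skew={stats.skew(e):+.3f} exkurt={stats.kurtosis(e):+.3f}")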
Highly entertaining. Here's a little fun fact: there exists a generalisation of the central limit theorem for distributions without finite variance.
For some reason this is much less well known, even though the implications are vast. Via the detour of stable distributions as limiting distributions, this generalised central limit theorem plays an important role in the emergence of power laws in physics.
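A quick way to see it (sketch in Python/numpy, my own illustration): the sample mean of Cauchy draws, which have no finite variance, never concentrates; its distribution is again Cauchy, one of the stable laws the generalised CLT is about, while a finite-variance mean shrinks like 1/√N.

    import numpy as np

    rng = np.random.default_rng(0)
    reps = 10_000

    def iqr(a):
        """Interquartile range, a spread measure that exists even without a variance."""
        q75, q25 = np.percentile(a, [75, 25])
        return q75 - q25

    for N in (1, 10, 100, 1000):
        gauss_mean = rng.normal(size=(reps, N)).mean(axis=1)            # finite variance
        cauchy_mean = rng.standard_cauchy(size=(reps, N)).mean(axis=1)  # infinite variance
        print(f"N={N:5d}  IQR of Gaussian mean={iqr(gauss_mean):.3f}"
              f"  IQR of Cauchy mean={iqr(cauchy_mean):.3f}")
    # The Gaussian column shrinks like 1/sqrt(N); the Cauchy column stays near 2.0.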
3blue1brown has a great series of videos on the central limit theorem, and it makes me wish there were something covering the generalised form in a similar format. I have a textbook on my reading list that covers it, but unfortunately I can't seem to find it or the title right now. (edit: it's "The Fundamentals of Heavy Tails" by Nair, Wierman, and Zwart from 2022)
Do you have any good sources for the physics angle?
I thought the rise of power laws in physics is predominantly attributed to Kesten's law concerning multiplicative processes, e.g. https://arxiv.org/pdf/cond-mat/9708231
Yes, came here to say the same thing. Telling people that the CLT makes strong assumptions is important.
Otherwise, they might end up underestimating rare events, with potentially catastrophic consequences. There are also CLTs for product and max operators, aside from the sum.
The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation discusses these topics in a rigorous way, but without excessive mathematics. See: https://adamwierman.com/book
I love the simulations. They are such a good way to learn STATS... you can still look at the theorem using math notation after, but if you've seen it work first using simulated random samples, then the math will make a lot more sense.
This is a very neat illustration, but I want to leave a reminder that when we cherry-pick well-behaved distributions for illustrating the CLT, people get unrealistic expectations of what it means: https://entropicthoughts.com/it-takes-long-to-become-gaussia...
The article you link is not using the CLT correctly.
The CLT gives a result about a recentered and rescaled version of the sum of iid variates. CLT does not give a result about the sum itself, and the article is invoking such a result in the “files” and “lakes” examples.
I’m aware that it can appear that CLT does say something about the sum itself. The normal distribution of the recentered/rescaled sum can be translated into a distribution pertaining to the sum itself, due to the closure of Normals under linear transformation. But the limiting arguments don’t work any more.
What I mean by that statement: in the CLT, the errors of the distributional approximation go to zero as N gets large. For the sum, of course the error will not go to zero - the sum itself is diverging as N grows, and so is its distribution. (The point of centering and rescaling is to establish a non-diverging limit distribution.)
So for instance, the third central moment of the Gaussian is zero. But the third central moment of a sum of N iid exponentials will diverge quickly with N (it’s a gamma with shape parameter N). This third-moment divergence will happen for any base distribution with non-zero skew.
The above points out another fact about the CLT: it does not say anything about the tails of the limit distribution. Just about the core. So CLT does not help with large deviations or very low-probability events. This is another reason the post is mistaken, which you can see in the “files” example where it talks about the upper tail of the sum. The CLT does not apply there.
Postscript: looking at the lesswrong link referenced by the post above, you will notice that the “eyeball metric” density plots happen to be recentered and scaled so that they capture the mass of the density. This is the graphical counterpart of the algebraic scaling and centering needed in the CLT.
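To make the tail point concrete (a sketch in Python with scipy; the sum of N Exp(1) variables is exactly Gamma(N), so the normal approximation can be compared against the truth):

    from scipy import stats

    # Probability that the sum of N Exp(1) variables exceeds 1.5x its mean:
    # exact Gamma tail versus the CLT's Normal(N, sqrt(N)) approximation.
    # The absolute error vanishes, but the relative error in this tail grows with N.
    for N in (10, 100, 400):
        exact = stats.gamma(a=N).sf(1.5 * N)
        normal = stats.norm(loc=N, scale=N**0.5).sf(1.5 * N)
        print(f"N={N:4d}  exact={exact:.2e}  normal={normal:.2e}  ratio={exact / normal:.3g}")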
There's an interesting extension of the Central Limit Theorem called the Edgeworth Series. If you have a large but finite sample, the resulting distribution will be approximately Gaussian, but will deviate from a Gaussian distribution in a predictable way described by Hermite polynomials.
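For example (a sketch in Python/scipy, using the sum of n = 5 exponentials, whose exact distribution is a Gamma): the normal density plus the first Edgeworth term, φ(z)·(1 + skew·(z³ − 3z)/(6√n)), tracks the exact density much better than the plain normal at these points.

    import numpy as np
    from scipy import stats

    n, skew = 5, 2.0          # skewness of Exp(1) is 2
    for z in (-1.0, 0.5, 1.0, 1.5):
        # Exact density of the standardized sum (a rescaled Gamma(n))
        exact = np.sqrt(n) * stats.gamma(a=n).pdf(n + z * np.sqrt(n))
        normal = stats.norm.pdf(z)
        edgeworth = normal * (1 + skew * (z**3 - 3 * z) / (6 * np.sqrt(n)))
        print(f"z={z:+.1f}  exact={exact:.4f}  normal={normal:.4f}  edgeworth={edgeworth:.4f}")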
The intuition behind it is that when we take batches of samples from some arbitrarily shaped distribution and summarize them by the mean value of each batch, those mean values move away from the arbitrarily shaped distribution. The larger the batches, the closer those means come to a normal distribution.
In other words, the means of large batches of samples from some funny shaped distribution themselves constitute a sequence of numbers, and that sequence follows a normal distribution, or at least gets closer and closer to one the larger the batches are.
This observation legitimizes our use of statistical inference tools derived from the normal distribution, like confidence intervals, provided we are working with large enough batches of samples.
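A concrete check of that last point (sketch in Python/numpy, not from the article): the coverage of the textbook normal-theory 95% confidence interval for the mean of Exp(1) data creeps up toward the nominal 95% as the batch size grows.

    import numpy as np

    rng = np.random.default_rng(0)
    trials, true_mean = 20_000, 1.0

    for n in (5, 20, 80):
        x = rng.exponential(true_mean, size=(trials, n))
        m = x.mean(axis=1)
        se = x.std(axis=1, ddof=1) / np.sqrt(n)
        covered = (m - 1.96 * se <= true_mean) & (true_mean <= m + 1.96 * se)
        print(f"n={n:3d}  coverage of the nominal 95% interval: {covered.mean():.3f}")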
The definition under "A Brief Recap" seems incorrect. The sample size doesn't approach infinity, the number of samples does. I'm in a similar situation to the author, I skipped stats, so I could be wrong. Overall great article though.
Yes indeed, if the sample size approached infinity (and not the number of samples), you would essentially just be calculating the mean of the original distribution.
Here is a notebook with some more graphs and visualizations of the CLT: https://nobsstats.com/site/notebooks/28_random_samples/#samp...
runnable link: https://mybinder.org/v2/gh/minireference/noBSstats/main?labp...
https://en.wikipedia.org/wiki/Edgeworth_series
https://en.wikipedia.org/wiki/Central_limit_theorem