> Generative adversarial networks, or GANs, are a conceptual advance that allow reinforcement learning problems to be solved automatically. They mark a step toward the longstanding goal of artificial general intelligence while also harnessing the power of parallel processing so that a program can train itself by playing millions of games against itself. At a conceptual level, GANs link prediction with generative models.
What? Every sentence here is so wrong I have a hard time seeing what kind of misunderstanding would lead to this.
GANs are a conceptual advance of generative models (i.e. models that can generate more, similar data). Reinforcement learning is a separate field. Parallel processing is ubiquitous, and has nothing to do with GANs or reinforcement learning (they are both usually pretty parallelized). Self-play sounds like they wanted to talk about the AlphaGo/AlphaZero papers? And GANs are infamously not really predictive/discriminative. If anything, they thoroughly disconnected prediction from generative models.
> GANs are a conceptual advance of generative models (i.e. models that can generate more, similar data).
This is something I've long had confusion with, coming from a probabilistic perspective.
How does a GAN model the joint probability of the data? My understanding was that was what a generative model does. There doesn't seem to be a clear probabilistic interpretation of a GAN whatsoever.
Part of the cleverness of GANs was finding a way to train a neural network that generates data without explicitly modeling the probability density.
In a stats textbook, when you know that your training data comes from a normal distribution, you can maximize the likelihood with respect to the parameters (MLE), and then sample from the fitted distribution. That's basic theory.
In practice, it was very hard to learn a good pdf for experimental data, e.g. when your training set consists of images. GANs provided a way to bypass this.
Of course, people could have said "hey, let's generate samples without maximizing a log-likelihood first", but they didn't know how to do it properly, i.e. how to train the network in any other way besides minimizing cross-entropy (which is equivalent to maximizing the log-likelihood).
Then GANs actually provided a new loss function that a network could be trained with. Total paradigm shift!
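A minimal sketch of the textbook Gaussian case mentioned above, assuming NumPy is available (the numbers and variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)  # stand-in "training set"

# Closed-form MLE for a normal distribution: the sample mean and the
# (1/N, ddof=0) sample standard deviation maximize the likelihood.
mu_hat = data.mean()
sigma_hat = data.std()  # np.std defaults to ddof=0, i.e. the MLE

# Once the density is fitted, sampling more data is trivial.
new_samples = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)
print(mu_hat, sigma_hat, new_samples)
```

Nothing analogous was tractable for a density over natural images, which is the gap the comments above describe GANs as filling.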
The generator implicitly models the joint probability of the data by being a generative process that one can draw samples from. GAN training (at least under certain simplifying assumptions) minimizes the JS divergence between the generator distribution and the data distribution.
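For reference, the connection being hedged on here, in the notation of the original GAN paper (the JS-divergence statement only holds under the idealized assumption that the discriminator is trained to optimality), is the two-player objective

```latex
V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
        + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],
\qquad \text{with the game } \min_G \max_D V(D, G).
```

With the discriminator at its optimum, $D^*(x) = p_{\mathrm{data}}(x) / (p_{\mathrm{data}}(x) + p_g(x))$, the generator's objective reduces to

```latex
C(G) = \max_D V(D, G) = -\log 4 + 2\,\mathrm{JSD}\big(p_{\mathrm{data}} \,\|\, p_g\big),
```

so minimizing $C(G)$ over the generator minimizes the Jensen-Shannon divergence between the data distribution and the generator distribution.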
Think of "press release wrongness" with a probability distribution. Some press releases are really good, some are really bad. A sensible prior would be somewhere in the middle. If you start to see a lot of bad press releases, then you can update your posterior towards "I can't trust any of these."
I’m sorely missing Maximum Likelihood Estimation (MLE). It’s a statistical technique that goes back to Gauss and Laplace but was popularized by Fisher. In AI/ML it’s often referred to as “minimizing cross-entropy loss”, but this is just a misappropriation / reinvention of the wheel. The math is the same and MLE is a much more sane theoretical framework.
“Cross entropy” specifically refers to the log-likelihood function of a binary random variable, and is only used as the cost function for binary classifiers. It does not refer to likelihood functions in general.
Do people not google terms before trying to speak authoritatively on a topic they aren't familiar with? The original commenter is correct: cross entropy is a generic measure defined between any two probability distributions. In the case of maximum likelihood estimation, these are the data distribution and the distribution of the learned model.
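For what it's worth, the equivalence being argued about fits in one line. Writing $\hat{p}$ for the empirical data distribution and $p_\theta$ for the model,

```latex
H(\hat{p}, p_\theta)
  = -\,\mathbb{E}_{x \sim \hat{p}}\!\left[\log p_\theta(x)\right]
  = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i),
```

which is the negative average log-likelihood, so minimizing cross-entropy in $\theta$ and maximizing the likelihood select the same parameters. The binary-classifier "cross-entropy loss" is just the special case where $p_\theta$ is a Bernoulli distribution over labels.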
I'm not sure what you mean. I went through the list myself, and while the books are obviously only on Libgen, the only one I didn't find readily available in Google Scholar was the AIC paper (https://www.gwern.net/docs/statistics/decision/1998-akaike.p...), and you can safely assume any paper in GS is in SH too.
Although Vapnik's treatise is called Statistical Learning Theory, neither statisticians nor he himself identify him as a statistician. In fact his proposals were quite radically different from the established norm in contemporary statistics. The same holds for Corinna Cortes.
The kernel 'trick', the representer theorem, etc. are far older and have their origins in functional analysis.
Putting aside discovery, I've always considered SVMs to be in the realm of optimization rather than ML or statistics (though I suppose you could then also put modern deep learning under optimization too).
Why? No one uses SVM as a solver/optimization method (though you do need a solver/optimization method to train a SVM).
Same with "modern deep learning" (whatever that may be): just because you need to optimize something doesn't make the field "optimization". Just because I'm using stochastic gradient descent (or some other optimization method) in the course of my work, doesn't mean that I'm working in the field of Optimization.
How have they attributed GANs and Deep Learning to Statistics? I thought Goodfellow was doing an AI PhD and that Hinton is a biologically inspired / neuroscience fellow?
Deep learning models are statistical and probabilistic models.
You can categorize Deep Learning under both computer science and statistics. For example stat.ML and cs.LG in Arxiv.
Machine learning and statistics are closely related fields, both historically and in current practice and methodology.
The only easy real division between stats and ML is in universities, where it's just a question of which department. If it's CS, then it's ML or AI. If it's stats, it's stats.
If it's industry, it's whatever the marketing department decides, inevitably AI :(
I dare not even read the rest of the page just in case my brain accidentally absorbs other bad information like that paragraph about GANs.
"Generalized" :D Also the description is nonsense. This has nothing to do with reinforcement learning. Makes me wonder about the rest.
Edit: Ah, I'm a dunce - missing from the article.
> This book has been hugely influential and is a fun read that can be digested in one sitting.
Wow. The PDF is over 700 pages. That seems fairly impressive for single-sitting digestion.
- For the papers I am quoted 26-39 EUR
- For the books I am quoted 129-133 EUR
This is audacious. Some of these papers are from the '70s. And I highly doubt that the authors get any royalties from those sales.
We do. I don't know if it's the general rule, but for the one I partook in, we get ~20 euro cents per sale, per author.
why would you want to feed the parasites?
https://brenocon.com/blog/2008/12/statistics-vs-machine-lear...