> Generative adversarial networks, or GANs, are a conceptual advance that allow reinforcement learning problems to be solved automatically. They mark a step toward the longstanding goal of artificial general intelligence while also harnessing the power of parallel processing so that a program can train itself by playing millions of games against itself. At a conceptual level, GANs link prediction with generative models.
What? Every sentence here is so wrong I have a hard time seeing what kind of misunderstanding would lead to this.
GANs are a conceptual advance of generative models (i.e. models that can generate more, similar data). Reinforcement learning is a separate field. Parallel processing is ubiquitous, and has nothing to do with GANs or reinforcement learning (they are both usually pretty parallelized). Self-play sounds like they wanted to talk about the AlphaGo/AlphaZero papers? And GANs are infamously not really predictive/discriminative. If anything, they thoroughly disconnected prediction from generative models.
> GANs are a conceptual advance of generative models (i.e. models that can generate more, similar data).
This is something I've long had confusion with, coming from a probabilistic perspective.
How does a GAN model the joint probability of the data? My understanding was that was what a generative model does. There doesn't seem to be a clear probabilistic interpretation of a GAN whatsoever.
Part of the cleverness of GANs was finding a way to train a neural network that generates data without explicitly modeling the probability density.
In a stats textbook, when you know that your training data comes from a normal distribution, you can maximize the likelihood with respect to the parameters (MLE), and then sample from the fitted distribution. That's basic theory.
In practice, it was very hard to learn a good pdf for experimental data, e.g. when your training set consists of images. GANs provided a way to bypass this.
Of course, people could have said "hey, let's generate samples without maximizing a log-likelihood first", but they didn't know how to do it properly, i.e. how to train the network in any other way besides minimizing cross-entropy (which is equivalent to maximizing the log-likelihood).
Then GANs actually provided a new loss function that a network could be trained with. Total paradigm shift!
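A minimal sketch of the textbook Gaussian case mentioned above, assuming NumPy is available (the numbers and variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)  # stand-in "training set"

# Closed-form MLE for a normal distribution: the sample mean and the
# (1/N, ddof=0) sample standard deviation maximize the likelihood.
mu_hat = data.mean()
sigma_hat = data.std()  # np.std defaults to ddof=0, i.e. the MLE

# Once the density is fitted, sampling more data is trivial.
new_samples = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)
print(mu_hat, sigma_hat, new_samples)
```

Nothing analogous was tractable for a density over natural images, which is the gap the comments above describe GANs as filling.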
The generator implicitly models the joint probability of the data by being a generative process that one can draw samples from. GAN training (at least under certain simplifying assumptions) minimizes the JS divergence between the generator distribution and the data distribution.
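For reference, the connection being hedged on here, in the notation of the original GAN paper (the JS-divergence statement only holds under the idealized assumption that the discriminator is trained to optimality), is the two-player objective

```latex
V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
        + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],
\qquad \text{with the game } \min_G \max_D V(D, G).
```

With the discriminator at its optimum, $D^*(x) = p_{\mathrm{data}}(x) / (p_{\mathrm{data}}(x) + p_g(x))$, the generator's objective reduces to

```latex
C(G) = \max_D V(D, G) = -\log 4 + 2\,\mathrm{JSD}\big(p_{\mathrm{data}} \,\|\, p_g\big),
```

so minimizing $C(G)$ over the generator minimizes the Jensen-Shannon divergence between the data distribution and the generator distribution.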
Think of "press release wrongness" with a probability distribution. Some press releases are really good, some are really bad. A sensible prior would be somewhere in the middle. If you start to see a lot of bad press releases, then you can update your posterior towards "I can't trust any of these."
I’m sorely missing Maximum Likelihood Estimation (MLE). It’s a statistical technique that goes back to Gauss and Laplace but was popularized by Fisher. In AI/ML it’s often referred to as “minimizing cross-entropy loss”, but this is just a misappropriation / reinvention of the wheel. The math is the same and MLE is a much more sane theoretical framework.
“Cross entropy” specifically refers to the log-likelihood function of a binary random variable, and is only used as the cost function for binary classifiers. It does not refer to likelihood functions in general.
Do people not google terms before trying to speak authoritatively on a topic they aren't familiar with? The original commenter is correct: cross entropy is a generic measure defined between any two probability distributions. In the case of maximum likelihood estimation, these are the data distribution and the distribution of the learned model.
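For what it's worth, the equivalence being argued about fits in one line. Writing $\hat{p}$ for the empirical data distribution and $p_\theta$ for the model,

```latex
H(\hat{p}, p_\theta)
  = -\,\mathbb{E}_{x \sim \hat{p}}\!\left[\log p_\theta(x)\right]
  = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i),
```

which is the negative average log-likelihood, so minimizing cross-entropy in $\theta$ and maximizing the likelihood select the same parameters. The binary-classifier "cross-entropy loss" is just the special case where $p_\theta$ is a Bernoulli distribution over labels.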
I'm not sure what you mean. I went through the list myself, and while the books are obviously only on Libgen, the only one I didn't find readily available in Google Scholar was the AIC paper (https://www.gwern.net/docs/statistics/decision/1998-akaike.p...), and you can safely assume any paper in GS is in SH too.
Although Vapnik's treatise is called Statistical Learning Theory, neither statisticians nor he himself identify him as a statistician. In fact his proposals were quite radically different from the established norm in contemporary statistics. The same holds for Corinna Cortes.
The kernel 'trick', the representer theorem, etc. are far older and have their origins in functional analysis.
Putting aside discovery, I've always considered SVMs to be in the realm of optimization rather than ML or statistics (though I suppose you could then also put modern deep learning under optimization too).
Why? No one uses SVM as a solver/optimization method (though you do need a solver/optimization method to train a SVM).
Same with "modern deep learning" (whatever that may be): just because you need to optimize something doesn't make the field "optimization". Just because I'm using stochastic gradient descent (or some other optimization method) in the course of my work, doesn't mean that I'm working in the field of Optimization.
How have they attributed GANs and Deep Learning to Statistics? I thought Goodfellow was doing an AI PhD and that Hinton is a biologically inspired / neuroscience fellow?
Deep learning models are statistical and probabilistic models.
You can categorize Deep Learning under both computer science and statistics. For example stat.ML and cs.LG in Arxiv.
Machine learning and statistics are closely related fields, both historically and in current practice and methodology.
The only easy real division between stats and ML is in universities, where it's just a question of which department. If it's CS, then it's ML or AI. If it's stats, it's stats.
If it's industry, it's whatever the marketing department decides, inevitably AI :(
I dare not even read the rest of the page just in case my brain accidentally absorbs other bad information like that paragraph about GANs.
"Generalized" :D Also the description is nonsense. This has nothing to do with reinforcement learning. Makes me wonder about the rest.
Edit: Ah, I'm a dunce - missing from the article.
> This book has been hugely influential and is a fun read that can be digested in one sitting.
Wow. The PDF is over 700 pages. That seems fairly impressive for single-sitting digestion.
- For the papers I am quoted 26-39 EUR
- For the books I am quoted 129-133 EUR
This is audacious. Some of these papers are from the '70s. And I highly doubt that the authors get any royalties from those sales.
We do. I don't know if it's the general rule, but for the one I partook in, we get ~20 euro cents per sale, per author.
why would you want to feed the parasites?
https://brenocon.com/blog/2008/12/statistics-vs-machine-lear...