rugatelstvo · 10 years ago
I am under the impression that to learn statistics one must first have a working knowledge of probability theory which rests upon grad level math analysis. Can machine learning be studied without any of that?
RogerL · 10 years ago
A lot of this is done in discrete math. You know, the actual probability is defined by this integral, but there is no closed form solution to the integral, so we do sums to find the approximate answer. Anyone can understand sums. And, it's probabilities, so the sums must equal one. Not that hard, right ;)
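
A minimal sketch of the "do sums instead of the integral" idea, assuming a standard normal density and a plain midpoint grid. The function names (`normal_pdf`, `discretize`) and the grid settings are made up for illustration; real libraries handle this far more carefully.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def discretize(pdf, lo, hi, n):
    """Return midpoints of n equal cells on [lo, hi] and normalized masses."""
    width = (hi - lo) / n
    xs = [lo + (i + 0.5) * width for i in range(n)]
    weights = [pdf(x) * width for x in xs]  # area of each thin rectangle
    total = sum(weights)
    probs = [w / total for w in weights]    # normalize so the sums equal one
    return xs, probs


# Illustrative grid: 4000 cells on [-8, 8] (an assumption, not a recommendation).
xs, probs = discretize(normal_pdf, -8.0, 8.0, 4000)

# P(-1 < X < 1) as a sum rather than an integral.
p_within_one_sigma = sum(p for x, p in zip(xs, probs) if -1.0 < x < 1.0)

print(sum(probs))          # very close to 1
print(p_within_one_sigma)  # roughly 0.683 for a standard normal
```

The normalization step is the "sums must equal one" part: whatever error the grid introduces, the discrete masses still form a valid probability distribution.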

It sure helps to understand the integral equations, especially if you want to read the original literature. But realistically you are going to need to understand summing, normalizing, algorithms for clustering, and so on. You probably don't want to write your own numerical code anyway; someone else did it, and they handled all the edge cases that a naive implementation misses.

You can find PDFs of the James, Witten, Hastie, Tibshirani book "An Introduction to Statistical Learning" [1]. Scroll on through - there is nothing intimidating math-wise. All the heavy lifting is left to R.

Jump in, the water is fine!

[1] http://web.stanford.edu/~hastie/pub.htm

huac · 10 years ago
If you're serious about the math, read "The Elements of Statistical Learning" instead. Same guys, just as much R code, but harder.

http://statweb.stanford.edu/~tibs/ElemStatLearn/

sarwechshar · 10 years ago
Thanks a lot for this! I didn't realize probability would be so important. I've been working with conditional expectations (not sure whether they are relevant in machine learning), and this was an eye-opener.

Other great introductions are the descriptive and inferential statistics courses on Udacity!

btown · 10 years ago
Like most fields, it depends on your definition of "studied." If you want to push the envelope in theoretical non-applied research, you're going to want to learn analysis & measure-theoretic probability theory. If you want to apply existing techniques, read (well-written) papers and code up the algorithms you find there, you can get away with undergraduate-level linear algebra & probability knowledge - Bayes' rule, expectations, independence, the general ability to think about random variables (and matrices thereof) as values that can be transformed and combined. And of course, you can fire up a classifier in scikit-learn without knowing any of this at all. But that's stretching the definition of "studied" quite a bit!
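
For a sense of what "undergraduate-level" means here, a small sketch of Bayes' rule and an expectation in plain Python. The screening-test numbers and the function names are hypothetical, chosen only to make the arithmetic visible.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive test) via Bayes' rule."""
    # Total probability of a positive test (the "evidence").
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / evidence


def expectation(values, probs):
    """Expected value of a discrete random variable."""
    return sum(v * p for v, p in zip(values, probs))


# Hypothetical numbers: 1% base rate, 95% sensitivity, 5% false positives.
p_condition_given_positive = posterior(0.01, 0.95, 0.05)
print(p_condition_given_positive)  # about 0.16, far lower than the 95% sensitivity

# Expected value of a fair six-sided die.
die_mean = expectation(range(1, 7), [1.0 / 6.0] * 6)
print(die_mean)  # 3.5 up to floating-point rounding
```

If the first result is surprising, that is roughly the level of probability intuition worth building before reading papers.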

I personally went into a graduate-level probabilistic machine learning course with probability knowledge consisting of an undergraduate course that followed Ross http://www.amazon.com/Introduction-Probability-Models-Tenth-... - so there's certainly no need to have been a math major. But if you've never dealt with random variables whatsoever, you'll hit a wall following research from the last 20 years.

compbio · 10 years ago
There is applied machine learning (using machine learning to solve business problems) and theoretical machine learning (Optimization bounds, proofs, algorithm design).

With applied machine learning it is certainly possible to quickly get a working knowledge without too much reliance on statistics or difficult theory. You can compare this a bit with using a sorting function without knowing exactly how it works (but you know how fast it is and when to use it).

If you have an engineering background, take a look at the wide array of high-quality ML code and tools. Study trendy and powerful tools like XGBoost.

pvnick · 10 years ago
What do you mean, grad level math analysis? Much of probability theory can be learned with basic multivariate calculus. (Perhaps there's a terminology misunderstanding here - when I see "grad level" I think "grad school," i.e. master's/PhD). Certainly basic probability theory is a plus.
rugatelstvo · 10 years ago
By grad level analysis I mean analysis based on measure theory. Here's where I got the idea (last comment in the linked thread): https://www.physicsforums.com/threads/what-is-the-most-usefu...
solomatov · 10 years ago
I think you can get away without mathematical analysis (although it's not that complex). Basic probability theory, however, is a must-have.
eli_gottlieb · 10 years ago
Elementary (i.e., math-speak for "undergrad-level") probability theory is quite accessible to someone with only a computer scientist's math classes. You really don't need real analysis until you start reading research papers on probability and they drop down into measure theory for this-and-that.
misiti3780 · 10 years ago
I would say you can certainly get away with applying machine learning techniques without knowledge of probability theory, but if you want to do stuff like compare models, compare results, determine the accuracy of your model, etc., you are going to quickly have to dive into basic statistics (Bayesian + frequentist).
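
As one example of the kind of basic statistics that comes up, here is a rough sketch of a confidence interval around a classifier's test accuracy, using the normal approximation to the binomial. The counts are invented and `accuracy_interval` is a hypothetical helper, not part of any library.

```python
import math


def accuracy_interval(n_correct, n_total, z=1.96):
    """Accuracy with an approximate 95% interval (normal approximation)."""
    acc = n_correct / n_total
    std_err = math.sqrt(acc * (1.0 - acc) / n_total)
    return acc, acc - z * std_err, acc + z * std_err


# Invented results for two models on the same 1000-example test set.
acc_a, lo_a, hi_a = accuracy_interval(850, 1000)
acc_b, lo_b, hi_b = accuracy_interval(870, 1000)

# If the intervals overlap, a 2-point gap is not clear evidence that B is better.
intervals_overlap = lo_b <= hi_a and lo_a <= hi_b

print(acc_a, lo_a, hi_a)
print(acc_b, lo_b, hi_b)
print(intervals_overlap)
```

The approximation gets rough for small test sets or accuracies near 0 or 1, and overlapping intervals are only a crude check; a paired test on the same examples is usually a better comparison.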
gamapuna · 10 years ago
ojaved · 10 years ago
This is the 2013 course. The original post is for the spring 2015 machine learning course.
bcheung · 10 years ago
The videos don't seem to work.
joshvm · 10 years ago
Some nice courses there. Also check out Daniel Cremers' lectures on variational methods for computer vision if you're interested in that sort of thing. There's also a nice series on computer vision for special effects.

http://www.computervisiontalks.com/variational-methods-for-c...

anacleto · 10 years ago
That's really nice. Daniel Cremers is impressive.

Here's a great Laboratory on Amazon ML for Human Activity Recognition (w/ Python). https://cloudacademy.com/amazon-web-services/labs/aws-machin...

Totally worth a look.

yla92 · 10 years ago
A bit off topic: what are the best ways/resources to learn linear algebra and basic probability and statistics?
stdbrouw · 10 years ago
For linear algebra, I like the "No BS guide to linear algebra" (https://gumroad.com/l/noBSLA) which also includes a high school math refresher for people who need it (I did).

For probability, "Probability Demystified" is a good basic intro.

For statistics, I would really recommend Allen Downey's Think Stats (http://greenteapress.com/thinkstats2/index.html), especially if you're coming from a programming background. Most introductions to statistics focus heavily on the mathematics needed to enable certain analytical approximations to difficult probabilistic calculations (e.g. the t-test), whereas Think Stats just bites that bullet and focuses on simulation / brute force so you can spend more time on the actual fundamental theory behind statistics.
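
To give a flavor of the simulation / brute force approach, here is a minimal sketch of a permutation test for a difference in means, written from scratch rather than taken from the book. The function name, the sample data and the number of resamples are all illustrative assumptions.

```python
import random


def permutation_p_value(group_a, group_b, n_resamples=10000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the labels are arbitrary, so shuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_resamples


# Made-up measurements for two groups.
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
treated = [12.9, 13.1, 12.7, 13.3, 12.8, 13.0]
print(permutation_p_value(control, treated))
```

Instead of deriving the sampling distribution analytically (as the t-test does), the loop builds it empirically by relabeling the data many times and counting how often the shuffled difference is at least as large as the observed one.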

Brian Blais' "Statistical Inference for Everyone" (http://web.bryant.edu/~bblais/statistical-inference-for-ever...) also looks really good, but I haven't had a chance to review it in depth.

delluminatus · 10 years ago
Depends on how basic you're imagining. Khan Academy [0] is a fairly well-regarded free resource for high-school and undergraduate level mathematics video lectures. They have probability and statistics as well as linear algebra courses.

If you prefer textbooks, I have heard good things about "Linear Algebra Done Right," [1] but I would not recommend it unless you are "math literate" at an undergraduate level already.

[0] https://www.khanacademy.org/math/probability

[1] http://www.amazon.com/dp/3319110799

smockman36 · 10 years ago
I think this is a good resource to start: http://betterexplained.com/articles/linear-algebra-guide/
id_ris · 10 years ago
For linear algebra, check out Prof. Gilbert Strang's course on MIT OCW. He's great at explaining the material and the course resources are comprehensive.

http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-...

chrisdbaldwin · 10 years ago
Some of the videos in the link are cut short, and the full videos are much better. Here's a link to a playlist of the full lectures: https://www.youtube.com/playlist?list=PLZSO_6-bSqHQmMKwWVvYw...
ojaved · 10 years ago
The videos on computervisiontalks.com are exactly the same as the videos on YouTube because the site pulls them from YouTube. The post points to the spring 2015 lectures; you are pointing to the earlier lectures from 2014 and 2013.
phunehehe0 · 10 years ago
Just want to shout about this very comprehensive course by Caltech professor Yaser S. Abu-Mostafa http://work.caltech.edu/telecourse.html
btown · 10 years ago
I like his teaching style, but it seems some of the lecture videos (1.3, for example) are cut off - very frustrating! For anyone watching nonetheless, I recommend going into YouTube and changing the speed to 1.5x.
ojaved · 10 years ago
The lectures on computervisiontalks are taken directly from YouTube (with tags, navigation, bookmarking and in-video search added). Lecture 1.3 (for the spring 2015 class) is exactly the same length there. However, YouTube also has the lectures for the 2013 machine learning class (also by Alex Smola), which are of a different length.