I would love to get started with data analysis. Could anyone point me to a reliable guide that covers the basics? I'm pretty well versed in Python and have used Pandas before as part of some side projects.
This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.
As a rookie trying to get into the field myself, I think there are quite a few ways to go about it.
The programming part (R, Python, Julia, etc.) seems to get the most attention here. I think the most important part is learning how to load datasets into your system of choice and work with them to get some nice plots out. The book "R for Data Science"[1] seems like a good intro for this with R and the tidyverse.
Somewhat more overlooked here are the statistical models. I second the recommendation of "Introduction to Statistical Learning"[2], possibly supplemented with its big brother "Elements of Statistical Learning"[3] if you're more mathematically inclined and want more details. I like their emphasis on starting with simple models and working your way up. I also found their discussion of how to go from data to a mathematical model very lucid.
While it's a really cool approach, I would strongly discourage a beginner from starting with this. Bayesian methods are not what most people use, and the book does not teach any of the necessary basics of analyzing real-world data.
There's a pretty good book out there, "Data Science from Scratch". It's all in Python and really lays out the workings of popular data science / analysis algorithms without relying heavily on external libraries.
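To give a feel for what "from scratch" can look like, here is a minimal sketch of a simple linear regression written with only the Python standard library. It is not taken from the book; the function names (`fit_line`, `r_squared`) and the tiny dataset are made up for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in y explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
fit_quality = r_squared(hours, scores, slope, intercept)
print(f"score = {intercept:.2f} + {slope:.2f} * hours (R^2 = {fit_quality:.3f})")
```

Once the mechanics are clear, doing the same fit with a library and comparing the numbers is a reasonable next step.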
Thanks!
http://www-bcf.usc.edu/~gareth/ISL/
Python
https://www.edx.org/micromasters/data-science
[1] http://r4ds.had.co.nz/
[2] http://www-bcf.usc.edu/~gareth/ISL/
[3] http://web.stanford.edu/~hastie/ElemStatLearn/
Some links: http://p.migdal.pl/2016/03/15/data-science-intro-for-math-ph...