My Data Science Roadmap
Posted by signal on August 3rd, 2012
I set a goal to learn Data Analytics and began this journey a while back. One of the ways I am learning Data Science is through EMC’s Data Science Training. They succinctly outline the skills I am looking to master to build a practical foundation in analytics:
Problem | Category of Techniques | Methods to Learn |
Group items by similarity; find structure and commonalities in the data | Clustering | K-means clustering |
Discover relationships between actions or items | Association Rules | Apriori |
Discover relationships between the outcome and input variables | Regression | Linear Regression, Logistic Regression |
Assign (known) labels to objects | Classification | Naïve Bayes, Decision Trees |
Find the structure in a temporal process; forecast the behavior of a temporal process | Time Series Analysis | ACF, PACF, ARIMA |
Analyze text data | Text Analysis | Regular Expressions, Document representation (Bag of Words), TF-IDF |
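To make the last row of the table a little more concrete, here is a minimal sketch of the text analysis methods it names: a regular expression for tokenizing, a bag-of-words count per document, and TF-IDF weights on top. It is written in Python for brevity, even though most of the books below use R. The function names `tokenize` and `tf_idf`, the sample sentences, and the particular weighting (term count divided by document length, times log(N / document frequency)) are my own assumptions for illustration; other TF-IDF variants exist.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract alphabetic tokens with a regular expression."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting (one common variant, not the only one):
    tf = count / document length, idf = log(N / document frequency).
    """
    # Bag of words: term counts per document, ignoring word order.
    bags = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(bags)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())

    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical sample documents, made up for this sketch.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
scores = tf_idf(docs)
for doc, score in zip(docs, scores):
    top_term = max(score, key=score.get)
    print(f"{doc!r}: highest-weighted term = {top_term}")
```

Under this weighting, a term that appears in every document gets a weight of zero, while terms concentrated in a single document score highest, which is the intuition behind using TF-IDF to find the words that characterize a document.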
In addition to the above, I plan to build foundational knowledge in Mathematics, Computer Science, Machine Learning, Artificial Intelligence, Predictive Analytics and Life Science. Some of this will come from my degree program at Harvard; however, the program I am in, Information Technology, offers only some courses that are useful in Data Science. Other knowledge will come from additional courses I will take outside of my degree program, from books, and possibly even from pursuing another graduate degree specific to Data Analytics.
A few degree programs that look very attractive are listed below. The prerequisites are what prevent me from pursuing one of these programs at this time. I have a significant amount of work to do to build up my Mathematics and Life Sciences foundations before I could be admitted. My background is in technology and computer science, which is very useful in Data Science but is only one part of a much larger domain of knowledge.
Master of Science in Bioinformatics – Johns Hopkins University
Master of Science in Analytics – North Carolina State University
Master of Science in Analytics – Northwestern University
Master of Science in Predictive Analytics – Northwestern University
Mining Massive Data Sets Graduate Certificate – Stanford University
MSc Machine Learning – University of London
Master of Science in Data Mining – Central Connecticut State University
Master of Science Biomedical Informatics
College Courses I will take outside of Harvard (all of the below have co-requisite labs as well):
Biology I
Biology II
Chemistry I
Chemistry II
Organic Chemistry I
Organic Chemistry II
Courses I am taking or have taken at Harvard that will help in Data Science:
Introduction to Statistics
Java for Distributed Computing
Oracle Database Administration
Visualization
Computing Foundations for Computational Science
Books I will be working through:
R
Data Mining with R: Learning with Case Studies (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
The R Book
Data Mashups in R
R in a Nutshell: A Desktop Quick Reference
R Cookbook (O’Reilly Cookbooks)
Getting Started with RStudio
Parallel R
Statistics
Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
All of Statistics: A Concise Course in Statistical Inference (Springer Texts in Statistics)
Think Stats
Statistics in a Nutshell: A Desktop Quick Reference (In a Nutshell (O’Reilly))
Statistics Hacks: Tips & Tools for Measuring the World and Beating the Odds
Linear Algebra
Introduction to Linear Algebra, Fourth Edition
Machine Learning
Machine Learning in Action
Machine Learning for Hackers
Data Mining
Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
21 Recipes for Mining Twitter
Big Data Glossary
Data Analysis with Open Source Tools
Visualization
Designing Data Visualizations
Now You See It: Simple Visualization Techniques for Quantitative Analysis
Beautiful Visualization: Looking at Data through the Eyes of Experts (Theory in Practice)
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics
Hadoop
Hadoop: The Definitive Guide
HBase: The Definitive Guide
Programming Pig
Cassandra: The Definitive Guide
There is much I have left out, I am sure, so if anyone has any good books to recommend, please do. I have found the Quora forums to be particularly helpful for networking with others about Data Science.
June 21st, 2013 at 5:08 pm
Hi Brian:
Could you let me know the source of the ‘data brain’ graphic?
Thank you.
June 21st, 2013 at 5:25 pm
I am not sure. If you download the graphic and go to tineye.com, you can upload it and you will see it’s everywhere. Who to attribute it to, I don’t know.