Data Science is an exciting field, and Data Scientists need a broad skillset. To strengthen my understanding of various concepts and get more hands-on experience, I have pursued many Data Science related MOOCs. This directory contains descriptions of the courses I pursued, along with the certificates I earned.
I divide the essential skills into the following four categories and pursue courses in the areas where I need improvement.
Maths
- Probability and Statistics
- Linear Algebra
- Calculus
Programming Skills
- Python, R programming languages
- Data Structures and Algorithms
Systems
- Relational Databases, SQL
- Hadoop Ecosystem - HDFS, MapReduce, PIG, Hive
- NoSQL systems - HBase, MongoDB, Cassandra
- Cloud Computing concepts - Amazon Web Services
Data Science Core
- Data Wrangling, Cleaning, Manipulation and Exploratory Data Analytics
- Machine Learning
- Information Retrieval
- Data Mining
- Data Visualization
The Data Scientist’s Toolbox | Certificate
R Programming | Certificate
Getting and Cleaning Data | Certificate
Exploratory Data Analysis | Certificate
Reproducible Research | Certificate
Statistical Inference | Certificate
Regression Models | Certificate
Practical Machine Learning | Certificate
Developing Data Products | Certificate
Data Science Capstone | Currently pursuing
- This Specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results.
- Lots of practice in R programming and packages like dplyr, data.table, Hmisc, reshape2, ggplot2, lattice, caret, randomForest, tm, rmarkdown, shiny, sqldf, RSQLite, stringr, lubridate.
- Final Capstone Project to apply the skills learned by building a data product using real-world data.
The Analytics Edge | Course Link
Certificate
- An applied understanding of many different analytics methods, including linear regression, logistic regression, CART, clustering, data visualization and mathematical optimization. Uses the R programming language.
- Many real-world analytics projects serve as examples and exercises.
Statistical Learning | Course Link
Certificate
- Introductory Machine Learning course with a focus on supervised learning methods like regression and classification. Uses the R programming language.
- Companion book: An Introduction to Statistical Learning
Introduction to Computer Science and Programming Using Python | Course Link
Certificate
- Object-oriented programming, data structures, testing and debugging, basic algorithms.
- Challenging programming assignments
Algorithms: Design and Analysis | Course link
Certificate
- Standard course covering the fundamental concepts of algorithm design.
- Challenging exercises.
Design of Computer Programs | Course link
- Covers new concepts, patterns, and methods that will expand coding abilities. Uses the Python programming language.
- Very challenging exercises.
Machine Learning | Course link
- A broad introduction to machine learning, data mining, and statistical pattern recognition, with emphasis on the practical application of techniques. Exercises in MATLAB.
Mining Massive Datasets | Course link
- Covers topics such as MapReduce, Link Analysis, Locality-Sensitive Hashing, Data Stream Mining, Recommender Systems, Dimensionality Reduction, Clustering, Support-Vector Machines, and Decision Trees.
- Companion book: Mining of Massive Datasets
- Very challenging exercises.
Cloud Computing Applications | Course link
- Basic concepts underlying cloud services
- Data Analytics using cloud services (Amazon Web Services). Covers many concepts underlying systems like Hadoop (HDFS, PIG, Hive), YARN, NoSQL databases (HBase, Cassandra), Spark, GraphX, and Mahout.
### Some good books
Advanced R | link
An Introduction to Statistical Learning | link
The Elements of Statistical Learning | link
Mining of Massive Datasets | link
Hadoop: The Definitive Guide | [Not free]