Introductory


Getting Started with Python

This is an introductory level of Python, covering the fundamentals of programming along with concepts in object-oriented programming and introductory data structures such as lists, tuples, and dictionaries. More programming concepts are presented in the Algorithm and Data Structure section.
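
As a minimal sketch of the three data structures named above, plus a tiny class to show the object-oriented style: the names `scores`, `point`, `ages`, and `Student` and all values are made up for illustration.

```python
# A list is an ordered, mutable sequence.
scores = [88, 92, 75]
scores.append(60)          # lists can grow in place

# A tuple is an ordered, immutable sequence.
point = (3, 4)
x, y = point               # tuples unpack into variables

# A dictionary maps keys to values.
ages = {"alice": 30, "bob": 25}
ages["carol"] = 41         # add a new key-value pair


class Student:
    """A tiny (hypothetical) class illustrating object-oriented programming."""

    def __init__(self, name, scores):
        self.name = name
        self.scores = list(scores)  # keep a private copy of the scores

    def average(self):
        """Return the arithmetic mean of the student's scores."""
        return sum(self.scores) / len(self.scores)


student = Student("alice", scores)
print(student.name, student.average())
```

The list and the dictionary are modified after creation, while the tuple is only read; that difference in mutability is usually what decides which one to use.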

Lectures: Python for Everybody | Python Programming: A Concise Introduction | Python for Research

Tutorials: Python | From Python to Numpy | Python Numpy Tutorial | Python Course | Numpy Quickstart | Pandas | Scipy Lectures | Learn Python | Python, MySQL, and MongoDB

Cheat Sheets: Numpy Basics | Pandas Basics | Matplotlib Basics

Getting Started with Linux, Git, and GitHub

Why Linux for a data scientist? Linux is the dominant operating system for servers and cloud computing, so most remote machines you work on will run it. You should be familiar with basic Linux commands to run programs, install packages, download millions of documents to a remote cloud server, and so on. Git is the most widely used tool for tracking versions of your program files, and GitHub is a hosting service for Git repositories. Tools like Git are called version control software.

Lectures: Introduction to Linux | Linux Command Line Basics | Version Control with Git | How to Use Git and GitHub

Applied Statistics: Probability, Inferential & Bayesian Statistics

Applied statistics is a fundamental requirement for data science. Basic statistical analyses, such as identifying the underlying distribution, correlation, sampling, and hypothesis testing, are used in preprocessing, feature engineering, and model tuning. Bayesian learning is covered in more detail at the advanced level.
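
To make two of these ideas concrete, here is a small sketch using only the Python standard library. It computes a Pearson correlation and a one-sample t statistic; the data values and the helper names `pearson` and `one_sample_t` are invented for illustration, and a real analysis would more likely use a statistics package.

```python
import math
import statistics

# Hypothetical paired measurements, invented for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
marks = [52.0, 55.0, 61.0, 64.0, 68.0]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def one_sample_t(xs, mu0):
    """t statistic for testing whether the mean of xs differs from mu0."""
    n = len(xs)
    standard_error = statistics.stdev(xs) / math.sqrt(n)
    return (statistics.mean(xs) - mu0) / standard_error


r = pearson(hours, marks)
t = one_sample_t(marks, 50.0)
print(f"correlation = {r:.3f}, t statistic = {t:.3f}")
```

A correlation close to 1 suggests the two variables move together, which is the kind of check used when deciding whether a feature is worth keeping.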

Lectures: Statistics and R | Basic Statistics | Introduction to Probability and Data | Inferential Statistics | Bayesian Statistics

Applied Math: Linear Algebra, Advanced Calculus & Optimization

Vectors and matrices are the leading players in the playground! Each data sample is a point in an n-dimensional vector space, and the features are the degrees of freedom of the system. During feature engineering and model selection, your central struggle is to reduce the degrees of freedom, because more degrees of freedom imply a more complex model that requires more computational power. Most machine learning algorithms use optimization techniques to train the model. To understand optimization you need multivariable calculus, which means taking derivatives with respect to those vectors and matrices. Deep learning even requires tensors.
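
The link between derivatives and training can be sketched with gradient descent on a least-squares problem. This is one common optimization technique, chosen here only as an illustration; the data, the learning rate, and the helper names `predict` and `gradient` are assumptions, and the code uses plain Python lists rather than a linear algebra library.

```python
# Hypothetical data: each row of X is a sample, each column a feature.
# The first column is a constant 1 so that w[0] acts as the intercept.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]   # generated from y = 1 + 2 * x


def predict(w, row):
    """Dot product of the weight vector and one sample."""
    return sum(wi * xi for wi, xi in zip(w, row))


def gradient(w):
    """Gradient of the mean squared error with respect to w."""
    n = len(X)
    grad = [0.0] * len(w)
    for row, target in zip(X, y):
        error = predict(w, row) - target
        for j, xj in enumerate(row):
            grad[j] += 2.0 * error * xj / n
    return grad


w = [0.0, 0.0]
learning_rate = 0.05       # assumed step size, small enough to converge here
for _ in range(5000):
    g = gradient(w)
    # Move against the gradient to reduce the error.
    w = [wi - learning_rate * gi for wi, gi in zip(w, g)]

print(w)
```

Each feature adds one entry to `w` and one component to the gradient, which is why more degrees of freedom mean more computation per step.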

Lectures: Essential Math for Machine Learning: Python Edition | Linear Algebra | Multivariate Calculus | Principal Component Analysis (PCA) | Optimization Methods for Business Analytics | Discrete Optimization

Introduction to Algorithm and Data Structure

When you write a program, the first questions are how much time it takes to produce the result and how much memory (stack and heap) it occupies while running. Together these are called its time and space complexity. Writing a program that completes a task is necessary, but it is not sufficient when the same code has to run on big data. There are a variety of algorithms and data structures with different space and time complexities. In distributed and parallel computing, the analogous concerns are latency and throughput for work that is processed in separate batches.
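
A rough way to see the difference in time complexity is to count how many elements two search strategies inspect on the same sorted list. The sketch below compares a linear scan with a binary search; the function names and the list size are chosen only for illustration.

```python
def linear_search(items, target):
    """Return (index, probes); inspects elements one by one, O(n) time."""
    probes = 0
    for i, value in enumerate(items):
        probes += 1
        if value == target:
            return i, probes
    return -1, probes


def binary_search(items, target):
    """Return (index, probes) for a sorted list; halves the range, O(log n) time."""
    lo, hi = 0, len(items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            lo = mid + 1      # target can only be in the upper half
        else:
            hi = mid - 1      # target can only be in the lower half
    return -1, probes


data = list(range(1_000_000))   # already sorted
target = 999_999                # worst case for the linear scan

print("linear:", linear_search(data, target))
print("binary:", binary_search(data, target))
```

Both functions return the same index, but the linear scan inspects every element while the binary search needs only about twenty probes for a million items, and that gap widens as the data grows.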

Lectures: Algorithmic Toolbox | Data Structures | Algorithms and Data Structures

Tutorials: Algorithms and Data Structures from GeeksforGeeks

Data Visualization (Static)

Data visualization is the way to advertise the power of data science. There are two ways to present data: as a static view (PNG, JPG, PDF, etc.) and as an interactive app on a website using JavaScript, Plotly, Tableau, etc. Static data visualization produces a fixed figure with no user interaction.
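
A static figure can be sketched in a few lines with matplotlib, one of the libraries listed in the tutorials for this section. The example assumes matplotlib is installed; the data and the output file name `sine_wave.png` are arbitrary choices for illustration.

```python
import math
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt

# Sample sin(x) on [0, 6.2] in steps of 0.1.
xs = [i * 0.1 for i in range(63)]
ys = [math.sin(x) for x in xs]

fig, ax = plt.subplots()
ax.plot(xs, ys)
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A static figure")

# Write the figure to disk; the result is a fixed image with no interaction.
output_path = Path("sine_wave.png")
fig.savefig(output_path, dpi=150)
plt.close(fig)
```

Changing the file extension to `.pdf` or `.jpg` makes `savefig` write that format instead, provided the installed backend supports it.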

Lectures: Data Visualization in R | Applied Plotting, Charting & Data Representation in Python | Fundamentals of Visualization with Tableau

Tutorials: matplotlib | pandas | seaborn | Tidyverse

Cheat Sheets: ggplot part-I | ggplot part-II