Introduction to Machine Learning - Resources & References
This is a comprehensive collection of external learning materials, datasets, tools, and references.
Core Course Materials
Essential Textbooks (Free Online)
- Russell and Norvig. Artificial Intelligence: A Modern Approach - Free online
- Hastie, Tibshirani, and Friedman. The Elements of Statistical Learning - Free online
- Murphy, K. P. (2022). Probabilistic Machine Learning: An Introduction - Free online
Main Online Courses (in addition to course lectures)
- Stanford CS229: Machine Learning - Primary reference for mathematical foundations
- CS231n: Convolutional Neural Networks - Useful reference for deep learning and computer vision
- CS230: Deep Learning - Useful reference for deep learning
- FastAI - Practical deep learning for coders
Python Programming Resources
- CS231n Python/NumPy Tutorial - Essential for course notebooks
- Python Data Science Handbook - Free online book
- Pandas Documentation - Data manipulation
- Matplotlib Tutorials - Data visualization
Datasets for Learning and Projects
Lebanese Data Sources
- AUB Faculty of Health Sciences - Public Health Datasets
- Open Map Lebanon - Open Data
- Central Administration of Statistics - Lebanon
General Machine Learning Datasets
- UCI Machine Learning Repository - Classic ML datasets
- Kaggle Datasets - Competitions and community datasets
- Google Dataset Search - Search across millions of datasets
- Papers with Code Datasets - Research datasets with benchmarks
Scientific and Research Data
- The Well: 15TB of Physics Simulations - Large-scale physics simulations
- NASA Open Data - Space and Earth science datasets
- NOAA Climate Data - Weather and climate data
- European Space Agency Data - Satellite and Earth observation data
Economics and Social Data
- World Bank Open Data - Global development indicators
- UN Data - United Nations statistical databases
- OECD Data - Economic and social statistics
- Gapminder - Global development trends
Specialized Domains
- ImageNet - Large-scale image database
- Common Crawl - Web crawl data for NLP
- Million Song Dataset - Music information retrieval
- Yelp Open Dataset - Business and review data
Course-Specific Practice Datasets
For Linear Regression & Statistics
- Boston Housing (no longer bundled with scikit-learn as of version 1.2) - Real estate price prediction
- California Housing (via scikit-learn) - Geographic price modeling
- Student Performance Dataset - Educational analytics
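A minimal sketch of the regression workflow these datasets are meant for. To keep it runnable offline, it uses synthetic data from `make_regression` as a stand-in (an assumption, not one of the datasets listed above); the sample count, feature count, and noise level are arbitrary illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 500 samples, 8 numeric features.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# Hold out 20% of the rows to estimate how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit ordinary least squares on the training split only.
model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out split: 1.0 is a perfect fit, 0.0 matches predicting the mean.
r2 = r2_score(y_test, model.predict(X_test))
print(f"Test R^2: {r2:.3f}")
```

To try the same steps on California Housing, the synthetic data can be replaced with `fetch_california_housing(return_X_y=True)` from `sklearn.datasets`, which downloads the data on first use.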
For Classification
- Iris Dataset (via scikit-learn) - Classic multiclass classification
- Credit Card Fraud Detection - Imbalanced classification
- Titanic Dataset - Binary classification with mixed data types
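As a rough starting point for the classification datasets above, the following sketch loads Iris from scikit-learn and fits a baseline classifier. Logistic regression, the 75/25 split, and `max_iter=1000` are illustrative assumptions rather than choices prescribed by the course.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Iris ships with scikit-learn: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Stratify so each class keeps the same proportion in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline multiclass classifier; max_iter is raised so the solver converges.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

For an imbalanced problem such as fraud detection, accuracy alone can be misleading; precision, recall, or a confusion matrix from `sklearn.metrics` are usually more informative.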
For Time Series
- Air Quality Dataset - Environmental monitoring
- Energy Consumption - Temporal patterns
Programming Tools & Libraries
Core Python Libraries
# Data Science Stack
import numpy as np # Numerical computing
import pandas as pd # Data manipulation
import matplotlib.pyplot as plt # Basic plotting
import seaborn as sns # Statistical visualization
import scipy as sp # Scientific computing
# Machine Learning
import sklearn # Classical machine learning
import torch # PyTorch deep learning
import tensorflow as tf # TensorFlow/Keras
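Before opening the course notebooks, it can help to confirm which of the libraries above are installed. The helper below is a hypothetical sketch (the name `check_environment` and the package mapping are assumptions, not part of the course materials); it only inspects the environment and does not import the heavy libraries themselves.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec

# Import name -> distribution name on PyPI (they differ for scikit-learn).
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "sklearn": "scikit-learn",
    "torch": "torch",
    "tensorflow": "tensorflow",
}


def check_environment(packages=PACKAGES):
    """Return {import_name: version string, or None if the module is missing}."""
    report = {}
    for import_name, dist_name in packages.items():
        if find_spec(import_name) is None:
            report[import_name] = None  # module cannot be imported
            continue
        try:
            report[import_name] = version(dist_name)
        except PackageNotFoundError:
            # Importable, but installed under a different distribution name.
            report[import_name] = "installed (version unknown)"
    return report


for name, found in check_environment().items():
    print(f"{name:12s} {found if found is not None else 'MISSING'}")
```

Anything reported as MISSING can be installed with the environment's package manager (for example, conda or pip) before running the notebooks.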
Development Environment
- Anaconda - Python distribution with ML packages
- Jupyter Lab - Interactive development environment
- Google Colab - Free cloud-based notebooks
- VS Code - Code editor with Python extensions
- Cursor - AI-powered code editor
Specialized ML Tools
- Weights & Biases - Experiment tracking
- MLflow - ML lifecycle management
- Streamlit - Quick ML web apps
- Plotly - Interactive visualizations
Mathematical Foundations
Linear Algebra Resources
- 3Blue1Brown: Essence of Linear Algebra - Visual explanations
- Khan Academy Linear Algebra - Step-by-step tutorials
- MIT 18.06: Linear Algebra - Complete course
Statistics and Probability
- StatQuest YouTube Channel - Clear statistical explanations
- Think Stats (Free Book) - Statistics for programmers
- Seeing Theory - Interactive probability visualizations
Calculus for ML
- 3Blue1Brown: Essence of Calculus - Visual calculus
- Khan Academy Multivariable Calculus - For optimization
Video Learning Resources
Essential YouTube Channels
- Course Playlist - Course lectures
- StatQuest with Josh Starmer - ML concepts clearly explained
- 3Blue1Brown - Mathematical intuition
- Two Minute Papers - Latest AI research
- Andrej Karpathy - Deep learning from scratch
Lecture Series
- Stanford CS229 Lectures - Andrew Ng’s machine learning course
- MIT 6.034 Artificial Intelligence - Comprehensive AI course
- Fast.ai Practical Deep Learning - Hands-on approach
Staying Current
News and Trends
- AI Research Blog (Google) - Latest research developments
- OpenAI Blog - Cutting-edge AI developments
- Towards Data Science (Medium) - Community articles
- The Batch (DeepLearning.AI) - Weekly AI newsletter
Academic Sources
- ArXiv.org - Preprint research papers
- Papers with Code - Papers with implementation
- Distill.pub - Clear explanations of ML research
- Nature Machine Intelligence - High-impact ML research
Practice and Competitions
Coding Practice
- LeetCode - Algorithm and data structure problems
- HackerRank - AI and ML challenges
- Codewars - Python programming practice
ML Competitions
- Kaggle Competitions - Data science competitions
- DrivenData - Social good competitions
- Zindi - African data science competitions
Project Ideas
- Beginner Projects:
  - Predict house prices using regression
  - Classify emails as spam/not spam
  - Analyze social media sentiment
- Intermediate Projects:
  - Build a recommendation system
  - Create a chatbot using NLP
  - Time series forecasting
- Advanced Projects:
  - Computer vision for medical imaging
  - Deep learning for scientific discovery
  - Reinforcement learning for games
Career Resources
Portfolio Building
- GitHub - Code repository and portfolio
- Kaggle Profile - Competition track record
- Google Scholar - Academic publications
- Personal Website Templates - Showcase projects
Interview Preparation
- Machine Learning Interview Guide - Comprehensive prep
- Cracking the Coding Interview - Algorithm questions
- Elements of Programming Interviews - Technical interviews
Community and Support
Online Communities
- Reddit r/MachineLearning - Research discussions
- Stack Overflow - Technical Q&A
- Cross Validated - Statistics and ML theory
- Discord ML Community - Real-time discussions
Quick Reference Sheets
Python Cheat Sheets
ML Algorithm Selection
- Scikit-learn Algorithm Cheat Sheet - When to use which algorithm
- ML Algorithms Comparison - Pros and cons
Scientific Machine Learning
Physics-Informed ML
- Physics-Informed Neural Networks (PINNs) - Original implementation
- DeepXDE - Library for scientific ML
- SciML Ecosystem - Julia-based scientific ML tools
Research Papers
- Physics-informed machine learning (2021) - Comprehensive review
- Scientific discovery in the age of artificial intelligence (2023) - AI for science
- Machine learning for molecular and materials science (2018) - Materials applications
Topic-Specific Deep Learning Resources
Deep Learning Fundamentals
- MIT Introduction to Deep Learning - 2023 - Comprehensive course
- 3Blue1Brown: Neural Networks - Visual explanations
- Introduction to Graph Neural Networks - Graph-based learning
- Andrej Karpathy’s Neural Networks from Scratch - Hands-on approach
Advanced Architectures
- Introduction to Diffusion Models - Generative models
- Variational Autoencoders - Arxiv Insights - VAE explanation
- Introduction to Graph Neural Networks - Microsoft - GNN fundamentals
Time Series and Sequences
- Introduction to RNNs by StatQuest - Sequential data
- Decoder-only foundation model for time series - Google Research
Transformers and Language Models
- But what is a GPT? - 3Blue1Brown - Visual transformer explanation
- Transformer Neural Networks - StatQuest - Clear explanations
- Introduction to Large Language Models - Andrej Karpathy - LLM overview
- Geoffrey Hinton on Intelligence and LLMs (2024) - Pioneer perspectives
- ChatGPT: 30 Year History - Historical context
- The Transformer - Yannic Kilcher - Paper review
This resource list is continuously updated. Suggest additions by opening an issue in the course repository or proposing them on Slack.