Data Science & Machine Learning
A comprehensive reference covering statistics, machine learning, deep learning, computer vision, NLP, and applied data science, from mathematical foundations through production deployment.
Foundations
- math precalculus - number systems, equations, functions, sets, combinatorics
- math logic - propositional logic, first-order logic, proof techniques, computability
- math for ml - calculus, optimization, gradient descent, backpropagation
- math linear algebra - vectors, matrices, eigenvalues, SVD
- math probability statistics - probability theory, estimation, MLE, confidence intervals
Statistics & Probability
- descriptive statistics - central tendency, spread, shape, correlation, z-scores
- probability distributions - Bernoulli, binomial, Poisson, normal, exponential, CLT
- hypothesis testing - A/B testing, statistical tests, CUPED, experiment design
- causal inference - DiD, propensity score matching, synthetic control, DAGs
- bias variance tradeoff - overfitting, underfitting, regularization, ensemble tradeoffs
Tools & Languages
- python for ds - Python fundamentals for data science, Jupyter/Colab
- numpy fundamentals - array operations, linear algebra, random generation
- pandas eda - DataFrame manipulation, groupby, filtering, EDA workflow
- data visualization - matplotlib, seaborn, plotly, chart selection
- sql for data science - queries, window functions, CTEs, analytics patterns
Classical Machine Learning
- linear models - linear/logistic regression, gradient descent, regularization
- gradient boosting - CatBoost, XGBoost, LightGBM, Random Forest, hyperparameters
- knn and classical ml - KNN, SVM, decision trees, algorithm selection guide
- unsupervised learning - K-Means, DBSCAN, PCA, t-SNE, UMAP, SVD
- bayesian methods - Bayes' theorem, Naive Bayes, Bayesian inference
Deep Learning
- neural networks - architecture, training, activation functions, optimizers, regularization
- cnn computer vision - convolutions, architectures (ResNet, YOLO), detection, segmentation
- nlp text processing - tokenization, TF-IDF, embeddings, transformers, BERT
- rnn sequences - LSTM, GRU, bidirectional, sequence-to-sequence
- generative models - GANs, VAEs, diffusion models, CycleGAN
- transfer learning - pre-trained models, fine-tuning strategies, domain adaptation
- data augmentation - image/text/tabular augmentation, SMOTE
Techniques & Evaluation
- feature engineering - scaling, encoding, imputation, selection, pipelines
- model evaluation - metrics (MAE, ROC AUC, F1), cross-validation, confusion matrix
- time series analysis - stationarity, ARIMA, seasonality, time-based feature engineering
- monte carlo simulation - simulation, portfolio optimization, risk metrics
- recommender systems - collaborative filtering, content-based, evaluation
Applied & Production
- ds workflow - end-to-end project methodology, pitfalls, reproducibility
- bi dashboards - BI systems, dashboard design, KPIs, analytics SQL
- ml production - model serialization, serving, monitoring, drift detection
- financial data science - portfolio theory, derivatives, risk metrics, financial ratios
- ai video production - AI video pipeline, tool chain, prompt engineering for video
Cross-Topic Links
- [[python:python-fundamentals]] - general Python beyond DS
- [[sql-databases:sql-fundamentals]] - database theory and administration
- [[algorithms:algorithm-complexity]] - computational complexity
- [[data-engineering:etl-pipelines]] - data pipeline infrastructure
- [[llm-agents:prompt-engineering]] - prompt engineering for LLMs