Machine Learning
Classical ML and deep learning from first principles through production. Loss functions, regularization, feature engineering, and debugging — theory plus code plus failure modes.
Guides
Anomaly Detection: Isolation Forest, LOF, ECOD, and Production
ML interview: unsupervised and semi-supervised anomaly detection for tabular data, logs, and monitoring — Isolation Forest path length, LOF, autoencoder reconstruction, ECOD tail scores, PyOD, contamination, concept drift, and precision@K when ground-truth labels are rare.
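A minimal sketch of the scoring-plus-precision@K workflow the guide describes, using sklearn's IsolationForest on synthetic data (the injected anomalies, labels, and K are illustrative; labels are used only for evaluation):

```python
# Minimal sketch: Isolation Forest scoring evaluated with precision@K.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_normal = rng.normal(0, 1, size=(1000, 4))
X_anom = rng.normal(5, 1, size=(20, 4))          # injected anomalies
X = np.vstack([X_normal, X_anom])
y = np.array([0] * 1000 + [1] * 20)              # 1 = anomaly (evaluation only)

model = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
model.fit(X)
scores = -model.score_samples(X)                  # higher = more anomalous

K = 20
top_k = np.argsort(scores)[-K:]                   # K most anomalous points
print(f"precision@{K} = {y[top_k].mean():.2f}")
```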
Decision Trees: CART, Splitting Criteria, and Pruning
Decision trees from first principles — CART's greedy recursive binary splitting, Gini vs entropy vs MSE split criteria, cost-complexity pruning with cross-validation, surrogate splits for missing values, feature importance bias (Strobl 2007) and the TreeSHAP fix, and why single trees are always ensembled in production. Covers ID3, C4.5, CART, and oblivious trees (CatBoost).
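A minimal sketch of cost-complexity pruning with cross-validated alpha selection, using sklearn's `cost_complexity_pruning_path` (the dataset choice is illustrative):

```python
# Minimal sketch: CART cost-complexity pruning, alpha chosen by 5-fold CV.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Score each candidate alpha on the pruning path and keep the best.
cv_means = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5).mean()
    for a in path.ccp_alphas
]
best = int(np.argmax(cv_means))
print(f"best ccp_alpha = {path.ccp_alphas[best]:.5f}, CV accuracy = {cv_means[best]:.3f}")
```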
ML Evaluation Metrics: The Complete Guide
Know exactly which metric to use for which problem type, and why. Covers Precision, Recall, F1, ROC-AUC, PR-AUC, NDCG, calibration, regression metrics, and when each is misleading. 10 hard interview questions with detailed answers.
K-Means Clustering: Complete Guide
K-Means from first principles — the algorithm and convergence proof, K-Means++ initialization, choosing K with elbow/silhouette/gap statistics, Mini-Batch variants, distributed K-Means, and when the algorithm breaks down. 10 hard interview questions with detailed answers.
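A minimal sketch of choosing K by silhouette score under k-means++ initialization (blob data and the K range are illustrative):

```python
# Minimal sketch: scan K and pick the silhouette peak.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
for k in range(2, 7):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))  # highest score suggests a good K
```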
Linear & Logistic Regression: From OLS to FTRL
Master linear and logistic regression from first principles — OLS derivation via normal equation and MLE, Gauss-Markov (BLUE), why MSE fails for classification, IRLS, L1 vs L2 geometry, and production patterns like FTRL-Proximal as used in Google Ads and Facebook CTR prediction.
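A minimal sketch of the normal-equation solution checked against sklearn (synthetic data; coefficients are illustrative):

```python
# Minimal sketch: OLS via the normal equation, verified against LinearRegression.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

Xb = np.hstack([np.ones((200, 1)), X])        # prepend intercept column
beta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)   # solve (X'X)b = X'y, no explicit inverse

print(beta[1:])                                # slopes from the normal equation
print(LinearRegression().fit(X, y).coef_)      # should match closely
```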
PCA & Dimensionality Reduction: PCA, t-SNE, UMAP, Autoencoders
PCA from first principles — variance maximization, eigendecomposition, SVD formulation, and when SVD beats eigendecomposition of the covariance matrix. Plus kernel PCA, t-SNE pitfalls, UMAP, autoencoders, and embeddings as the modern alternative. 10 hard interview questions with production-grade answers.
Random Forest & Ensemble Methods
Random Forest from first principles — bootstrap aggregating, the bias-variance decomposition of ensembles, feature importance via Gini/permutation, out-of-bag error, and when to choose Random Forest vs XGBoost vs GBM. 7 hard interview questions with detailed answers.
Computer Vision Fundamentals: CNNs, ResNet, ViT, and Production Transfer Learning
The core computer vision concepts every ML engineer needs: convolution mechanics, why ResNet's skip connections solved deep network training, the inductive bias tradeoffs between CNNs and Vision Transformers (ViT), and a production-grade transfer learning guide. Covers CNN architectures from LeNet to EfficientNet, object detection (YOLO, R-CNN family), and when CNNs still beat ViTs.
Knowledge Distillation: Temperature, Soft Targets, and Students
ML interview: Hinton 2015 soft labels, softmax temperature T and T² scaling, dark knowledge, FitNets, DistilBERT, student–teacher training, pruning/quantization stacks, and when distillation fails while pruning succeeds.
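A minimal PyTorch sketch of the Hinton 2015 objective: KL divergence on temperature-softened distributions, scaled by T², blended with the hard-label loss (the logits here are random placeholders, not real model outputs):

```python
# Sketch of the distillation loss with temperature T and T^2 gradient rescaling.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                    # T^2 restores the gradient scale
    hard = F.cross_entropy(student_logits, labels) # hard-label term
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10), torch.randint(0, 10, (8,)))
print(loss)
```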
NLP Fundamentals: Tokenization, Embeddings, BERT vs GPT, and Fine-Tuning
The NLP concepts every ML engineer must know to work with language models. Covers subword tokenization (BPE, WordPiece, SentencePiece), static vs contextual embeddings, BERT vs GPT architectural differences, fine-tuning strategies from full fine-tuning to LoRA, and the practical tradeoffs that come up in every production NLP system.
Attention Mechanisms: From Intuition to Transformer-Scale Reasoning
Understand attention as dynamic relevance weighting, not just a formula. Covers scaled dot-product attention, multi-head attention, failure modes, and production tradeoffs for sequence modeling.
Bias-Variance Tradeoff & ML Debugging
The single most important ML concept for interviews. Master the formal bias-variance decomposition, learning curves, double descent, how to diagnose high bias vs high variance from real signals, and the exact fixes for each case. 8 hard interview questions with detailed answers.
Cross-Validation Strategies: K-Fold, Time Series, Nested CV, and Leakage-Proof Pipelines
The definitive guide to cross-validation for ML interviews. Covers stratified/group/time-series K-fold, nested CV for hyperparameter search, purged and embargoed CV (López de Prado), bootstrap .632+, and the leakage traps that silently inflate offline scores. Includes production-grade sklearn Pipelines and a from-scratch purged CV implementation.
Feature Engineering: Leakage-Safe Encoding, Interactions, Temporal, and Production Parity
The production-grade feature engineering playbook. Covers categorical encoding by cardinality (one-hot, target/mean with K-fold CV smoothing, hashing, embeddings, CatBoost ordered stats), numerical transforms, interactions, cyclical temporal features, the 4 sources of data leakage with concrete fixes (ColumnTransformer + Pipeline), missing-data strategies (MCAR/MAR/MNAR), feature selection (permutation importance vs biased gain importance), and the training-serving skew that silently destroys models in production.
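A minimal sketch of the leakage-safe wiring the guide describes: preprocessing lives inside a Pipeline via ColumnTransformer, so scalers and encoders are refit on each CV training fold (the column names and toy data are hypothetical):

```python
# Minimal sketch: fold-safe preprocessing with Pipeline + ColumnTransformer.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 36] * 10,        # hypothetical columns
    "income": [40, 55, 90, 120, 70, 48, 85, 60] * 10,
    "country": ["US", "DE", "US", "IN", "DE", "IN", "US", "DE"] * 10,
})
y = [0, 0, 1, 1, 1, 0, 1, 0] * 10

pre = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["country"]),
])
clf = Pipeline([("pre", pre), ("model", LogisticRegression(max_iter=1000))])
print(cross_val_score(clf, X, y, cv=5).mean())   # preprocessing refit inside each fold
```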
Hyperparameter Tuning: Search Strategy, Budgeting, and Production Discipline
Learn how to tune ML models with budget-aware strategy: random search, Bayesian optimization, and early-stopping schedulers. Covers leakage pitfalls, reproducibility, and practical tuning playbooks.
Imbalanced Classification: Metrics, Class Weights, SMOTE, and Threshold Tuning
The complete decision framework for imbalanced classification — fraud, rare disease, ad CTR. Covers why accuracy and ROC-AUC lie under imbalance, when SMOTE hurts rather than helps on tabular data, focal loss vs class weights, Dal Pozzolo's calibration correction after undersampling, and why threshold tuning is the single most under-used technique in production ML.
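A minimal sketch of validation-set threshold tuning, the technique the guide argues is most under-used (synthetic ~1%-positive data; F1 stands in for whatever cost-weighted objective you actually care about):

```python
# Minimal sketch: tune the decision threshold instead of accepting 0.5.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, weights=[0.99], random_state=0)  # ~1% positives
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]
prec, rec, thr = precision_recall_curve(y_val, probs)
f1 = 2 * prec * rec / (prec + rec + 1e-12)
best = np.argmax(f1[:-1])                       # last PR point has no threshold
print(f"best threshold = {thr[best]:.3f}, F1 = {f1[best]:.3f} (vs the 0.5 default)")
```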
Loss Functions: Choosing the Right Objective for Every ML Problem
The most underestimated ML interview topic. Covers regression losses (MSE, MAE, Huber), classification losses (cross-entropy, focal loss), ranking and embedding losses (triplet, InfoNCE), and the exact decision framework for choosing which loss to use — and why the wrong choice silently destroys model quality.
Metric Design for Data Scientists: North Star Metrics, Guardrails, and Causal Attribution
Master the 3-layer metric hierarchy used at top tech companies — from selecting a North Star and guardrails to diagnosing metric drops through top-down decomposition. Covers Goodhart's Law, Simpson's Paradox, and the classic trade-off questions that appear in every DS and PM+data interview.
Optimization & Training: SGD to AdamW, Learning Rate Scheduling, and Gradient Flow
The mechanics behind every successful model training run. Covers SGD with momentum, Adam, AdamW, and their mathematical differences; learning rate warmup and cosine decay schedules (with production evidence); gradient clipping; mixed-precision training; and the most common training failure modes with their exact fixes.
Probability Calibration: When Your Model's Probabilities Actually Mean Something
A senior-level ML differentiator most prep resources skip. Covers why calibration matters for expected-value decisions (ad bidding, fraud risk, medical scoring), how to measure miscalibration (ECE, Brier, reliability diagrams), calibration methods (Platt, isotonic, temperature scaling), and why modern deep networks are systematically overconfident.
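A minimal sketch of ECE with equal-width confidence bins (the probabilities below are synthetically miscalibrated for illustration; other binning schemes exist):

```python
# Minimal sketch: expected calibration error over 10 equal-width bins.
import numpy as np

def ece(probs, labels, n_bins=10):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total, err = len(probs), 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            conf = probs[mask].mean()      # mean predicted probability in bin
            acc = labels[mask].mean()      # empirical positive rate in bin
            err += mask.sum() / total * abs(acc - conf)
    return err

rng = np.random.default_rng(0)
p = rng.uniform(size=5000)
y = (rng.uniform(size=5000) < p ** 0.7).astype(int)   # deliberately miscalibrated
print(f"ECE = {ece(p, y):.3f}")
```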
Recommendation Fundamentals: Retrieval, Ranking, and Evaluation Basics
Build a strong foundation in recommendation systems: candidate retrieval, ranking, exploration-exploitation, and offline/online evaluation. Designed for ML and product-system interviews.
Regularization in ML: Controlling Variance Without Killing Signal
A production-first guide to L1, L2, Elastic Net, dropout, and early stopping. Learn the derivation intuition, failure modes, and how to choose regularization under different data regimes and model families.
Root Cause Analysis Framework: Investigating Metric Drops and Production Incidents
The 5-step RCA framework used by senior data scientists at top tech companies — from data quality audit to discriminating hypothesis tests and structured PSA communication. Covers the full diagnostic process for DAU drops, engagement declines, and production anomalies, with worked examples from real interview scenarios.
How to Approach an ML Interview Round at FAANG
Mindset and signal management for the classical ML and deep learning interview rounds. Covers what interviewers grade, the intuition plus math plus practical-experience triple, time budgeting, recovery patterns when you blank on a derivation, and how the ML scientist round differs from the ML engineer round.
A/B Testing & Experimentation at Scale
End-to-end A/B testing framework used at top tech companies — from experiment design and sample size calculation to statistical analysis, multiple comparisons, novelty effects, and causal inference when randomization isn't possible.
Bootstrap & Resampling — Uncertainty for Arbitrary Statistics
Master Efron's bootstrap, BCa confidence intervals, permutation tests, block bootstrap for time series, and jackknife. Covers when bootstrap fails (extremes, dependent data), production use at Netflix/Stripe, and the bagging connection to ML.
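A minimal sketch of a percentile bootstrap CI for the median of skewed data, a statistic with no convenient closed-form interval; BCa, which the guide covers, adjusts these percentiles for bias and skew:

```python
# Minimal sketch: percentile bootstrap CI for a median on lognormal data.
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=500)    # skewed, like revenue

boot_medians = np.array([
    np.median(rng.choice(data, size=data.size, replace=True))
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(data):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```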
Probability Distributions: The Production ML Engineer's Reference
The 10 distributions that appear in every ML system — not as textbook formulas but as modeling tools. Covers when each distribution arises naturally, its connection to ML algorithms (Bernoulli→logistic regression, Poisson→NLP count models, Gaussian→linear regression, Log-normal→revenue modeling, Beta→Thompson Sampling, Dirichlet→LDA). Includes the Central Limit Theorem, heavy tails, and the Maximum Likelihood Estimation framework.
Hypothesis Testing for Data Scientists: p-values, Type I/II, Multiple Testing
The complete hypothesis testing framework for machine learning interviews — p-value interpretation, Type I/II error trade-offs, when to use z-tests vs t-tests vs chi-square, Bonferroni and BH-FDR corrections, and effect size. Covers the most common interview traps candidates fail silently.
Regression Statistics: Coefficients, Confidence Intervals, and Diagnostics
The complete regression statistics reference for data science interviews — OLS assumptions (LINE acronym), coefficient interpretation for continuous and categorical predictors, confidence vs prediction intervals, R² and its limitations, residual diagnostics (Q-Q plots, heteroscedasticity, Cook's distance), multicollinearity and VIF, and logistic regression odds ratios. Covers the most common interviewer traps around R² and model validation.
Statistical Power, Sample Size & Experiment Design: The Complete Guide
The math every ML engineer and data scientist must know to design experiments that actually detect real effects. Covers Type I/II errors, statistical power, the sample size formula and why MDE matters quadratically, CUPED variance reduction (used at Netflix, Booking.com, Airbnb), multiple comparisons corrections, sequential testing for early stopping, and the most common experiment design mistakes.
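A minimal sketch of the two-proportion sample-size approximation, showing the quadratic dependence on MDE (base rate, lifts, and the pooled-variance shortcut are illustrative):

```python
# Minimal sketch: n ~ (z_{1-a/2} + z_power)^2 * 2*pbar*(1-pbar) / MDE^2.
from scipy.stats import norm

def n_per_arm(p_base, mde_abs, alpha=0.05, power=0.8):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = p_base + mde_abs / 2               # pooled-rate approximation
    return 2 * p_bar * (1 - p_bar) * (z_a + z_b) ** 2 / mde_abs ** 2

for mde in (0.02, 0.01, 0.005):                # halving the MDE quadruples n
    print(f"MDE={mde:.3f}: n per arm ~ {n_per_arm(0.10, mde):,.0f}")
```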
Statistics & Probability Foundations
Master the statistical concepts that underpin every data science and ML interview — from distributions and hypothesis testing to A/B testing and causal inference.
Time Series Forecasting: ARIMA, Prophet, LightGBM, and Deep Learning
Master time series forecasting for ML interviews — stationarity, ARIMA/ETS, Prophet, gradient-boosted lag features (the M5 winner), DeepAR/N-BEATS/TFT/PatchTST, MASE evaluation, hierarchical reconciliation, and why classical methods still beat neural nets on small data.
Multiple Testing Corrections: FWER, FDR, Bonferroni, Benjamini–Hochberg, and When Each Fails
Running twenty metrics at α=0.05 each does not leave your experiment at a 5% false-positive rate — the family-wise error rate explodes. This guide covers Bonferroni, Holm, Benjamini–Hochberg FDR, false discovery proportion intuition, and how Meta-style experimentation teams pair primary-metric discipline with exploratory FDR on secondary reads.
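A minimal sketch of the arithmetic behind that claim, plus a from-scratch Benjamini–Hochberg step-up (the p-value vector is illustrative):

```python
# FWER across m independent tests, then Bonferroni vs BH cutoffs.
import numpy as np

m, alpha = 20, 0.05
print(f"FWER across {m} independent tests: {1 - (1 - alpha) ** m:.2f}")  # ~0.64, not 0.05

def bh_cutoff(pvals, q=0.05):
    p = np.sort(np.asarray(pvals))
    thresh = q * np.arange(1, p.size + 1) / p.size   # step-up thresholds k*q/m
    passed = np.nonzero(p <= thresh)[0]
    return p[passed[-1]] if passed.size else 0.0     # reject all p <= this cutoff

pvals = [0.001, 0.004, 0.019, 0.03, 0.045] + [0.2] * 15
print(f"Bonferroni cutoff: {alpha / m}")
print(f"BH cutoff: {bh_cutoff(pvals)}")              # less conservative than Bonferroni
```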
Non-parametric Tests: Mann–Whitney U, Kruskal–Wallis, Permutation Tests, and When Normality Fails
Revenue, dwell time, and latency are skewed — t-tests on raw values rest on assumptions that heavy tails and outliers break. This guide covers rank-based tests (Mann–Whitney, Kruskal–Wallis), exact permutation logic, median vs mean hypotheses, ties, and when robust alternatives (bootstrap, trimmed means) beat ranks for product A/B analysis.
Practical vs Statistical Significance: MDE, Cohen's d, Confidence Intervals, and Business Loss
At large n, trivial lifts reach p < 0.001. Interviewers expect you to separate statistical evidence from business value using minimum detectable effect (MDE), Cohen's d, absolute vs relative lifts, and confidence interval width. This topic ties power analysis to engineering cost and revenue translation — the bar senior DS candidates clear at Stripe, Uber, and DoorDash.
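A minimal sketch of the phenomenon: at 20M users per arm, a 0.05pp lift on a 10% base rate (0.5% relative) is emphatic to a z-test but may not cover engineering cost (the numbers are illustrative):

```python
# Two-proportion z-test on a practically trivial lift at very large n.
import numpy as np
from scipy.stats import norm

n, p1, p2 = 20_000_000, 0.1000, 0.1005
se = np.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
z = (p2 - p1) / se
print(f"z = {z:.2f}, p = {2 * norm.sf(abs(z)):.1e}")   # p far below 0.001
print(f"relative lift = {(p2 - p1) / p1:.2%}")          # the business question: 0.50%
```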
SVM & Kernel Methods: Maximum Margin, Duals, and the Kernel Trick
Master Support Vector Machines from first principles: hard/soft margin primals, the Lagrangian dual, KKT conditions, Mercer's theorem, the kernel trick, RBF/polynomial kernels, SMO training, Platt calibration, and when SVMs still beat XGBoost and deep nets in 2026.
XGBoost: Gradient Boosting Deep Dive
Master XGBoost from first principles — gradient boosting intuition, the regularized objective with full derivations, split-finding algorithms, histogram approximations, SHAP values, hyperparameter tuning, and production patterns.
DDPM Foundations: ELBO, Score Matching, DDIM, and CFG
ML interview theory: Ho et al. 2020 DDPM, variational lower bound, ε-prediction and L_simple, denoising score matching, DDIM fast sampling, classifier-free guidance training. Complements `genai-diffusion-models` (serving); this is the derivation-first track.
Graph Neural Networks: Message Passing, GCN, GAT, GraphSAGE & Production GNNs
Deep dive into Graph Neural Networks for FAANG ML interviews. Covers message passing (MPNN, Gilmer 2017), GCN (Kipf 2017), GraphSAGE (Hamilton 2017), GAT (Velickovic 2018), GIN, Graphormer, neighbor sampling for scale, 1-WL expressiveness limits, oversmoothing and oversquashing, and production systems (Pinterest PinSage, Google Maps ETA, Uber fraud). 7 hard interview questions with answers.
Mixture of Experts (MoE): Sparse Scaling Behind GPT-4 & Mixtral
The sparse activation architecture that powers GPT-4, Mixtral 8x7B, and DeepSeek-V3. Covers top-k gating math, router training with load-balancing losses, capacity factor, expert-choice vs token-choice routing, expert parallelism with all-to-all communication, and why MoE gives 10x parameters at constant FLOPs per token. Includes 8 hard interview questions.
Neural Networks: Backpropagation, Activations & Training
Deep neural network fundamentals for FAANG ML interviews. Covers backpropagation derivation with chain rule, activation functions and their gradients, Batch/Layer Normalization, vanishing/exploding gradients, weight initialization (He/Xavier), and practical debugging. 9 hard interview questions with answers.
Normalization Deep-Dive: BatchNorm, LayerNorm, GroupNorm & RMSNorm
Deep comparison of BatchNorm, LayerNorm, GroupNorm, InstanceNorm, and RMSNorm for FAANG deep learning interviews. Covers the axis of normalization, why transformers and modern LLMs (LLaMA, GPT, PaLM) use LayerNorm/RMSNorm over BatchNorm, Pre-LN vs Post-LN stability, BN-fold-into-Conv inference trick, production failure modes, and the Santurkar 2018 loss-landscape-smoothing explanation that overturned the internal covariate shift hypothesis.
Reinforcement Learning for ML Systems: Bandits, RLHF, PPO, and DPO
RL concepts that directly appear in production ML interviews: multi-armed bandits for exploration in recommenders, the RLHF pipeline powering ChatGPT and Claude (SFT → reward model → PPO), the PPO objective with KL divergence penalty, DPO as a simpler RLHF alternative, and contextual bandits for content ranking. Focused on practical RL for ML engineers, not robotics.
RNNs, LSTMs & GRUs: Sequence Models Before Transformers
Deep dive into recurrent neural networks for FAANG ML interviews. Covers vanilla RNN recurrence and BPTT, vanishing/exploding gradients (Pascanu 2013), LSTM cell state and gates (Hochreiter & Schmidhuber 1997), GRU (Cho 2014), seq2seq with Bahdanau attention, why transformers replaced RNNs in 2017, and where RNN-shaped models still win (streaming inference, Mamba 2023). 8 interview questions with answers.
Transformers: Self-Attention, Architecture & Modern LLMs
The architecture that powers all modern LLMs. Covers self-attention derivation with complexity analysis, multi-head attention, positional encodings (absolute, RoPE, ALiBi), encoder vs decoder architectures, modern improvements (GQA, RMSNorm, SwiGLU), and how to count parameters and FLOPs. 8 hard interview questions.
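A minimal worked example of the parameter counting, using GPT-2-small-like shapes (d_model=768, 12 heads, 4x FFN expansion, biases included); the per-block layout is the standard dense decoder block:

```python
# Worked count for one dense pre-LN decoder block with GPT-2-small shapes.
d, ffn_mult = 768, 4

attn = 4 * d * d + 4 * d                        # Q, K, V, O projections + biases
ffn = 2 * ffn_mult * d * d + ffn_mult * d + d   # up/down projections + biases
norms = 2 * 2 * d                               # two LayerNorms, scale + bias each

per_block = attn + ffn + norms
print(f"params per block ~ {per_block:,}")      # ~7.1M
print(f"12 blocks ~ {12 * per_block:,}")        # ~85M; +~39M embeddings ~ GPT-2 small's 124M
```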
Continual & Online Learning: Catastrophic Forgetting, EWC, Replay Buffers, and Streaming ML Tradeoffs
Production models face drifting data — ads, fraud, search — yet naive fine-tuning forgets old tasks. This guide covers catastrophic forgetting, elastic weight consolidation (EWC), experience replay, dark knowledge retention, warm-start vs cold-start, and when Netflix-style batch retraining beats true online gradients.
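A minimal PyTorch sketch of the EWC penalty, lam/2 * sum_i F_i (theta_i - theta*_i)^2 in the Kirkpatrick et al. 2017 form (the model, anchor weights, and diagonal Fisher values are placeholders):

```python
# Sketch: EWC quadratic anchor on parameters, weighted by a diagonal Fisher.
import torch
import torch.nn as nn

def ewc_penalty(model, anchor, fisher, lam=100.0):
    total = torch.zeros(())
    for name, p in model.named_parameters():
        total = total + (fisher[name] * (p - anchor[name]) ** 2).sum()
    return lam / 2 * total

model = nn.Linear(4, 2)
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}  # dummy Fisher
print(ewc_penalty(model, anchor, fisher))  # zero until new-task training moves the weights
```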
Multi-Task & Transfer Learning: Shared Representations, Negative Transfer, and Fine-Tuning Strategy
Sharing encoders across tasks can improve data efficiency — or hurt if tasks conflict. This guide covers hard parameter sharing, soft sharing (cross-stitch, sluice), adapter layers (LoRA-style intuition), negative transfer diagnostics, and when Google-style pretrain-finetune beats training from scratch on tabular vs vision vs NLP.
How to Structure ML Interview Answers — Derivations and Debugging
Execution playbook for FAANG ML interview rounds. Covers the model selection framework, canonical whiteboard derivations (bias-variance, softmax cross-entropy gradient, backprop through an MLP), the model debugging playbook, and practical patterns for imbalanced classification, calibration, and cross-validation.
Bayesian Inference: Priors, Posteriors, MCMC, and Variational Inference
The Bayesian reasoning framework that underpins Thompson Sampling, Bayesian A/B testing, uncertainty-aware ML, and Bayesian optimization. Covers Bayes' theorem from first principles, conjugate priors, MCMC (Metropolis-Hastings, NUTS), variational inference (ELBO), and when Bayesian methods outperform frequentist approaches in production ML systems.
Causal Inference: DiD, Instrumental Variables, RDD, and When A/B Tests Fail
The toolkit every senior data scientist needs when A/B tests aren't possible. Covers DiD, Instrumental Variables (IV), RDD, Propensity Score Matching, and Double ML for machine learning interviews — the exact methods used at Airbnb and Microsoft to estimate causal effects from observational data when randomization is impossible.
ML Math Foundations
The essential mathematics behind machine learning: gradient descent derivation, cost functions, regularization, and the bias-variance decomposition with full mathematical proofs.
Bayesian A/B Testing vs Frequentist: Priors, Posteriors, Probability of Superiority, and Expected Loss
Bayesian experimentation reports P(treatment beats control | data) and expected regret — intuitive for executives — but priors, ROPE, and MCMC diagnostics create new failure modes. This guide contrasts Thompson sampling, Beta-Binomial conjugate updates, decision rules based on expected loss, and when frequentist fixed-n tests remain the compliance-safe choice.
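A minimal sketch of the Beta-Binomial conjugate update with Monte Carlo estimates of probability of superiority and expected loss (the conversion counts and Beta(1, 1) priors are illustrative):

```python
# Beta-Binomial posteriors for two arms, then P(treatment > control) by sampling.
import numpy as np

rng = np.random.default_rng(0)
a_c, b_c = 1 + 420, 1 + (10_000 - 420)      # control:  420/10,000 conversions
a_t, b_t = 1 + 465, 1 + (10_000 - 465)      # treatment: 465/10,000 conversions

c = rng.beta(a_c, b_c, size=200_000)
t = rng.beta(a_t, b_t, size=200_000)

print(f"P(treatment > control) = {(t > c).mean():.3f}")
print(f"expected loss if we ship = {np.maximum(c - t, 0).mean():.5f}")
```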
Sequential Testing & the Peeking Problem: Alpha Spending, SPRT, and Always-Valid Inference
Product teams peek at A/B tests daily — but naive repeated significance testing inflates Type I error from ~5% to ~30% or higher. This topic covers alpha spending functions, group sequential designs, SPRT intuition, and production platforms (Optimizely Stats Engine, Statsig, Eppo) that deliver always-valid confidence sequences so you can monitor experiments without lying about significance.
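A minimal simulation of the inflation on an A/A test with 14 daily peeks (arm sizes, look count, and the known-variance z-test are illustrative simplifications):

```python
# Simulate daily peeking on an A/A test: nominal 5% error inflates severely.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_sims, looks, n_per_day = 2000, 14, 500
z_crit = norm.ppf(0.975)
hits = 0
for _ in range(n_sims):
    a = rng.normal(size=(looks, n_per_day))   # arm A
    b = rng.normal(size=(looks, n_per_day))   # arm B: no true effect
    for d in range(1, looks + 1):
        n = d * n_per_day
        z = (a[:d].mean() - b[:d].mean()) / np.sqrt(2 / n)  # known unit variance
        if abs(z) > z_crit:                   # "significant" at this peek: stop
            hits += 1
            break
print(f"Type I error with daily peeking: {hits / n_sims:.2%}")  # far above 5%
```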