Machine Learning
Classical ML and deep learning from first principles through production. Loss functions, regularization, feature engineering, and debugging — theory plus code plus failure modes.
Guides
Anomaly Detection: Isolation Forest, LOF, ECOD, and Production
ML interview: unsupervised and semi-supervised anomaly detection for tabular data, logs, and monitoring — Isolation Forest path length, LOF, autoencoder reconstruction, ECOD tail scores, PyOD, contamination, concept drift, and precision@K when ground-truth labels are rare.
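A minimal sketch of the scoring-plus-precision@K workflow the guide describes, using sklearn's IsolationForest on synthetic data (the injected anomalies, labels, and K are illustrative; labels are used only for evaluation):

```python
# Minimal sketch: Isolation Forest scoring evaluated with precision@K.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_normal = rng.normal(0, 1, size=(1000, 4))
X_anom = rng.normal(5, 1, size=(20, 4))          # injected anomalies
X = np.vstack([X_normal, X_anom])
y = np.array([0] * 1000 + [1] * 20)              # 1 = anomaly (evaluation only)

model = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
model.fit(X)
scores = -model.score_samples(X)                  # higher = more anomalous

K = 20
top_k = np.argsort(scores)[-K:]                   # K most anomalous points
print(f"precision@{K} = {y[top_k].mean():.2f}")
```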
Decision Trees: CART, Splitting Criteria, and Pruning
Decision trees from first principles — CART's greedy recursive binary splitting, Gini vs entropy vs MSE split criteria, cost-complexity pruning with cross-validation, surrogate splits for missing values, feature importance bias (Strobl 2007) and the TreeSHAP fix, and why single trees are always ensembled in production. Covers ID3, C4.5, CART, and oblivious trees (CatBoost).
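A minimal sketch of cost-complexity pruning with cross-validated alpha selection, using sklearn's `cost_complexity_pruning_path` (the dataset choice is illustrative):

```python
# Minimal sketch: CART cost-complexity pruning, alpha chosen by 5-fold CV.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Score each candidate alpha on the pruning path and keep the best.
cv_means = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5).mean()
    for a in path.ccp_alphas
]
best = int(np.argmax(cv_means))
print(f"best ccp_alpha = {path.ccp_alphas[best]:.5f}, CV accuracy = {cv_means[best]:.3f}")
```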
ML Evaluation Metrics: The Complete Guide
Know exactly which metric to use for which problem type, and why. Covers Precision, Recall, F1, ROC-AUC, PR-AUC, NDCG, calibration, regression metrics, and when each is misleading. 10 hard interview questions with detailed answers.
K-Means Clustering: Complete Guide
K-Means from first principles — the algorithm and convergence proof, K-Means++ initialization, choosing K with elbow/silhouette/gap statistics, Mini-Batch variants, distributed K-Means, and when the algorithm breaks down. 10 hard interview questions with detailed answers.
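A minimal sketch of choosing K by silhouette score under k-means++ initialization (blob data and the K range are illustrative):

```python
# Minimal sketch: scan K and pick the silhouette peak.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
for k in range(2, 7):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))  # highest score suggests a good K
```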
Linear & Logistic Regression: From OLS to FTRL
Master linear and logistic regression from first principles — OLS derivation via normal equation and MLE, Gauss-Markov (BLUE), why MSE fails for classification, IRLS, L1 vs L2 geometry, and production patterns like FTRL-Proximal as used in Google Ads and Facebook CTR prediction.
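A minimal sketch of the normal-equation solution checked against sklearn (synthetic data; coefficients are illustrative):

```python
# Minimal sketch: OLS via the normal equation, verified against LinearRegression.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

Xb = np.hstack([np.ones((200, 1)), X])        # prepend intercept column
beta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)   # solve (X'X)b = X'y, no explicit inverse

print(beta[1:])                                # slopes from the normal equation
print(LinearRegression().fit(X, y).coef_)      # should match closely
```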
PCA & Dimensionality Reduction: PCA, t-SNE, UMAP, Autoencoders
PCA from first principles — variance maximization, eigendecomposition, SVD formulation, and when SVD beats eigendecomposition of the covariance matrix. Plus kernel PCA, t-SNE pitfalls, UMAP, autoencoders, and embeddings as the modern alternative. 10 hard interview questions with production-grade answers.
Random Forest & Ensemble Methods
Random Forest from first principles — bootstrap aggregating, the bias-variance decomposition of ensembles, feature importance via Gini/permutation, out-of-bag error, and when to choose Random Forest vs XGBoost vs GBM. 7 hard interview questions with detailed answers.
Computer Vision Fundamentals: CNNs, ResNet, ViT, and Production Transfer Learning
The core computer vision concepts every ML engineer needs: convolution mechanics, why ResNet's skip connections solved deep network training, the inductive bias tradeoffs between CNNs and Vision Transformers (ViT), and a production-grade transfer learning guide. Covers CNN architectures from LeNet to EfficientNet, object detection (YOLO, R-CNN family), and when CNNs still beat ViTs.
Knowledge Distillation: Temperature, Soft Targets, and Students
ML interview: Hinton 2015 soft labels, softmax temperature T and T² scaling, dark knowledge, FitNets, DistilBERT, student–teacher training, pruning/quantization stacks, and when distillation fails while pruning succeeds.
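A minimal PyTorch sketch of the Hinton 2015 objective: KL divergence on temperature-softened distributions, scaled by T², blended with the hard-label loss (the logits here are random placeholders, not real model outputs):

```python
# Sketch of the distillation loss with temperature T and T^2 gradient rescaling.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                    # T^2 restores the gradient scale
    hard = F.cross_entropy(student_logits, labels) # hard-label term
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10), torch.randint(0, 10, (8,)))
print(loss)
```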
NLP Fundamentals: Tokenization, Embeddings, BERT vs GPT, and Fine-Tuning
The NLP concepts every ML engineer must know to work with language models. Covers subword tokenization (BPE, WordPiece, SentencePiece), static vs contextual embeddings, BERT vs GPT architectural differences, fine-tuning strategies from full fine-tuning to LoRA, and the practical tradeoffs that come up in every production NLP system.
Attention Mechanisms: From Intuition to Transformer-Scale Reasoning
Understand attention as dynamic relevance weighting, not just a formula. Covers scaled dot-product attention, multi-head attention, failure modes, and production tradeoffs for sequence modeling.
Bias-Variance Tradeoff & ML Debugging
The single most important ML concept for interviews. Master the formal bias-variance decomposition, learning curves, double descent, how to diagnose high bias vs high variance from real signals, and the exact fixes for each case. 8 hard interview questions with detailed answers.
Cross-Validation Strategies: K-Fold, Time Series, Nested CV, and Leakage-Proof Pipelines
The definitive guide to cross-validation for ML interviews. Covers stratified/group/time-series K-fold, nested CV for hyperparameter search, purged and embargoed CV (López de Prado), bootstrap .632+, and the leakage traps that silently inflate offline scores. Includes production-grade sklearn Pipelines and a from-scratch purged CV implementation.
Feature Engineering: Leakage-Safe Encoding, Interactions, Temporal, and Production Parity
The production-grade feature engineering playbook. Covers categorical encoding by cardinality (one-hot, target/mean with K-fold CV smoothing, hashing, embeddings, CatBoost ordered stats), numerical transforms, interactions, cyclical temporal features, the 4 sources of data leakage with concrete fixes (ColumnTransformer + Pipeline), missing-data strategies (MCAR/MAR/MNAR), feature selection (permutation importance vs biased gain importance), and the training-serving skew that silently destroys models in production.
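A minimal sketch of the leakage-safe wiring the guide describes: preprocessing lives inside a Pipeline via ColumnTransformer, so scalers and encoders are refit on each CV training fold (the column names and toy data are hypothetical):

```python
# Minimal sketch: fold-safe preprocessing with Pipeline + ColumnTransformer.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 36] * 10,        # hypothetical columns
    "income": [40, 55, 90, 120, 70, 48, 85, 60] * 10,
    "country": ["US", "DE", "US", "IN", "DE", "IN", "US", "DE"] * 10,
})
y = [0, 0, 1, 1, 1, 0, 1, 0] * 10

pre = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["country"]),
])
clf = Pipeline([("pre", pre), ("model", LogisticRegression(max_iter=1000))])
print(cross_val_score(clf, X, y, cv=5).mean())   # preprocessing refit inside each fold
```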
Hyperparameter Tuning: Search Strategy, Budgeting, and Production Discipline
Learn how to tune ML models with budget-aware strategy: random search, Bayesian optimization, and early-stopping schedulers. Covers leakage pitfalls, reproducibility, and practical tuning playbooks.
Imbalanced Classification: Metrics, Class Weights, SMOTE, and Threshold Tuning
The complete decision framework for imbalanced classification — fraud, rare disease, ad CTR. Covers why accuracy and ROC-AUC lie under imbalance, when SMOTE hurts rather than helps on tabular data, focal loss vs class weights, Dal Pozzolo's calibration correction after undersampling, and why threshold tuning is the single most under-used technique in production ML.
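A minimal sketch of validation-set threshold tuning, the technique the guide argues is most under-used (synthetic ~1%-positive data; F1 stands in for whatever cost-weighted objective you actually care about):

```python
# Minimal sketch: tune the decision threshold instead of accepting 0.5.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, weights=[0.99], random_state=0)  # ~1% positives
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]
prec, rec, thr = precision_recall_curve(y_val, probs)
f1 = 2 * prec * rec / (prec + rec + 1e-12)
best = np.argmax(f1[:-1])                       # last PR point has no threshold
print(f"best threshold = {thr[best]:.3f}, F1 = {f1[best]:.3f} (vs the 0.5 default)")
```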
Loss Functions: Choosing the Right Objective for Every ML Problem
The most underestimated ML interview topic. Covers regression losses (MSE, MAE, Huber), classification losses (cross-entropy, focal loss), ranking and embedding losses (triplet, InfoNCE), and the exact decision framework for choosing which loss to use — and why the wrong choice silently destroys model quality.
Metric Design for Data Scientists: North Star Metrics, Guardrails, and Causal Attribution
Master the 3-layer metric hierarchy used at top tech companies — from selecting a North Star and guardrails to diagnosing metric drops through top-down decomposition. Covers Goodhart's Law, Simpson's Paradox, and the classic trade-off questions that appear in every DS and PM+data interview.
Optimization & Training: SGD to AdamW, Learning Rate Scheduling, and Gradient Flow
The mechanics behind every successful model training run. Covers SGD with momentum, Adam, AdamW, and their mathematical differences; learning rate warmup and cosine decay schedules (with production evidence); gradient clipping; mixed-precision training; and the most common training failure modes with their exact fixes.
Probability Calibration: When Your Model's Probabilities Actually Mean Something
A senior-level ML differentiator most prep resources skip. Covers why calibration matters for expected-value decisions (ad bidding, fraud risk, medical scoring), how to measure miscalibration (ECE, Brier, reliability diagrams), calibration methods (Platt, isotonic, temperature scaling), and why modern deep networks are systematically overconfident.
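A minimal sketch of ECE with equal-width confidence bins (the probabilities below are synthetically miscalibrated for illustration; other binning schemes exist):

```python
# Minimal sketch: expected calibration error over 10 equal-width bins.
import numpy as np

def ece(probs, labels, n_bins=10):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total, err = len(probs), 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            conf = probs[mask].mean()      # mean predicted probability in bin
            acc = labels[mask].mean()      # empirical positive rate in bin
            err += mask.sum() / total * abs(acc - conf)
    return err

rng = np.random.default_rng(0)
p = rng.uniform(size=5000)
y = (rng.uniform(size=5000) < p ** 0.7).astype(int)   # deliberately miscalibrated
print(f"ECE = {ece(p, y):.3f}")
```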
Recommendation Fundamentals: Retrieval, Ranking, and Evaluation Basics
Build a strong foundation in recommendation systems: candidate retrieval, ranking, exploration-exploitation, and offline/online evaluation. Designed for ML and product-system interviews.
Regularization in ML: Controlling Variance Without Killing Signal
A production-first guide to L1, L2, Elastic Net, dropout, and early stopping. Learn the derivation intuition, failure modes, and how to choose regularization under different data regimes and model families.
Root Cause Analysis Framework: Investigating Metric Drops and Production Incidents
The 5-step RCA framework used by senior data scientists at top tech companies — from data quality audit to discriminating hypothesis tests and structured PSA communication. Covers the full diagnostic process for DAU drops, engagement declines, and production anomalies, with worked examples from real interview scenarios.
How to Approach an ML Interview Round at FAANG
Mindset and signal management for the classical ML and deep learning interview rounds. Covers what interviewers grade, the intuition plus math plus practical-experience triple, time budgeting, recovery patterns when you blank on a derivation, and how the ML scientist round differs from the ML engineer round.
A/B Testing & Experimentation at Scale
End-to-end A/B testing framework used at top tech companies — from experiment design and sample size calculation to statistical analysis, multiple comparisons, novelty effects, and causal inference when randomization isn't possible.
Bootstrap & Resampling — Uncertainty for Arbitrary Statistics
Master Efron's bootstrap, BCa confidence intervals, permutation tests, block bootstrap for time series, and jackknife. Covers when bootstrap fails (extremes, dependent data), production use at Netflix/Stripe, and the bagging connection to ML.
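A minimal sketch of a percentile bootstrap CI for the median of skewed data, a statistic with no convenient closed-form interval; BCa, which the guide covers, adjusts these percentiles for bias and skew:

```python
# Minimal sketch: percentile bootstrap CI for a median on lognormal data.
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=500)    # skewed, like revenue

boot_medians = np.array([
    np.median(rng.choice(data, size=data.size, replace=True))
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(data):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```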
Probability Distributions: The Production ML Engineer's Reference
The 10 distributions that appear in every ML system — not as textbook formulas but as modeling tools. Covers when each distribution arises naturally, its connection to ML algorithms (Bernoulli→logistic regression, Poisson→NLP count models, Gaussian→linear regression, Log-normal→revenue modeling, Beta→Thompson Sampling, Dirichlet→LDA). Includes the Central Limit Theorem, heavy tails, and the Maximum Likelihood Estimation framework.
Hypothesis Testing for Data Scientists: p-values, Type I/II, Multiple Testing
The complete hypothesis testing framework for machine learning interviews — p-value interpretation, Type I/II error trade-offs, when to use z-tests vs t-tests vs chi-square, Bonferroni and BH-FDR corrections, and effect size. Covers the most common interview traps candidates fail silently.
Regression Statistics: Coefficients, Confidence Intervals, and Diagnostics
The complete regression statistics reference for data science interviews — OLS assumptions (LINE acronym), coefficient interpretation for continuous and categorical predictors, confidence vs prediction intervals, R² and its limitations, residual diagnostics (Q-Q plots, heteroscedasticity, Cook's distance), multicollinearity and VIF, and logistic regression odds ratios. Covers the most common interviewer traps around R² and model validation.
Statistical Power, Sample Size & Experiment Design: The Complete Guide
The math every ML engineer and data scientist must know to design experiments that actually detect real effects. Covers Type I/II errors, statistical power, the sample size formula and why MDE matters quadratically, CUPED variance reduction (used at Netflix, Booking.com, Airbnb), multiple comparisons corrections, sequential testing for early stopping, and the most common experiment design mistakes.
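A minimal sketch of the two-proportion sample-size approximation, showing the quadratic dependence on MDE (base rate, lifts, and the pooled-variance shortcut are illustrative):

```python
# Minimal sketch: n ~ (z_{1-a/2} + z_power)^2 * 2*pbar*(1-pbar) / MDE^2.
from scipy.stats import norm

def n_per_arm(p_base, mde_abs, alpha=0.05, power=0.8):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = p_base + mde_abs / 2               # pooled-rate approximation
    return 2 * p_bar * (1 - p_bar) * (z_a + z_b) ** 2 / mde_abs ** 2

for mde in (0.02, 0.01, 0.005):                # halving the MDE quadruples n
    print(f"MDE={mde:.3f}: n per arm ~ {n_per_arm(0.10, mde):,.0f}")
```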
Statistics & Probability Foundations
Master the statistical concepts that underpin every data science and ML interview — from distributions and hypothesis testing to A/B testing and causal inference.
Time Series Forecasting: ARIMA, Prophet, LightGBM, and Deep Learning
Master time series forecasting for ML interviews — stationarity, ARIMA/ETS, Prophet, gradient-boosted lag features (the M5 winner), DeepAR/N-BEATS/TFT/PatchTST, MASE evaluation, hierarchical reconciliation, and why classical methods still beat neural nets on small data.
Multiple Testing Corrections: FWER, FDR, Bonferroni, Benjamini–Hochberg, and When Each Fails
Running twenty metrics at α=0.05 each does not leave your experiment at a 5% false-positive rate — the family-wise error rate explodes. This guide covers Bonferroni, Holm, Benjamini–Hochberg FDR, false discovery proportion intuition, and how Meta-style experimentation teams pair primary-metric discipline with exploratory FDR on secondary reads.
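A minimal sketch of the arithmetic behind that claim, plus a from-scratch Benjamini–Hochberg step-up (the p-value vector is illustrative):

```python
# FWER across m independent tests, then Bonferroni vs BH cutoffs.
import numpy as np

m, alpha = 20, 0.05
print(f"FWER across {m} independent tests: {1 - (1 - alpha) ** m:.2f}")  # ~0.64, not 0.05

def bh_cutoff(pvals, q=0.05):
    p = np.sort(np.asarray(pvals))
    thresh = q * np.arange(1, p.size + 1) / p.size   # step-up thresholds k*q/m
    passed = np.nonzero(p <= thresh)[0]
    return p[passed[-1]] if passed.size else 0.0     # reject all p <= this cutoff

pvals = [0.001, 0.004, 0.019, 0.03, 0.045] + [0.2] * 15
print(f"Bonferroni cutoff: {alpha / m}")
print(f"BH cutoff: {bh_cutoff(pvals)}")              # less conservative than Bonferroni
```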
Non-parametric Tests: Mann–Whitney U, Kruskal–Wallis, Permutation Tests, and When Normality Fails
Revenue, dwell time, and latency are skewed — t-tests on raw values rest on assumptions that heavy tails and outliers break. This guide covers rank-based tests (Mann–Whitney, Kruskal–Wallis), exact permutation logic, median vs mean hypotheses, ties, and when robust alternatives (bootstrap, trimmed means) beat ranks for product A/B analysis.
Practical vs Statistical Significance: MDE, Cohen's d, Confidence Intervals, and Business Loss
At large n, trivial lifts reach p < 0.001. Interviewers expect you to separate statistical evidence from business value using minimum detectable effect (MDE), Cohen's d, absolute vs relative lifts, and confidence interval width. This topic ties power analysis to engineering cost and revenue translation — the bar senior DS candidates clear at Stripe, Uber, and DoorDash.
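A minimal sketch of the phenomenon: at 20M users per arm, a 0.05pp lift on a 10% base rate (0.5% relative) is emphatic to a z-test but may not cover engineering cost (the numbers are illustrative):

```python
# Two-proportion z-test on a practically trivial lift at very large n.
import numpy as np
from scipy.stats import norm

n, p1, p2 = 20_000_000, 0.1000, 0.1005
se = np.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
z = (p2 - p1) / se
print(f"z = {z:.2f}, p = {2 * norm.sf(abs(z)):.1e}")   # p far below 0.001
print(f"relative lift = {(p2 - p1) / p1:.2%}")          # the business question: 0.50%
```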
SVM & Kernel Methods: Maximum Margin, Duals, and the Kernel Trick
Master Support Vector Machines from first principles: hard/soft margin primals, the Lagrangian dual, KKT conditions, Mercer's theorem, the kernel trick, RBF/polynomial kernels, SMO training, Platt calibration, and when SVMs still beat XGBoost and deep nets in 2026.
XGBoost: Gradient Boosting Deep Dive
Master XGBoost from first principles — gradient boosting intuition, the regularized objective with full derivations, split-finding algorithms, histogram approximations, SHAP values, hyperparameter tuning, and production patterns.
DDPM Foundations: ELBO, Score Matching, DDIM, and CFG
ML interview theory: Ho et al. 2020 DDPM, variational lower bound, ε-prediction and L_simple, denoising score matching, DDIM fast sampling, classifier-free guidance training. Complements `genai-diffusion-models` (serving); this is the derivation-first track.
Graph Neural Networks: Message Passing, GCN, GAT, GraphSAGE & Production GNNs
Deep dive into Graph Neural Networks for FAANG ML interviews. Covers message passing (MPNN, Gilmer 2017), GCN (Kipf 2017), GraphSAGE (Hamilton 2017), GAT (Velickovic 2018), GIN, Graphormer, neighbor sampling for scale, 1-WL expressiveness limits, oversmoothing and oversquashing, and production systems (Pinterest PinSage, Google Maps ETA, Uber fraud). 7 hard interview questions with answers.
Mixture of Experts (MoE): Sparse Scaling Behind GPT-4 & Mixtral
The sparse activation architecture that powers GPT-4, Mixtral 8x7B, and DeepSeek-V3. Covers top-k gating math, router training with load-balancing losses, capacity factor, expert-choice vs token-choice routing, expert parallelism with all-to-all communication, and why MoE gives 10x parameters at constant FLOPs per token. Includes 8 hard interview questions.
Neural Networks: Backpropagation, Activations & Training
Deep neural network fundamentals for FAANG ML interviews. Covers backpropagation derivation with chain rule, activation functions and their gradients, Batch/Layer Normalization, vanishing/exploding gradients, weight initialization (He/Xavier), and practical debugging. 9 hard interview questions with answers.
Normalization Deep-Dive: BatchNorm, LayerNorm, GroupNorm & RMSNorm
Deep comparison of BatchNorm, LayerNorm, GroupNorm, InstanceNorm, and RMSNorm for FAANG deep learning interviews. Covers the axis of normalization, why transformers and modern LLMs (LLaMA, GPT, PaLM) use LayerNorm/RMSNorm over BatchNorm, Pre-LN vs Post-LN stability, BN-fold-into-Conv inference trick, production failure modes, and the Santurkar 2018 loss-landscape-smoothing explanation that overturned the internal covariate shift hypothesis.
Reinforcement Learning for ML Systems: Bandits, RLHF, PPO, and DPO
RL concepts that directly appear in production ML interviews: multi-armed bandits for exploration in recommenders, the RLHF pipeline powering ChatGPT and Claude (SFT → reward model → PPO), the PPO objective with KL divergence penalty, DPO as a simpler RLHF alternative, and contextual bandits for content ranking. Focused on practical RL for ML engineers, not robotics.
RNNs, LSTMs & GRUs: Sequence Models Before Transformers
Deep dive into recurrent neural networks for FAANG ML interviews. Covers vanilla RNN recurrence and BPTT, vanishing/exploding gradients (Pascanu 2013), LSTM cell state and gates (Hochreiter & Schmidhuber 1997), GRU (Cho 2014), seq2seq with Bahdanau attention, why transformers replaced RNNs in 2017, and where RNN-shaped models still win (streaming inference, Mamba 2023). 8 interview questions with answers.
Transformers: Self-Attention, Architecture & Modern LLMs
The architecture that powers all modern LLMs. Covers self-attention derivation with complexity analysis, multi-head attention, positional encodings (absolute, RoPE, ALiBi), encoder vs decoder architectures, modern improvements (GQA, RMSNorm, SwiGLU), and how to count parameters and FLOPs. 8 hard interview questions.
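A minimal worked example of the parameter counting, using GPT-2-small-like shapes (d_model=768, 12 heads, 4x FFN expansion, biases included); the per-block layout is the standard dense decoder block:

```python
# Worked count for one dense pre-LN decoder block with GPT-2-small shapes.
d, ffn_mult = 768, 4

attn = 4 * d * d + 4 * d                        # Q, K, V, O projections + biases
ffn = 2 * ffn_mult * d * d + ffn_mult * d + d   # up/down projections + biases
norms = 2 * 2 * d                               # two LayerNorms, scale + bias each

per_block = attn + ffn + norms
print(f"params per block ~ {per_block:,}")      # ~7.1M
print(f"12 blocks ~ {12 * per_block:,}")        # ~85M; +~39M embeddings ~ GPT-2 small's 124M
```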
Continual & Online Learning: Catastrophic Forgetting, EWC, Replay Buffers, and Streaming ML Tradeoffs
Production models face drifting data — ads, fraud, search — yet naive fine-tuning forgets old tasks. This guide covers catastrophic forgetting, elastic weight consolidation (EWC), experience replay, dark knowledge retention, warm-start vs cold-start, and when Netflix-style batch retraining beats true online gradients.
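A minimal PyTorch sketch of the EWC penalty, lam/2 * sum_i F_i (theta_i - theta*_i)^2 in the Kirkpatrick et al. 2017 form (the model, anchor weights, and diagonal Fisher values are placeholders):

```python
# Sketch: EWC quadratic anchor on parameters, weighted by a diagonal Fisher.
import torch
import torch.nn as nn

def ewc_penalty(model, anchor, fisher, lam=100.0):
    total = torch.zeros(())
    for name, p in model.named_parameters():
        total = total + (fisher[name] * (p - anchor[name]) ** 2).sum()
    return lam / 2 * total

model = nn.Linear(4, 2)
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}  # dummy Fisher
print(ewc_penalty(model, anchor, fisher))  # zero until new-task training moves the weights
```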
Multi-Task & Transfer Learning: Shared Representations, Negative Transfer, and Fine-Tuning Strategy
Sharing encoders across tasks can improve data efficiency — or hurt if tasks conflict. This guide covers hard parameter sharing, soft sharing (cross-stitch, sluice), adapter layers (LoRA-style intuition), negative transfer diagnostics, and when Google-style pretrain-finetune beats training from scratch on tabular vs vision vs NLP.
How to Structure ML Interview Answers — Derivations and Debugging
Execution playbook for FAANG ML interview rounds. Covers the model selection framework, canonical whiteboard derivations (bias-variance, softmax cross-entropy gradient, backprop through an MLP), the model debugging playbook, and practical patterns for imbalanced classification, calibration, and cross-validation.
Bayesian Inference: Priors, Posteriors, MCMC, and Variational Inference
The Bayesian reasoning framework that underpins Thompson Sampling, Bayesian A/B testing, uncertainty-aware ML, and Bayesian optimization. Covers Bayes' theorem from first principles, conjugate priors, MCMC (Metropolis-Hastings, NUTS), variational inference (ELBO), and when Bayesian methods outperform frequentist approaches in production ML systems.
Causal Inference: DiD, Instrumental Variables, RDD, and When A/B Tests Fail
The toolkit every senior data scientist needs when A/B tests aren't possible. Covers DiD, Instrumental Variables (IV), RDD, Propensity Score Matching, and Double ML for machine learning interviews — the exact methods used at Airbnb and Microsoft to estimate causal effects from observational data when randomization is impossible.
ML Math Foundations
The essential mathematics behind machine learning: gradient descent derivation, cost functions, regularization, and the bias-variance decomposition with full mathematical proofs.
Bayesian A/B Testing vs Frequentist: Priors, Posteriors, Probability of Superiority, and Expected Loss
Bayesian experimentation reports P(treatment beats control | data) and expected regret — intuitive for executives — but priors, ROPE, and MCMC diagnostics create new failure modes. This guide contrasts Thompson sampling, Beta-Binomial conjugate updates, decision rules based on expected loss, and when frequentist fixed-n tests remain the compliance-safe choice.
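A minimal sketch of the Beta-Binomial conjugate update with Monte Carlo estimates of probability of superiority and expected loss (the conversion counts and Beta(1, 1) priors are illustrative):

```python
# Beta-Binomial posteriors for two arms, then P(treatment > control) by sampling.
import numpy as np

rng = np.random.default_rng(0)
a_c, b_c = 1 + 420, 1 + (10_000 - 420)      # control:  420/10,000 conversions
a_t, b_t = 1 + 465, 1 + (10_000 - 465)      # treatment: 465/10,000 conversions

c = rng.beta(a_c, b_c, size=200_000)
t = rng.beta(a_t, b_t, size=200_000)

print(f"P(treatment > control) = {(t > c).mean():.3f}")
print(f"expected loss if we ship = {np.maximum(c - t, 0).mean():.5f}")
```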
Sequential Testing & the Peeking Problem: Alpha Spending, SPRT, and Always-Valid Inference
Product teams peek at A/B tests daily — but naive repeated significance testing inflates Type I error from ~5% to ~30% or higher. This topic covers alpha spending functions, group sequential designs, SPRT intuition, and production platforms (Optimizely Stats Engine, Statsig, Eppo) that deliver always-valid confidence sequences so you can monitor experiments without lying about significance.
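A minimal simulation of the inflation on an A/A test with 14 daily peeks (arm sizes, look count, and the known-variance z-test are illustrative simplifications):

```python
# Simulate daily peeking on an A/A test: nominal 5% error inflates severely.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_sims, looks, n_per_day = 2000, 14, 500
z_crit = norm.ppf(0.975)
hits = 0
for _ in range(n_sims):
    a = rng.normal(size=(looks, n_per_day))   # arm A
    b = rng.normal(size=(looks, n_per_day))   # arm B: no true effect
    for d in range(1, looks + 1):
        n = d * n_per_day
        z = (a[:d].mean() - b[:d].mean()) / np.sqrt(2 / n)  # known unit variance
        if abs(z) > z_crit:                   # "significant" at this peek: stop
            hits += 1
            break
print(f"Type I error with daily peeking: {hits / n_sims:.2%}")  # far above 5%
```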