ML System Design

Recommendation, ranking, search, fraud detection — end-to-end ML systems with serving architectures, feature stores, training pipelines, and online/offline evaluation.

44 guides
ML System Design · 35 min

A/B Testing for ML Systems: Design, Statistical Rigor & Production Pitfalls

How top ML teams run experiments that produce trustworthy conclusions: sample size calculation, randomization units, guardrails, CUPED variance reduction, network effects, and the organizational mistakes that make many A/B tests misleading.

A/B Testing · Statistical Significance · Hypothesis Testing · Sample Size · +9
Intermediate
6 questions
ML System Design · 30 min

Data Pipelines for ML: Batch, Streaming, and Event Architecture

How production ML data pipelines are actually built — Kafka for event collection, Spark for batch feature engineering, Flink for real-time aggregations, and the architectural decisions that determine whether your model trains on fresh or stale data.

Kafka · Spark · Flink · Data Pipeline · +8
Intermediate
6 questions
ML System Design · 35 min

Embeddings & Vector Databases: ANN Search at Scale

How embeddings power search, recommendations, and retrieval — and how to build the index that serves them at millisecond latency. Covers HNSW vs IVF+PQ, tuning M and ef parameters, billion-scale architecture, and when to use Pinecone vs FAISS vs pgvector.

Embeddings · Vector Database · FAISS · HNSW · +9
Intermediate
6 questions
ML System Design · 35 min

ML Model Evaluation & Production Monitoring: Shadow Mode, A/B Testing & Rollback

Production ML evaluation is fundamentally different from offline evaluation. Covers shadow deployment, champion-challenger A/B testing, canary rollouts, SLO design for ML systems, rollback triggers, and the metrics that reveal model degradation before users notice. The end-to-end playbook for safely deploying and monitoring ML models.

Shadow Deployment · Champion-Challenger · A/B Testing · ML Monitoring · +10
Intermediate
6 questions
ML System Design · 25 min

Experiment Tracking & Model Registry: The Version Control for ML

How production ML teams manage the model lifecycle from experiment to production — MLflow vs Weights & Biases, what metadata a model must carry, promotion workflows with gated approvals, model lineage for debugging, and the rollback mechanism that makes safe deployments possible.

Experiment Tracking · Model Registry · MLflow · Weights & Biases · +8
Intermediate
6 questions
ML System Design · 35 min

Feature Stores: Online/Offline Architecture & Training-Serving Consistency

Deep dive into feature store architecture, the infrastructure most large-scale production ML systems rely on but many candidates can't explain. Covers the two-tier design, point-in-time correct joins, training-serving skew, and how to choose between Feast, Tecton, and cloud-managed options.

Feature Store · Training-Serving Skew · Redis · Point-in-Time Join · +8
Intermediate
6 questions
ML System Design · 25 min

ML Pipelines & Orchestration: Airflow, Kubeflow, and CI/CD for Models

How production ML teams automate the full model lifecycle — from data ingestion through training, evaluation, and deployment. Covers Airflow vs Kubeflow Pipelines, containerized training steps, automated model validation gates, and the CI/CD practices that separate mature ML teams from ad-hoc ones.

ML Pipelines · Airflow · Kubeflow · Orchestration · +9
Intermediate
6 questions
ML System Design · 35 min

ML Model Deployment Fundamentals: Shipping Safely in Production

A practical foundation for deploying ML models: packaging, serving topologies, rollout strategies, and post-deploy monitoring. Covers shadow mode, canary releases, drift detection, and rollback design.

Model Deployment · MLOps · Canary Release · Shadow Mode · +5
Intermediate
5 questions
ML System Design · 30 min

Model Serving Architectures: Batch vs Real-Time, Shadow Deployments & Latency Budgets

How to design the serving layer for ML models in production — when to use batch pre-computation vs real-time inference, how to safely deploy new models via shadow and canary patterns, and how to structure a multi-stage serving pipeline within a latency budget.

Model Serving · Real-Time Inference · Batch Serving · Shadow Deployment · +8
Intermediate
6 questions
ML System Design · 30 min

ML Monitoring & Drift Detection: Keeping Models Healthy in Production

Production ML models fail silently. This guide covers the three-layer monitoring stack (data drift, concept drift, output drift), PSI thresholds, KL divergence, distinguishing data drift from concept drift (they require different fixes), and how to build retraining triggers that aren't noisy.

ML Monitoring · Data Drift · Concept Drift · PSI · +9
Intermediate
6 questions
ML System Design · 30 min

Offline vs Online Evaluation: Why Metrics Disagree and What to Do About It

The most common ML interview trap: candidates optimize offline metrics but can't explain why they diverge from online results. Covers AUC vs CTR, NDCG vs session length, position bias, novelty effects, counterfactual evaluation, and the right metric for each stage of an ML system.

AUC-ROC · NDCG · MAP · Precision@K · +10
Intermediate
6 questions
ML System Design · 35 min

Distributed Training: Data Parallelism, Model Parallelism, and FSDP

How to scale model training from a single GPU to thousands. Covers data parallelism with Ring AllReduce, model/tensor/pipeline parallelism for LLMs, PyTorch DDP vs FSDP2, and how to choose the right strategy based on model size vs data volume.

Distributed Training · Data Parallelism · Model Parallelism · Ring AllReduce · +8
Advanced
1 question
ML System Design · 30 min

GPU Infrastructure for ML Serving: Quantization, Batching & Inference Optimization

The engineering decisions that determine whether your model serves at 10ms or 200ms — GPU selection, quantization (INT8/FP16/FP8), dynamic batching, KV cache management, and when to use Triton vs vLLM vs TensorRT-LLM.

GPU · NVIDIA · A100 · H100 · +11
Advanced
1 question
ML System Design · 40 min

Two-Stage Retrieval & Ranking: The Architecture Behind Every Large-Scale Recommender

The dominant architecture powering Google, YouTube, TikTok, Pinterest, and Spotify — two-tower retrieval followed by multi-stage ranking. Covers the fundamental constraint that makes this necessary, in-batch negatives, hard negative mining, and the full 4-stage production pipeline.

Two-Tower Model · Retrieval · Ranking · FAISS · +8
Advanced
1 question
ML System Design · 35 min

Vector Search at Scale: HNSW, IVF-PQ, FAISS, and Production ANN Systems

Approximate Nearest Neighbor (ANN) search is the retrieval backbone of RAG, recommendation systems, semantic search, and visual similarity. Master HNSW graph construction, IVF-PQ compression, FAISS vs Qdrant vs pgvector selection, recall-latency tradeoffs, and hybrid dense+sparse search. Includes production sizing and indexing strategy for 1B+ vector corpora.

Vector Search · ANN · HNSW · IVF-PQ · +11
Advanced
1 question
ML System Design · 56 min

ML System Design: Abuse Detection — Account Takeover, Bots, and Velocity Beyond Spam

Design a cross-product abuse platform distinct from content spam — credential stuffing, account takeover (ATO), synthetic accounts, scraping, and collusion rings. Covers device graphs, velocity features in Redis, challenge escalation (CAPTCHA, step-up auth), feedback loops when labels are delayed, and why Meta-style integrity teams separate abuse from policy-violating content classifiers.

Abuse Detection · Account Takeover · Bot Detection · Credential Stuffing · +8
Advanced
6 questions
ML System Design · 55 min

MLSD Case Study: Ad Click Prediction at Marketplace Scale

Design a production CTR prediction system for ads at Meta/Google scale. Covers COEC calibration, delayed conversions, feature engineering for sparse+dense signals, multi-stage serving under strict latency budgets, and the failure loops that most interview answers miss.

CTR Prediction · Ads Ranking · COEC · Delayed Feedback · +7
Advanced
1 question
ML System Design · 55 min

MLSD Case Study: Churn Prediction with Survival and Uplift Modeling

Design a Spotify/Netflix-style churn prevention system using survival models, causal uplift targeting, and intervention policy optimization. Covers label definition, intervention economics, and production monitoring pitfalls.

Churn Prediction · Survival Analysis · Uplift Modeling · Retention Systems · +5
Advanced
1 question
ML System Design · 60 min

MLSD Case Study: Multimodal Content Moderation Systems

Design a TikTok/YouTube/Meta-style content moderation stack with multimodal models, policy-aware inference, human-in-the-loop review, and continuous policy evolution. Covers latency tiers, precision/recall tradeoffs by harm class, and model-policy coupling.

Content Moderation · Multimodal ML · Human-in-the-Loop · Safety Policies · +5
Advanced
1 question
ML System Design · 55 min

ML System Design: Customer LTV, Survival Modeling, and Uplift for Treatment Targeting

Design a production customer lifetime value (CLV) system for subscription and marketplace businesses, from probabilistic churn and spend models (BG/NBD, Gamma-Gamma) and survival curves with censoring to uplift modeling for CRM campaigns. Covers why naive regression on historical LTV leaks future information, how Shopify-style merchants use CLV for acquisition bids, and evaluation with calibrated dollar errors plus policy simulation.

Customer Lifetime Value · BG/NBD Model · Gamma-Gamma Model · Survival Analysis · +8
Advanced
6 questions
ML System Design · 55 min

ML System Design: Demand Forecasting System

Design the demand forecasting system used by Uber, Lyft, DoorDash, and Amazon, end to end. Covers the three hard problems that make this uniquely challenging: spatial ML (demand is correlated across geography, not independent per zone), online learning (marketplace conditions change faster than batch retraining), and the feedback loop where demand forecasts drive pricing, which in turn affects actual demand. Includes H3 spatial graphs, temporal GNNs, online adaptation with drift detection, and why WMAPE beats MAPE for imbalanced demand.

Demand Forecasting · Spatial ML · Graph Neural Networks · Online Learning · +8
Advanced
10 questions
ML System Design · 58 min

MLSD Case Study: Document Understanding & Enterprise NLP Classification

Production document AI for invoices and contracts: OCR, layout-aware encoders, calibrated per-field extraction, human review routing, and template drift. Covers hybrid rule/ML cascades, per-field F1 under imbalance, and audit-grade logging: the system design interviewers expect beyond flattening all text into BERT.

Document AI · OCR · LayoutLM · LayoutLMv3 · +10
Advanced
1 question
ML System Design · 58 min

ML System Design: Dynamic Pricing & Surge (Marketplace)

Design a production dynamic pricing system like Uber surge or DoorDash peak pricing — end-to-end. Covers the three problems most prep skips: pricing is a closed-loop control problem (price moves both supply and demand), the correct ML framing is contextual bandits or MDPs — not plain regression on historical fares — and guardrails (caps, fairness, emergency optics) that constrain the learned policy. Includes elasticity estimation, exploration budgets, and why naive demand forecasting without supply response mis-prices the marketplace.

Dynamic Pricing · Surge Pricing · Contextual Bandits · Marketplace ML · +10
Advanced
1 question
ML System Design · 60 min

ML System Design: E-commerce Recommendation System

Design Amazon/Alibaba-scale product recommendation end-to-end — from session-aware candidate retrieval across billions of products to multi-task neural ranking optimizing for click, add-to-cart, and purchase simultaneously. Covers the exact architecture used in production: real-time session modeling, co-purchase graph retrieval, cold start for new products and new users, Thompson sampling for category discovery, and the unique constraints of e-commerce (inventory, price, return rate, margin). Includes latency budget analysis, training data construction with purchase attribution, and what each level (mid/senior/staff) must cover.

E-commerce Recommendations · Two-Tower Networks · Session-Based Recommendations · Cold Start · +9
Advanced
1 question
ML System Design · 50 min

ML System Design: ETA Prediction System

Design the ETA prediction system used by Uber (DeepETA), Lyft, and DoorDash, end to end. Covers the two architectural insights that define this problem: the physics-first hybrid approach (ML refines a routing engine rather than replacing it) and the online-offline feature split that makes sub-10ms inference possible. Includes H3 hexagonal spatial indexing for spatial features, the Linear Transformer trick for latency, quantile regression for uncertainty, and the compounding error problem in multi-stop predictions.

ETA Prediction · Spatiotemporal Features · Online Features · Routing Engine · +7
Advanced
1 question
ML System Design · 60 min

ML System Design: Real-Time Fraud Detection

Design a production fraud detection system used by Stripe, PayPal, and Visa — end-to-end. Covers the three hard problems nobody teaches: extreme class imbalance (typically ~0.1% fraud rate), cost-sensitive learning where FN ≠ FP, and multi-stage inference under 100ms latency. Includes the optimal threshold formula, graph neural networks for fraud rings, adversarial drift, and the suppression bias feedback loop that silently kills deployed fraud models.

Fraud Detection · Class Imbalance · Cost-Sensitive Learning · Cascade Inference · +8
Advanced
1 question
ML System Design · 60 min

ML System Design: Instagram Feed Ranking System

The second canonical ML system design problem. Design Instagram's personalized feed ranking end-to-end — from multi-source candidate aggregation (friends, follows, recommended) to multi-task neural ranking predicting 10+ user actions simultaneously. Covers the exact architecture Meta uses in production, the value model that combines action predictions into a single score, integrity signals, SEV-driven guardrails, cold start for social graphs, and what each level (mid/senior/staff) must cover to pass.

Feed Ranking · Multi-Task Learning · Value Model · Social Graph · +10
Advanced
2 questions
ML System Design · 58 min

ML System Design: Job Recommendation (LinkedIn-Style Marketplace)

Design a production job recommendation system for a professional marketplace, end to end. Covers cold start and short-lived job postings, the LinkSAGE / LiGNN pattern (nearline GNN embeddings fed as features into a low-latency two-tower ranker, rather than running a full GNN on every list load), position bias and delayed labels (apply → hire), eligibility hard filters, and two-sided evaluation with employer-quality guardrails. Citations: LiGNN and LinkSAGE (arXiv 2024) for large-scale job graph learning at LinkedIn.

Job Recommendation · Two-Tower · Graph Neural Networks · LinkSAGE · +10
Advanced
1 question
ML System Design · 70 min

ML System Design: LLM Serving Systems

Design a production LLM serving system from first principles — covering PagedAttention and KV cache management, continuous batching for 2-5× throughput gains, multi-LoRA serving with S-LoRA and InfiniLoRA, the full RLHF pipeline (SFT → reward model → PPO vs DPO vs GRPO), and cost-per-token engineering. Includes break-even analysis for self-hosting vs cloud, failure mode catalog, and what each interview level must cover.

LLM Serving · vLLM · PagedAttention · Continuous Batching · +11
Advanced
1 question
ML System Design · 65 min

ML System Design: Music Recommendation System

Design Spotify/Apple Music-scale music recommendation end to end, from audio-aware track retrieval to sequential session modeling that captures the unique temporal dynamics of music consumption. Covers the production architectures behind Discover Weekly (batch exploration), Radio (session exploitation), and Release Radar (cold start for new tracks). Deep dives into GRU4Rec and SASRec for session modeling, ACARec for artist-catalog-based cold start, contextual bandits on the homepage, skip-rate debiasing, and the listen-skip paradox. Includes what each level (mid/senior/staff) must cover.

Music Recommendations · Sequential Models · Explore-Exploit · Cold Start · +11
Advanced
1 question
ML System Design · 55 min

ML System Design: Notification Ranking System

Design the notification ranking system used by LinkedIn, Instagram, and Reddit — end-to-end. Covers the three problems that make this uniquely hard: multi-objective optimization (engagement vs fatigue vs retention), user fatigue modeling with adaptive per-user budgets, and why the budget constraint is more important than the ranking model. Includes Instagram's diversity-aware demotion framework, LinkedIn's Decision Transformer for sequential notification policy, and the suppression feedback loop from sending too many notifications.

Notification Ranking · Multi-Objective Optimization · User Fatigue · Budget Constraints · +7
Advanced
1 question
ML System Design · 52 min

ML System Design: Query Understanding — Rewriting, Expansion, Classification, and Spell Correction

Design the query understanding stack behind web search, e-commerce search, and internal enterprise retrieval — tokenization, spelling, intent classification, synonym expansion, PII redaction, and safe query rewriting for vector + lexical hybrid retrieval. Covers how Amazon-style search decomposes the problem into cascaded lightweight models under single-digit millisecond budgets before heavy ranking.

Query Understanding · Search Ranking · Query Rewriting · Spell Correction · +8
Advanced
1 question
ML System Design · 55 min

ML System Design: Real-Time Anomaly Detection at Scale

Design a production real-time anomaly detection system for metrics, logs, and business KPIs, end to end. Covers the three gaps in most answers: pointwise z-scores miss multivariate failures, unsupervised models cause alert fatigue without severity and incident context, and streaming state (Flink keyed windows) must respect event time and exactly-once semantics for financial or SRE use cases. Includes Isolation Forest + robust baselines, suppression and correlation grouping, and wiring alerts into paging based on SLO burn rate.

Anomaly Detection · Apache Flink · Kafka · Streaming ML · +10
Advanced
1 question
ML System Design · 55 min

MLSD Case Study: End-to-End Recommender System

Design a production recommender stack from candidate generation to ranking, re-ranking, experimentation, and monitoring. Covers retrieval-ranking tradeoffs, feature freshness, exploration, and feedback-loop mitigation.

ML System Design · Recommender System · Retrieval Ranking · Two-Tower · +5
Advanced
1 question
ML System Design · 65 min

ML System Design: Real-Time Bidding Optimization

Design a production DSP bidding system with a sub-50ms latency budget, covering contextual bandits (UCB vs Thompson Sampling), budget pacing from PID controllers to RL, distributed budget state with token buckets, and ultra-low-latency hot-path engineering. Includes the auction bias problem with IPS correction, bid shading for first-price auctions, and failure mode analysis for production RTB systems.

RTB · Real-Time Bidding · DSP · Bandits · +11
Advanced
1 question
ML System Design · 50 min

MLSD Case Study: Search Ranking System

Design web/e-commerce search ranking with lexical + vector retrieval, multi-stage ranking, and freshness-aware indexing. Covers query understanding, relevance labels, and online experimentation.

Search Ranking · Learning to Rank · BM25 · ANN Retrieval · +5
Advanced
1 question
ML System Design · 60 min

ML System Design: Social Feed Ranking System

Design a production-grade social feed ranking system from scratch: the architecture powering Twitter/X, LinkedIn, Reddit, and Threads. Covers multi-source candidate retrieval (in-network + out-of-network), a multi-task value model predicting 10+ user actions, recency engineering, echo chamber feedback loops, counterfactual logging, and the exact latency budget for a <200ms feed load. Includes an analysis of the open-sourced X (Twitter) algorithm, SimClusters-based out-of-network discovery, and what each level (mid/senior/staff) must cover.

Feed Ranking · Multi-Task Learning · Value Model · Retrieval-Ranking · +10
Advanced
1 question
ML System Design · 50 min

MLSD Case Study: Graph-Aware Spam Detection

Design a Gmail/LinkedIn-style spam detection system combining content models, graph-based abuse signals, and velocity features. Covers adversarial adaptation, streaming detection, and class-specific action policies.

Spam Detection · Abuse Detection · Graph Signals · Velocity Features · +5
Advanced
1 question
ML System Design · 65 min

ML System Design: Video Recommendation System

The canonical ML system design problem. Design YouTube's video recommendation engine end-to-end — from billion-scale candidate retrieval to transformer-based multi-task ranking to A/B experimentation. Covers the exact architecture used in production, latency budget breakdowns, negative sampling, feedback loop pathologies, and what each level (mid/senior/staff) must cover to pass.

Recommendation Systems · Two-Tower Networks · Multi-Task Learning · Candidate Generation · +9
Advanced
1 question
ML System Design · 60 min

ML System Design: Visual Search at Billion Scale

Design Pinterest's visual search system end-to-end — from contrastive learning with hard negative mining to billion-scale ANN retrieval with HNSW, ScaNN, and DiskANN. Covers the multi-stage retrieval-ranking funnel, index update strategies for live catalogs, latency budget engineering, and the failure modes that production systems hit. Includes company comparisons across Pinterest, Google Lens, and Amazon.

Visual Search · Contrastive Learning · CLIP · SigLIP · +11
Advanced
1 question
ML System Design · 40 min

How to Approach an ML System Design Interview

The mindset, signal management, and time strategy for ML system design interviews. Covers the offline-to-online metric translation, training-serving skew awareness, monitoring mindset, and the failure modes that cause strong ML engineers to underperform in production ML interview loops.

MLSD Framework · ML System Design Interview · Feature Store · Training-Serving Skew · +10
Advanced
1 question
ML System Design · 45 min

How to Design in MLSD Interviews: From Blank Whiteboard to Production ML

The mechanical playbook for executing an ML system design interview. Covers product-to-ML translation, candidate-ranking funnels, two-tower retrieval, feature store architecture, model serving (Triton, vLLM), monitoring (PSI, drift), and reference designs for feeds, fraud, and search.

MLSD Design · Two-Tower · FAISS · LightGBM · +11
Advanced
1 question
ML System Design · 40 min

ML System Design: 6-Step Framework

The definitive framework for ML system design interviews. Covers all 6 steps with exact timing, what interviewers look for at each step, and how to stand out from other candidates.

MLSD Framework · ML System Design Interview · Problem Framing · Feature Engineering · +8
Advanced
1 question
ML System Design · 40 min

ML Fairness and Bias: Metrics, Trade-offs, and Mitigation Strategies

Fairness in ML systems is a first-class engineering problem, not just a policy concern. This guide covers the four main fairness definitions (demographic parity, equalized odds, calibration, individual fairness), their mathematical incompatibility, bias sources across the ML pipeline, and practical mitigation strategies, which are increasingly tested at Google, Meta, Microsoft, and AI-first companies in senior ML system design rounds.

ML Fairness · Algorithmic Bias · Demographic Parity · Equalized Odds · +8
Advanced
1 question