ML System Design
Recommendation, ranking, search, fraud detection — end-to-end ML systems with serving architectures, feature stores, training pipelines, and online/offline evaluation.
A/B Testing for ML Systems: Design, Statistical Rigor & Production Pitfalls
How top ML teams run experiments that actually produce trustworthy conclusions — sample size calculation, randomization units, guard rails, CUPED variance reduction, network effects, and the organizational mistakes that make most A/B tests misleading.
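The sample-size calculation this guide opens with can be sketched with the standard two-sample z-test formula for a proportion metric. A minimal sketch, assuming the usual alpha = 0.05 and power = 0.8 conventions; the baseline rate and effect size below are illustrative, not from the guide:

```python
from statistics import NormalDist

def samples_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Samples needed per variant to detect an absolute lift of
    `mde_abs` over `baseline_rate` with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. ~1.96
    z_beta = NormalDist().inv_cdf(power)            # e.g. ~0.84
    p_bar = baseline_rate + mde_abs / 2             # rate midway under H1
    variance = 2 * p_bar * (1 - p_bar)              # pooled Bernoulli variance
    return int((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2) + 1

# detecting a 1pp lift on a 10% baseline needs ~15k users per variant
n = samples_per_variant(0.10, 0.01)
```

Note the quadratic dependence on the effect size: halving the minimum detectable effect roughly quadruples the required sample, which is why underpowered tests are the most common failure the guide warns about.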
Data Pipelines for ML: Batch, Streaming, and Event Architecture
How production ML data pipelines are actually built — Kafka for event collection, Spark for batch feature engineering, Flink for real-time aggregations, and the architectural decisions that determine whether your model trains on fresh or stale data.
Embeddings & Vector Databases: ANN Search at Scale
How embeddings power search, recommendations, and retrieval — and how to build the index that serves them at millisecond latency. Covers HNSW vs IVF+PQ, tuning M and ef parameters, billion-scale architecture, and when to use Pinecone vs FAISS vs pgvector.
ML Model Evaluation & Production Monitoring: Shadow Mode, A/B Testing & Rollback
Production ML evaluation is fundamentally different from offline evaluation. Covers shadow deployment, champion-challenger A/B testing, canary rollouts, SLO design for ML systems, rollback triggers, and the metrics that reveal model degradation before users notice. The end-to-end playbook for safely deploying and monitoring ML models.
Experiment Tracking & Model Registry: The Version Control for ML
How production ML teams manage the model lifecycle from experiment to production — MLflow vs Weights & Biases, what metadata a model must carry, promotion workflows with gated approvals, model lineage for debugging, and the rollback mechanism that makes safe deployments possible.
Feature Stores: Online/Offline Architecture & Training-Serving Consistency
Deep dive into feature store architecture — the infrastructure every production ML system needs but most candidates can't explain. Covers the two-tier design, point-in-time correct joins, training-serving skew, and how to choose between Feast, Tecton, and cloud-managed options.
ML Pipelines & Orchestration: Airflow, Kubeflow, and CI/CD for Models
How production ML teams automate the full model lifecycle — from data ingestion through training, evaluation, and deployment. Covers Airflow vs Kubeflow Pipelines, containerized training steps, automated model validation gates, and the CI/CD practices that separate mature ML teams from ad-hoc ones.
ML Model Deployment Fundamentals: Shipping Safely in Production
A practical foundation for deploying ML models: packaging, serving topologies, rollout strategies, and post-deploy monitoring. Covers shadow mode, canary releases, drift detection, and rollback design.
Model Serving Architectures: Batch vs Real-Time, Shadow Deployments & Latency Budgets
How to design the serving layer for ML models in production — when to use batch pre-computation vs real-time inference, how to safely deploy new models via shadow and canary patterns, and how to structure a multi-stage serving pipeline within a latency budget.
ML Monitoring & Drift Detection: Keeping Models Healthy in Production
Production ML models fail silently. This guide covers the three-layer monitoring stack (data drift, concept drift, output drift), PSI thresholds, KL divergence, distinguishing data drift from concept drift (they require different fixes), and how to build retraining triggers that aren't noisy.
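The PSI metric named above is simple enough to compute inline. A minimal sketch; the 0.1 (watch) and 0.25 (alert) thresholds are the common industry convention rather than a universal standard, and the bin counts below are illustrative:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected/actual: bin counts over the same bin edges, from the
    reference (training) window and the live (serving) window.
    """
    e_total, a_total = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        e_frac = max(e / e_total, eps)   # clamp to avoid log(0) on empty bins
        a_frac = max(a / a_total, eps)
        score += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return score

# a nearly identical distribution stays well under the 0.1 watch line
stable = psi([100, 200, 300], [101, 198, 301])
```

Each term is nonnegative, so PSI is zero only when the binned distributions match exactly, and every drifted bin adds to the score.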
Offline vs Online Evaluation: Why Metrics Disagree and What to Do About It
The most common ML interview trap: candidates optimize offline metrics but can't explain why they diverge from online results. Covers AUC vs CTR, NDCG vs session length, position bias, novelty effects, counterfactual evaluation, and the right metric for each stage of an ML system.
Distributed Training: Data Parallelism, Model Parallelism, and FSDP
How to scale model training from a single GPU to thousands. Covers data parallelism with Ring AllReduce, model/tensor/pipeline parallelism for LLMs, PyTorch DDP vs FSDP2, and how to choose the right strategy based on model size vs data volume.
GPU Infrastructure for ML Serving: Quantization, Batching & Inference Optimization
The engineering decisions that determine whether your model serves at 10ms or 200ms — GPU selection, quantization (INT8/FP16/FP8), dynamic batching, KV cache management, and when to use Triton vs vLLM vs TensorRT-LLM.
Two-Stage Retrieval & Ranking: The Architecture Behind Every Large-Scale Recommender
The dominant architecture powering Google, YouTube, TikTok, Pinterest, and Spotify — two-tower retrieval followed by multi-stage ranking. Covers the fundamental constraint that makes this necessary, in-batch negatives, hard negative mining, and the full 4-stage production pipeline.
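The in-batch negatives trick mentioned above treats every other item in a training batch as a negative for each (user, item) pair, turning one batch into a full softmax over batch items. A minimal sketch of the loss under that setup; embeddings here are plain lists for clarity, not a production tensor implementation:

```python
import math

def in_batch_softmax_loss(user_emb, item_emb):
    """Mean sampled-softmax loss where row i's positive item is item i
    and every other item in the batch serves as a negative."""
    n = len(user_emb)
    total = 0.0
    for i in range(n):
        # dot-product scores of user i against every item in the batch
        logits = [sum(u * v for u, v in zip(user_emb[i], item_emb[j]))
                  for j in range(n)]
        log_z = math.log(sum(math.exp(l) for l in logits))
        total += log_z - logits[i]   # -log p(positive item | user)
    return total / n
```

When user and item towers agree (matching pairs score highest), the loss approaches zero; mismatched pairs drive it up, which is exactly the gradient signal that pulls the two towers into a shared embedding space.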
Vector Search at Scale: HNSW, IVF-PQ, FAISS, and Production ANN Systems
Approximate Nearest Neighbor (ANN) search is the retrieval backbone of RAG, recommendation systems, semantic search, and visual similarity. Master HNSW graph construction, IVF-PQ compression, FAISS vs Qdrant vs pgvector selection, recall-latency tradeoffs, and hybrid dense+sparse search. Includes production sizing and indexing strategy for 1B+ vector corpora.
ML System Design: Abuse Detection — Account Takeover, Bots, and Velocity Beyond Spam
Design a cross-product abuse platform distinct from content spam — credential stuffing, account takeover (ATO), synthetic accounts, scraping, and collusion rings. Covers device graphs, velocity features in Redis, challenge escalation (CAPTCHA, step-up auth), feedback loops when labels are delayed, and why Meta-style integrity teams separate abuse from policy-violating content classifiers.
MLSD Case Study: Ad Click Prediction at Marketplace Scale
Design a production CTR prediction system for ads at Meta/Google scale. Covers COEC calibration, delayed conversions, feature engineering for sparse+dense signals, multi-stage serving under strict latency budgets, and the failure loops that most interview answers miss.
MLSD Case Study: Churn Prediction with Survival and Uplift Modeling
Design a Spotify/Netflix-style churn prevention system using survival models, causal uplift targeting, and intervention policy optimization. Covers label definition, intervention economics, and production monitoring pitfalls.
MLSD Case Study: Multimodal Content Moderation Systems
Design a TikTok/YouTube/Meta-style content moderation stack with multimodal models, policy-aware inference, human-in-the-loop review, and continuous policy evolution. Covers latency tiers, precision/recall tradeoffs by harm class, and model-policy coupling.
ML System Design: Customer LTV, Survival Modeling, and Uplift for Treatment Targeting
Design a production customer lifetime value (CLV) system for subscription and marketplace businesses — from probabilistic churn (BG/NBD, Gamma-Gamma spend), survival curves with censoring, to uplift modeling for CRM campaigns. Covers why naive regression on historical LTV leaks future information, how Shopify-style merchants use CLV for acquisition bids, and evaluation with calibrated dollar errors plus policy simulation.
ML System Design: Demand Forecasting System
Design the demand forecasting system used by Uber, Lyft, DoorDash, and Amazon — end-to-end. Covers the three hard problems that make this uniquely challenging: spatial ML (demand is correlated across geography, not independent per zone), online learning (marketplace conditions change faster than batch retraining), and the feedback loop where demand forecasts drive pricing which affects actual demand. Includes H3 spatial graphs, temporal GNNs, online adaptation with drift detection, and why WMAPE beats MAPE for imbalanced demand.
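The WMAPE-vs-MAPE point is easy to see in a few lines: MAPE averages per-point percentage errors, so a near-zero-demand zone can dominate it, while WMAPE weights errors by actual volume. Toy numbers below, purely illustrative:

```python
def mape(actual, forecast):
    """Mean absolute percentage error: average of per-point ratios."""
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def wmape(actual, forecast):
    """Weighted MAPE: total absolute error over total actual volume."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)

# one quiet zone (1 ride/hr) and one busy zone (1000 rides/hr)
actual   = [1, 1000]
forecast = [3, 990]
# mape is dominated by the quiet zone's 200% miss (~100% overall),
# while wmape reflects the marketplace-level miss of 12 / 1001 (~1.2%)
```

This is the imbalance argument in miniature: a forecaster judged on MAPE is pushed to obsess over empty zones, while WMAPE keeps the objective aligned with rides actually served.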
MLSD Case Study: Document Understanding & Enterprise NLP Classification
Production document AI for invoices and contracts: OCR, layout-aware encoders, calibrated per-field extraction, human review routing, and template drift. Covers hybrid rule/ML cascades, per-field F1 under imbalance, and audit-grade logging — the system design interviewers expect beyond flatten-all-text BERT.
ML System Design: Dynamic Pricing & Surge (Marketplace)
Design a production dynamic pricing system like Uber surge or DoorDash peak pricing — end-to-end. Covers the three problems most prep skips: pricing is a closed-loop control problem (price moves both supply and demand), the correct ML framing is contextual bandits or MDPs — not plain regression on historical fares — and guardrails (caps, fairness, emergency optics) that constrain the learned policy. Includes elasticity estimation, exploration budgets, and why naive demand forecasting without supply response mis-prices the marketplace.
ML System Design: E-commerce Recommendation System
Design Amazon/Alibaba-scale product recommendation end-to-end — from session-aware candidate retrieval across billions of products to multi-task neural ranking optimizing for click, add-to-cart, and purchase simultaneously. Covers the exact architecture used in production: real-time session modeling, co-purchase graph retrieval, cold start for new products and new users, Thompson sampling for category discovery, and the unique constraints of e-commerce (inventory, price, return rate, margin). Includes latency budget analysis, training data construction with purchase attribution, and what each level (mid/senior/staff) must cover.
ML System Design: ETA Prediction System
Design the ETA prediction system used by Uber (DeepETA), Lyft, and DoorDash — end-to-end. Covers the two architectural insights that define this problem: the physics-first hybrid approach (ML refines a routing engine, not replaces it) and the online-offline feature split that makes sub-10ms inference possible. Includes H3 geohashing for spatial features, the Linear Transformer trick for latency, quantile regression for uncertainty, and the compounding error problem in multi-stop predictions.
ML System Design: Real-Time Fraud Detection
Design a production fraud detection system used by Stripe, PayPal, and Visa — end-to-end. Covers the three hard problems nobody teaches: extreme class imbalance (typically ~0.1% fraud rate), cost-sensitive learning where FN ≠ FP, and multi-stage inference under 100ms latency. Includes the optimal threshold formula, graph neural networks for fraud rings, adversarial drift, and the suppression bias feedback loop that silently kills deployed fraud models.
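One standard form of the cost-sensitive threshold the blurb refers to: with a calibrated fraud probability p, blocking is optimal when the expected fraud loss p * cost_fn exceeds the expected cost of declining a good transaction, (1 - p) * cost_fp. A minimal sketch; the dollar costs are hypothetical:

```python
def optimal_threshold(cost_fp, cost_fn):
    """Block when p(fraud) exceeds this threshold, assuming calibrated scores.

    Derived from: block iff p * cost_fn > (1 - p) * cost_fp,
    i.e. p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

# hypothetical costs: declining a good customer ~ $5 of lost margin,
# missing fraud ~ $95 in chargeback and fees
t = optimal_threshold(5, 95)   # 0.05: block even low-probability fraud
```

The asymmetry is the whole point: because a missed fraud costs far more than a false decline, the optimal threshold sits far below the naive 0.5, and this only works if the model's probabilities are actually calibrated.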
ML System Design: Instagram Feed Ranking System
The second canonical ML system design problem. Design Instagram's personalized feed ranking end-to-end — from multi-source candidate aggregation (friends, follows, recommended) to multi-task neural ranking predicting 10+ user actions simultaneously. Covers the exact architecture Meta uses in production, the value model that combines action predictions into a single score, integrity signals, SEV-driven guardrails, cold start for social graphs, and what each level (mid/senior/staff) must cover to pass.
ML System Design: Job Recommendation (LinkedIn-Style Marketplace)
Design a production job recommendation system for a professional marketplace — end-to-end. Covers cold-start and short-lived job postings, the LinkSAGE/LiGNN pattern (nearline GNN embeddings served as features into a low-latency two-tower ranker, not a full GNN pass on every list load), position bias and delayed labels (apply → hire), eligibility hard-filters, and two-sided evaluation with employer quality guardrails. Citations: LiGNN and LinkSAGE (arXiv 2024) for large-scale job graph learning at LinkedIn.
ML System Design: LLM Serving Systems
Design a production LLM serving system from first principles — covering PagedAttention and KV cache management, continuous batching for 2-5× throughput gains, multi-LoRA serving with S-LoRA and InfiniLoRA, the full RLHF pipeline (SFT → reward model → PPO vs DPO vs GRPO), and cost-per-token engineering. Includes break-even analysis for self-hosting vs cloud, failure mode catalog, and what each interview level must cover.
ML System Design: Music Recommendation System
Design Spotify/Apple Music-scale music recommendation end-to-end — from audio-aware product retrieval to sequential session modeling that captures the unique temporal dynamics of music consumption. Covers the production architectures behind Discover Weekly (batch exploration), Radio (session exploitation), and Release Radar (cold start for new tracks). Deep dives into GRU4Rec and SASRec for session modeling, ACARec for artist-catalog-based cold start, contextual bandits on the homepage, skip-rate debiasing, and the listen-skip paradox. Includes what each level (mid/senior/staff) must cover.
ML System Design: Notification Ranking System
Design the notification ranking system used by LinkedIn, Instagram, and Reddit — end-to-end. Covers the three problems that make this uniquely hard: multi-objective optimization (engagement vs fatigue vs retention), user fatigue modeling with adaptive per-user budgets, and why the budget constraint is more important than the ranking model. Includes Instagram's diversity-aware demotion framework, LinkedIn's Decision Transformer for sequential notification policy, and the suppression feedback loop from sending too many notifications.
ML System Design: Query Understanding — Rewriting, Expansion, Classification, and Spell Correction
Design the query understanding stack behind web search, e-commerce search, and internal enterprise retrieval — tokenization, spelling, intent classification, synonym expansion, PII redaction, and safe query rewriting for vector + lexical hybrid retrieval. Covers how Amazon-style search decomposes the problem into cascaded lightweight models under single-digit millisecond budgets before heavy ranking.
ML System Design: Real-Time Anomaly Detection at Scale
Design a production real-time anomaly detection system for metrics, logs, and business KPIs — end-to-end. Covers the three gaps in most answers: pointwise z-scores miss multivariate failures, unsupervised models cause alert fatigue without severity and incident context, and streaming state (Flink keyed windows) must respect event time and exactly-once semantics for financial or SRE use cases. Includes Isolation Forest + robust baselines, suppression and correlation grouping, and wiring to paging with SLO burn.
MLSD Case Study: End-to-End Recommender System
Design a production recommender stack from candidate generation to ranking, re-ranking, experimentation, and monitoring. Covers retrieval-ranking tradeoffs, feature freshness, exploration, and feedback-loop mitigation.
ML System Design: Real-Time Bidding Optimization
Design a production DSP bidding system under 50ms — covering contextual bandits (UCB vs Thompson Sampling), budget pacing from PID controllers to RL, distributed budget state with token buckets, and ultra-low-latency hot-path engineering. Includes the auction bias problem with IPS correction, bid shading for first-price auctions, and failure mode analysis for production RTB systems.
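The PID-controller pacing named above can be sketched as a loop that nudges a bid multiplier so cumulative spend tracks a target curve. A minimal sketch; the gains and the uniform-pacing framing are illustrative assumptions, not production values:

```python
class PIDPacer:
    """Adjusts a bid multiplier so cumulative spend tracks a target curve."""

    def __init__(self, kp=0.8, ki=0.1, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0
        self.multiplier = 1.0

    def update(self, target_spend, actual_spend):
        # positive error = underspending -> raise bids; negative -> lower
        error = (target_spend - actual_spend) / max(target_spend, 1e-9)
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        self.multiplier = max(0.0, 1.0 + self.kp * error
                                   + self.ki * self.integral
                                   + self.kd * derivative)
        return self.multiplier

pacer = PIDPacer()
m = pacer.update(target_spend=100.0, actual_spend=60.0)  # underspending -> m > 1
```

The integral term is what corrects persistent under- or over-delivery, and the derivative term damps oscillation; the guide's RL-based pacing replaces these hand-tuned gains with a learned policy.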
MLSD Case Study: Search Ranking System
Design web/ecommerce search ranking with lexical + vector retrieval, multi-stage ranking, and freshness-aware indexing. Covers query understanding, relevance labels, and online experimentation.
ML System Design: Social Feed Ranking System
Design a production-grade social feed ranking system from scratch — the architecture powering Twitter/X, LinkedIn, Reddit, and Threads. Covers multi-source candidate retrieval (in-network + out-of-network), multi-task value model predicting 10+ user actions, recency engineering, echo chamber feedback loops, counterfactual logging, and the exact latency budget for a <200ms feed load. Includes the open-sourced X (Twitter) algorithm analysis, SimCluster-based out-of-network discovery, and what each level (mid/senior/staff) must cover.
MLSD Case Study: Graph-Aware Spam Detection
Design a Gmail/LinkedIn-style spam detection system combining content models, graph-based abuse signals, and velocity features. Covers adversarial adaptation, streaming detection, and class-specific action policies.
ML System Design: Video Recommendation System
The canonical ML system design problem. Design YouTube's video recommendation engine end-to-end — from billion-scale candidate retrieval to transformer-based multi-task ranking to A/B experimentation. Covers the exact architecture used in production, latency budget breakdowns, negative sampling, feedback loop pathologies, and what each level (mid/senior/staff) must cover to pass.
ML System Design: Visual Search at Billion Scale
Design Pinterest's visual search system end-to-end — from contrastive learning with hard negative mining to billion-scale ANN retrieval with HNSW, ScaNN, and DiskANN. Covers the multi-stage retrieval-ranking funnel, index update strategies for live catalogs, latency budget engineering, and the failure modes that production systems hit. Includes company comparisons across Pinterest, Google Lens, and Amazon.
How to Approach an ML System Design Interview
The mindset, signal management, and time strategy for ML system design interviews. Covers the offline-to-online metric translation, training-serving skew awareness, monitoring mindset, and the failure modes that cause strong ML engineers to underperform on production ML interview loops.
How to Design at MLSD: Blank Whiteboard to Production ML
The mechanical playbook for ML system design interview execution. Covers product-to-ML translation, candidate-ranking funnels, two-tower retrieval, feature store architecture, model serving (Triton, vLLM), monitoring (PSI, drift), and reference designs for feeds, fraud, and search.
ML System Design: 6-Step Framework
The definitive framework for ML system design interviews. Covers all 6 steps with exact timing, what interviewers look for at each step, and how to stand out from other candidates.
ML Fairness and Bias: Metrics, Trade-offs, and Mitigation Strategies
Fairness in ML systems is a first-class engineering problem, not just a policy concern. This guide covers the four main fairness definitions (demographic parity, equalized odds, calibration, individual fairness), their mathematical incompatibility, bias sources across the ML pipeline, and practical mitigation strategies — tested increasingly at Google, Meta, Microsoft, and AI-first companies in senior ML system design rounds.