ML Infrastructure Engineer
Platform and serving — DSA, distributed HLD, ML system design, GenAI stack awareness, SQL, stats, scenarios, plus production engineering and craft from the Learn library.
This course combines 206 concept guides (Learn library) with 151 practice problems across 9 modules (ML theory + coding are merged so guides are not duplicated). Work each module left to right: study the guides, then drill the problems.
Practice: 0 / 151 solved · ~175h estimated · 206 guides to read
Guides: 206 (73 free · 133 premium in library)
Problems: 151 (13 unlocked on your plan · 138 upgrade or preview)
Modules: 9 (incl. craft / product / analytics)
Premium unlocks every guide and problem in this path. Free tier uses path limits + per-track previews.
Curriculum by track
Each module pairs Learn guides with in-app problems where that track has a practice surface (SQL, scenarios, DSA, etc.). Learn-only pillars (craft, product engineering, analytics) are reading-first. Open the Practise arena for the full multi-track dashboard.
Data Structures & Algorithms
Learning: 23 guides (23 free · 0 pro) · Practice: 45 problems (4 unlocked · 41 gated)
- Arrays and Strings: Core Interview Patterns and Execution Strategy · Free
- Backtracking: Permutations, Combinations, Subsets & Word Search · Free
- Binary Search Patterns: 3 Templates and Bisect-on-Answer · Free
- Binary Trees & BST: Traversals, LCA, and Classic Patterns · Free
- Bit Manipulation: XOR Tricks, Bit Masking & Power-of-Two Patterns · Free
- DP on Strings: LCS, Edit Distance, and Regex Matching · Free
- Dynamic Programming: From Recursion to Optimization · Free
- Graph Algorithms: BFS, DFS & Topological Sort · Free
- Greedy Patterns: Interval Scheduling, Jump Game & Activity Selection · Free
- Heaps & Priority Queues: Top-K, Merge K Lists, and Two-Heap Patterns · Free
- How to Approach a DSA Coding Interview · Free
- How to Design a DSA Solution: From Problem to Clean Code · Free
- Interval Patterns: Merge Intervals, Meeting Rooms & Sweep Line · Free
- Knapsack DP: 0/1, Unbounded, and Subset Sum Variants · Free
+9 more in the Learn library
- Easy · Climbing Stairs
- Easy · Maximum Depth of Binary Tree
- Easy · Reverse Linked List
- Easy · Single Number
- Easy · Two Sum
- Easy · Valid Parentheses
- Medium · Accounts Merge
- Medium · Clone Graph
- Medium · Coin Change
- Medium · Container With Most Water
- Medium · Course Schedule (Topological Sort)
- Medium · Find All Anagrams in a String
- Medium · Find Minimum in Rotated Sorted Array
- Medium · Group Anagrams
- Medium · Implement Trie (Prefix Tree)
- Medium · Jump Game II
- Medium · Kth Largest Element in an Array
- Medium · Linked List Cycle Detection
- Medium · Longest Palindromic Subsequence
- Medium · Longest Substring Without Repeating Characters
- Medium · Lowest Common Ancestor of a Binary Search Tree
- Medium · LRU Cache
- Medium · Meeting Rooms II
- Medium · Merge Intervals
- Medium · Min Stack
- Medium · Network Delay Time
- Medium · Next Greater Element
- Medium · Number of Connected Components in Undirected Graph
- Medium · Number of Islands
- Medium · Product of Array Except Self
- Medium · Rotate Image
- Medium · Spiral Matrix
- Medium · Top K Frequent Elements
- Medium · Validate Binary Search Tree
- Hard · Binary Tree Maximum Path Sum
- Hard · Cheapest Flights Within K Stops
- Hard · Coin Change II
- Hard · Edit Distance
- Hard · Largest Rectangle in Histogram
- Hard · Median of Two Sorted Arrays
- Hard · Serialize and Deserialize Binary Tree
- Hard · Sliding Window Maximum
- Hard · Trapping Rain Water
- Hard · Word Break II
- Hard · Word Search II (Trie + Backtracking)
High-Level System Design
Learning: 42 guides (11 free · 31 pro) · Practice: 20 problems (2 unlocked · 18 gated)
- Design a Chat System (WhatsApp) · Free
- Design a Stock Exchange (Order Book & Matching Engine) · Pro
- Design a URL Shortener (TinyURL) · Free
- Design a Web Crawler at Google Scale · Pro
- Design an Event Ticketing System (BookMyShow / Ticketmaster) · Pro
- Design Google Docs (Real-Time Collaborative Editing) · Pro
- Design Google Maps (Routing, ETA & Map Tile Serving) · Pro
- Design Netflix (Video Streaming) · Pro
- Design Proximity Services (Yelp / Google Places) · Pro
- Design Twitter/X Feed (News Feed) · Pro
- Design Uber (Ride-Sharing Platform) · Pro
- Distributed Transactions: 2PC, Saga Pattern, and Compensating Transactions · Pro
- Event Sourcing and CQRS: Audit Logs, Temporal Queries, and Read/Write Separation · Pro
- API Gateway Design: Auth, Rate Limiting, Routing, and BFF Patterns · Free
+28 more in the Learn library
- Medium · Design a Distributed Rate Limiter
- Medium · Design a Search Typeahead / Autocomplete
- Medium · Design a Web Crawler
- Medium · Design URL Shortener
- Hard · Design a Distributed Cache (Redis-like)
- Hard · Design a Distributed Task Queue (Celery at Scale)
- Hard · Design a Distributed Task Scheduler
- Hard · Design a Live Streaming Platform (Twitch)
- Hard · Design a Payment System / Ledger
- Hard · Design Airbnb (Search + Booking)
- Hard · Design an API Gateway (Auth + Rate Limiting + Routing + Circuit Breaking)
- Hard · Design Google Drive
- Hard · Design Google Maps (Routing + ETA + Real-time Traffic)
- Hard · Design Netflix CDN (ABR Streaming + Edge Caching)
- Hard · Design Slack (Real-time Messaging at Enterprise Scale)
- Hard · Design Ticketmaster
- Hard · Design Twitter / X
- Hard · Design Uber / Lyft
- Hard · Design WhatsApp / Messenger
- Hard · Design YouTube / Netflix
ML System Design
Learning: 48 guides (13 free · 35 pro) · Practice: 19 problems (1 unlocked · 18 gated)
- ML Model Deployment Fundamentals: Shipping Safely in Production · Pro
- A/B Testing for ML Systems: Design, Statistical Rigor & Production Pitfalls · Pro
- Cold Start: Full Architecture for New Users and New Items · Pro
- Counterfactual Evaluation: IPS, SNIPS, and Doubly Robust Estimators · Pro
- Data Pipelines for ML: Batch, Streaming, and Event Architecture · Pro
- Distributed Training: Data Parallelism, Model Parallelism, and FSDP · Pro
- Embeddings & Vector Databases: ANN Search at Scale · Free
- Experiment Tracking & Model Registry: The Version Control for ML · Pro
- Feature Stores: Online/Offline Architecture & Training-Serving Consistency · Pro
- GPU Infrastructure for ML Serving: Quantization, Batching & Inference Optimization · Pro
- How to Approach an ML System Design Interview · Free
- How to Design at MLSD: Blank Whiteboard to Production ML · Free
- ML Model Evaluation & Production Monitoring: Shadow Mode, A/B Testing & Rollback · Free
- ML Monitoring & Drift Detection: Keeping Models Healthy in Production · Pro
+34 more in the Learn library
- Medium · Design an A/B Testing Platform for ML Models
- Medium · ML Model Monitoring & Observability
- Hard · Design a Content Moderation System
- Hard · Design a Fraud Detection System
- Hard · Design a Production Feature Store
- Hard · Design a Real-Time Fraud Detection System
- Hard · Design a Recommendation System
- Hard · Design a Recommendation System (Netflix/Spotify)
- Hard · Design a Search Ranking System
- Hard · Design a Search Ranking System (Google-scale)
- Hard · Design Abuse/Spam Detection Pipeline
- Hard · Design Ads Click-Through Rate (CTR) Prediction System
- Hard · Design an Ads Click-Through Rate (CTR) Prediction System
- Hard · Design an End-to-End ML Training Pipeline
- Hard · Design an LLM Inference Serving Platform
- Hard · Design Content Moderation System
- Hard · Design ETA Prediction System
- Hard · Design News Feed Ranking System
- Hard · Design Query Understanding & Search Intent Classification
GenAI & LLMs
Learning: 39 guides (10 free · 29 pro) · Practice: 7 problems (1 unlocked · 6 gated)
- Chain-of-Thought, Test-Time Compute & Multi-Step Reasoning · Pro
- Diffusion Models for Images — DDPM, Latent Diffusion, CFG, Stable Training · Pro
- LLM Evaluation & Benchmarking — HELM, MMLU, MT-Bench, Arena, LLM-as-Judge · Pro
- LLM Fundamentals — Transformers, Attention & Architecture · Pro
- Long-Context LLMs — Lost in the Middle, RAG vs. Natively Long, KV Cache & Packing · Pro
- Multimodal LLMs — CLIP, Vision-Language Models & Production Vision APIs · Pro
- Structured Output, Function & Tool Calling — JSON Schema, Strict Mode, Agent Safety · Pro
- Tokenization — BPE, WordPiece, SentencePiece & Production Artifacts · Free
- Embeddings — From word2vec to Instruction-Tuned Vectors & Production RAG · Free
- Positional Encoding — Sinusoidal, RoPE, ALiBi & Context Length Extrapolation · Pro
- Prompt Engineering: From Zero-Shot to Production Systems · Free
- Vector Search for GenAI: HNSW, IVF-PQ, FAISS, and ScaNN in Production · Pro
- Advanced RAG: Hybrid Retrieval, Reranking, and Production Architecture · Pro
- How to Approach a GenAI / LLM System Interview · Free
+25 more in the Learn library
SQL Practice
Learning: 5 guides (2 free · 3 pro) · Practice: 20 problems (2 unlocked · 18 gated)
- SQL Foundations for Data & ML Interviews: JOINs, Aggregations, and Window Functions · Free
- SQL Indexes and Query Performance: B-Tree, Composite, and Covering Indexes · Pro
- Subqueries and CTEs: WITH Clauses, Correlated Subqueries, and Recursive Patterns · Pro
- SQL Query Optimization: Indexes, Query Plans, and Performance at Scale · Pro
- Window Functions for Analytics: ROW_NUMBER, RANK, LAG/LEAD, and Running Totals · Free
- Easy · Customers Who Never Ordered
- Easy · Find and Deduplicate Records
- Easy · Second Highest Salary
- Medium · Attribution Modeling (Last-Touch vs First-Touch)
- Medium · Compute DAU, WAU, MAU Metrics
- Medium · Compute Median and Percentiles Without Built-ins
- Medium · Funnel Analysis — Checkout Conversion
- Medium · Pivot — Monthly Revenue by Product Category
- Medium · Running Total and Moving Average
- Medium · Self-Referential Hierarchy (Manager Chain Depth)
- Medium · Top N Records Per Group
- Hard · A/B Test Analysis in SQL
- Hard · Cohort LTV Analysis
- Hard · Experiment Novelty Effect Analysis
- Hard · Feature Engineering for ML Models in SQL
- Hard · Longest Streak of Consecutive Active Days
- Hard · Org Chart Traversal with Recursive CTE
- Hard · Query Optimization Challenge
- Hard · User Retention Cohort Analysis
- Hard · User Session Detection with LAG
Scenarios
Learning: 9 guides (3 free · 6 pro) · Practice: 25 problems (2 unlocked · 23 gated)
- How to Approach Data & Product Scenario Questions · Free
- Scenario Walkthrough: Engagement vs Revenue — Guardrails & Horizon · Free
- Scenario Walkthrough: Marketplace Supply–Demand Imbalance — Liquidity First · Pro
- Scenario Walkthrough: Payment Service Returning 500s in Production · Pro
- Scenario Walkthrough: Post-Launch — Was This Feature a Success? · Pro
- Scenario Walkthrough: Recommendation Model CTR Dropped 15% Overnight · Pro
- Scenario Walkthrough: The A/B Test Went Wrong — SRM, Peeking, and Interference · Pro
- Scenario Walkthrough: Trust & Safety Escalation — Abuse Signals & Response · Pro
- Scenario Walkthrough: Why Is DAU Dropping? · Free
- Medium · Convince VP to Fund 6-Month ML Infrastructure Rebuild
- Medium · Describe a System You're Most Proud Of — and What You'd Do Differently
- Medium · Design an A/B Test for a New Ranking Model
- Medium · How Do You Grow From Senior to Staff Engineer?
- Medium · How Would You Define Success Metrics for a New Search Feature?
- Medium · Leadership Wants to Know Why Your Model Works — How Do You Explain It?
- Medium · Org Reorg: Your Team Absorbed, Priorities Unclear, Half the Team Leaving
- Medium · Partner Team Missed Critical API Deadline — Launch in 2 Weeks
- Medium · Prioritize Technical Debt vs. Feature Delivery
- Medium · Tell Me About a Time You Disagreed With Your Manager
- Hard · A/B Test Is Positive But a Guardrail Metric Degraded — Do You Ship?
- Hard · Cold-Start Launch: New Country, Zero Historical Data
- Hard · Data Access Conflict: Privacy Team Blocks ML Training Data
- Hard · DAU Dropped 15% Overnight — Diagnose It
- Hard · Experiment Trade-Off: Engagement +8%, Retention -3%
- Hard · Feature Store is Adding 80ms to Real-Time Inference — What Are Your Options?
- Hard · ML Model Accuracy Degraded 8% in Production — What Do You Do?
- Hard · Model Performs Well Offline (0.92 AUC), Poorly Online (CTR -10%)
- Hard · P0 at 3am: Payment Service Timing Out
- Hard · Production Model Degrading — Rollback or Emergency Retrain?
- Hard · Silent Data Pipeline Failure — Models Degraded 7 Days Later
- Hard · Staff-Level Influence: Align 4 Teams Without Formal Authority
- Hard · Write a Postmortem for an ML Model That Served Stale Predictions for 6 Hours
- Hard · Your Experiment Shows Novelty Effect — How Do You Detect and Correct For It?
- Hard · Your Recommendation Model Has a Fairness Problem
Statistics & A/B Testing
Learning: 7 guides (1 free · 6 pro) · Practice: 15 problems (1 unlocked · 14 gated)
- Statistics & Probability Foundations · Pro
- A/B Testing & Experimentation at Scale · Free
- Sequential Testing & the Peeking Problem: Alpha Spending, SPRT, and Always-Valid Inference · Pro
- Multiple Testing: FWER, FDR, Bonferroni, Holm, and When Each Fails · Pro
- Bayesian A/B Testing vs Frequentist: Priors, Posteriors, Probability of Superiority, and Expected Loss · Pro
- Practical vs Statistical Significance: MDE, Cohen's d, Confidence Intervals, and Business Loss · Pro
- Non-parametric Statistics: Mann–Whitney U, Kruskal–Wallis & Permutation Tests · Pro
- Medium · Bootstrap Confidence Intervals
- Medium · Hypothesis Testing & Statistical Inference
- Medium · Implement Power Analysis & Sample Size Calculation
- Medium · Multiple Testing Correction (Bonferroni + Benjamini-Hochberg)
- Medium · Novelty Effect Detection (Holdback Analysis)
- Medium · Probability Distributions & Statistical Foundations
- Hard · Bayesian Inference & Bayesian A/B Testing
- Hard · Causal Inference & Observational Studies
- Hard · Design a Production A/B Testing Framework
- Hard · Difference-in-Differences Estimator
- Hard · Heterogeneous Treatment Effects (CATE)
- Hard · Implement CUPED Variance Reduction
- Hard · Network Effects & Cluster Randomization
- Hard · Regression Discontinuity Design (RDD)
- Hard · Sequential Testing & Always-Valid P-Values
Engineering Craft
Learning: 13 guides (3 free · 10 pro) · Learn-first pillar — use Practise arena for cross-track drills
- Data Modeling for Product and Analytics Systems · Pro
- Engineering Strategy: Turning Technical Direction into Business Outcomes · Pro
- How to Approach Craft Interviews: Behavioral, Incident, and Technical Communication · Pro
- How to Be a 10X Engineer: Leverage, Reliability, and Team Multiplication · Free
- Leadership Influence for Engineers: Driving Outcomes Without Authority · Pro
- Mentoring and Growth in Engineering Teams · Pro
- Product Metrics & North Star: How Engineers Define and Own Success · Pro
- Staff+ Engineering Interviews: Strategy, Ambiguity, and Org-Level Technical Leadership · Pro
- CI/CD Pipelines: Designing Safe, Fast Delivery for ML and SDE Systems · Free
- Data Engineering Pipelines: Reliability, Quality, and Evolution · Pro
- STAR Behavioral Interview Stories: Structure, Archetypes, and Leveling Signals · Free
- Backend Engineer Interview Prep Path · Pro
- Data Scientist Interview Prep Path · Pro
Production Engineering
Learning: 20 guides (7 free · 13 pro) · Learn-first pillar — use Practise arena for cross-track drills
- Metric Anomaly Triage: Is This a Real Problem or an Instrumentation Bug? · Pro
- Kubernetes Operations in Production: Safe Rollouts, Resource Controls, and Cluster Guardrails · Pro
- On-Call Incident Response: The First 30 Minutes · Pro
- SLO Design: Error Budgets, Burn Rate Alerts, and the Reliability Tradeoff · Free
- Writing the Blameless Postmortem: RCAs That Actually Drive Change · Pro
- A/B Test Critique: Finding Flaws in Experiment Designs · Pro
- Cloud-Native Production Patterns: Stateless Services, Regions, and Cost-Aware Resilience · Pro
- Feature Flags: Safe Rollouts, Kill Switches, and the Dark Launch Pattern · Free
- Distributed Systems Debugging: Causality, Partial Failures, and Tracing-Driven Root Cause · Pro
- Rewrite vs. Refactor: How to Make the Call Without Destroying the Business · Pro
- AI-Assisted Development and Vibe Coding: Fast Output Without Quality Collapse · Free
- Code Review Excellence: The Craft That Most Engineers Never Learn · Pro
- GitHub End-to-End Workflow: From Issue to Safe Production Merge · Pro
- What Good Code Actually Looks Like: Engineering Craft Beyond the Linter · Pro
+6 more in the Learn library
Finish every module: read the guides, then solve problems in order. Use the global Practise hub for streaks and cross-track progress.