ML System Design: Dynamic Pricing & Surge (Marketplace)
Design a production dynamic pricing system like Uber surge or DoorDash peak pricing, end-to-end. This guide covers the three problems most prep skips: pricing is a closed-loop control problem (price moves both supply and demand); the correct ML framing is contextual bandits or MDPs, not plain regression on historical fares; and guardrails (caps, fairness, emergency optics) must constrain the learned policy. It also covers elasticity estimation, exploration budgets, and why naive demand forecasting without modeling the supply response mis-prices the marketplace.
Why Dynamic Pricing Is Not 'Predict the Multiplier'
Interview candidates often describe surge as: forecast open requests, divide by open drivers, run a regression for the surge multiplier. That description misses the problem's structure.
Price simultaneously affects both sides of the market. A higher fare depresses rider booking probability (demand) while increasing driver willingness to enter the area (supply). The marketplace clears where these curves meet. A model that predicts booking rate conditional on price without modeling the driver supply response to that same price optimizes a fiction: it assumes supply is exogenous. Production systems reason about elasticity and clearing (how much demand drops and supply rises at a 1.1× versus a 1.4× multiplier), estimated from randomized price experiments and instrumental-variable approaches, not from correlational ride logs alone.
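To make the clearing argument concrete, here is a minimal sketch assuming constant-elasticity demand and supply curves; the elasticity values, functional forms, and the `clearing_multiplier` helper are illustrative assumptions, not production numbers or any platform's actual method:

```python
import numpy as np

# Illustrative constant-elasticity curves (exponents are made up, not
# production values). Demand falls and supply rises as the multiplier grows.
DEMAND_ELASTICITY = -0.8   # % change in requests per % change in price
SUPPLY_ELASTICITY = 0.5    # % change in available drivers per % change in price

def expected_requests(base_requests: float, multiplier: float) -> float:
    """Riders who still book at this multiplier (constant-elasticity model)."""
    return base_requests * multiplier ** DEMAND_ELASTICITY

def expected_drivers(base_drivers: float, multiplier: float) -> float:
    """Drivers drawn into the zone at this multiplier."""
    return base_drivers * multiplier ** SUPPLY_ELASTICITY

def clearing_multiplier(base_requests: float, base_drivers: float,
                        candidates=np.arange(1.0, 3.01, 0.1)) -> float:
    """Pick the candidate multiplier that best balances the two sides.

    A regression that predicts bookings while holding supply fixed would
    drop the expected_drivers term entirely -- that is the 'exogenous
    supply' fiction described above.
    """
    gaps = [abs(expected_requests(base_requests, m) -
                expected_drivers(base_drivers, m)) for m in candidates]
    return float(candidates[int(np.argmin(gaps))])

# Example: 120 open requests vs 60 open drivers clears near 1.7x.
print(clearing_multiplier(120, 60))
```

The grid search stands in for a proper root-finder; the point is that both curves enter the pricing decision, so a multiplier chosen from the demand curve alone is systematically wrong.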
The decision is sequential and partially observed. The platform chooses a price before it sees the full counterfactual of what would have happened at other prices. This is a bandit/MDP: context (geography, time, weather, event signals) → action (multiplier or discrete price bucket) → reward (contribution margin, wait-time SLA, or a composite). Academic work on ride-hailing pricing frequently formalizes the problem as a Markov decision process; scalable RL approaches (e.g. off-policy learning from logged prices) appear in the transportation literature for zone-level policies.
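A toy epsilon-greedy sketch of that context → action → reward loop follows; the context key, discrete action set, and scalar reward are placeholder assumptions, and a production policy would more likely use LinUCB or Thompson sampling with off-policy evaluation on logged prices:

```python
import random
from collections import defaultdict

ACTIONS = [1.0, 1.2, 1.5, 2.0]  # discrete multiplier buckets (illustrative)

class EpsilonGreedyPricer:
    """Toy contextual bandit over price buckets.

    `context` is assumed to be a hashable key such as
    (zone, hour_bucket, weather_flag) -- a stand-in for the richer feature
    vector a production policy would consume.
    """
    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.value = defaultdict(float)  # running mean reward per (context, action)
        self.count = defaultdict(int)

    def choose(self, context) -> float:
        if random.random() < self.epsilon:   # explore: randomized price
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.value[(context, a)])  # exploit

    def update(self, context, action: float, reward: float) -> None:
        """Reward is a composite, e.g. contribution margin minus a wait-time penalty."""
        key = (context, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]
```

The epsilon branch is the exploration budget in miniature: the platform deliberately serves some randomized prices so it can observe counterfactual outcomes it would otherwise never see.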
Hard product and regulatory constraints dominate the objective. Even if uncapped revenue optimization favors 3.0×, caps, "upfront" fare transparency, and emergency or disaster optics force policy guardrails: the engineering system must output feasible prices, not the raw argmax. Interviewers at Uber, Lyft, DoorDash, Instacart, and airline yield-management teams probe whether you know these are first-class design inputs, not afterthoughts.
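One way to implement that, sketched under assumed constraints (the `PricingGuardrails` fields, cap value, and step limit below are hypothetical), is a deterministic safety layer that projects the policy's raw output into the feasible action set:

```python
from dataclasses import dataclass

@dataclass
class PricingGuardrails:
    """Hypothetical constraint config; field names and values are illustrative."""
    hard_cap: float = 2.5           # regulatory / product cap on the multiplier
    max_step: float = 0.3           # limit price movement between decisions
    emergency_freeze: bool = False  # e.g. declared disaster in the zone

def apply_guardrails(raw_multiplier: float, previous: float,
                     rails: PricingGuardrails) -> float:
    """Project the learned policy's raw argmax into the feasible action set."""
    if rails.emergency_freeze:
        return 1.0                                  # optics/regulation dominate
    m = min(raw_multiplier, rails.hard_cap)         # never exceed the cap
    m = max(min(m, previous + rails.max_step),      # smooth price jumps
            previous - rails.max_step)
    return max(m, 1.0)                              # floor at base fare

# Even if the policy prefers 3.0x, the served price stays feasible:
print(apply_guardrails(3.0, previous=1.4, rails=PricingGuardrails()))  # -> 1.7
```

Keeping the guardrails as a separate deterministic layer means the constraints hold no matter how the upstream learned policy misbehaves, which is exactly why they are a design input rather than a term in the reward.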
Production references: peer-reviewed work on driver response to surge (Chen & Sheldon, 2016, on Uber driver behavior) establishes that surge materially affects supply; ride-hailing MDP formulations (e.g. Lei & Ukkusuri, 2023) discuss deterministic optimal pricing policies in stylized settings. For marketplace intuition, Uber Engineering and Lyft's pricing research publish at a high level; treat exact proprietary multipliers as confidential and use typical public ranges in discussion.
What Interviewers Are Evaluating
Mid-level: States supply/demand imbalance causes surge. Proposes features (time, weather, open drivers). Suggests regression for multiplier.
Senior-level: Frames bandit/MDP. Separates prediction (elasticity, ETAs) from decision (price). Mentions exploration and guardrails. Names Kafka/Flink for real-time state, feature store for zone aggregates, and A/B or switchback tests for price experiments.
Staff-level: Explains why historical fare data is off-policy for learning optimal prices. Proposes randomization (geo-holdouts, time-rotating buckets) to estimate elasticities. Designs multi-objective reward: GMV, take rate, rider wait p95, driver earnings floor. Maps fairness (protected geography) and cap constraints into the action space. Describes v1 rules → v2 ML elasticity → v3 contextual bandit with safety layers.
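The staff-level elasticity step can be sketched in a few lines. The log-log regression below recovers a demand elasticity from randomized price buckets; the synthetic data, the true elasticity of -0.8, and the `estimate_elasticity` helper are illustrative assumptions, not platform numbers:

```python
import numpy as np

def estimate_elasticity(multipliers: np.ndarray, bookings: np.ndarray) -> float:
    """Fit log(q) = a + e * log(p); the slope e is the price elasticity.

    Valid only because the multipliers were randomized (geo-holdouts or
    time-rotating buckets). On correlational logs the same regression is
    confounded, since surge and latent demand rise together.
    """
    X = np.column_stack([np.ones_like(multipliers), np.log(multipliers)])
    coef, *_ = np.linalg.lstsq(X, np.log(bookings), rcond=None)
    return float(coef[1])

# Synthetic randomized experiment with true elasticity -0.8 (illustrative).
rng = np.random.default_rng(0)
p = rng.choice([1.0, 1.2, 1.5, 2.0], size=5000)
q = 100 * p ** -0.8 * rng.lognormal(sigma=0.1, size=5000)
print(estimate_elasticity(p, q))   # ~ -0.8
```

The randomization is doing the work here: geo-holdouts and time-rotating (switchback) buckets are what justify treating the served multipliers as exogenous in this regression.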