
ML System Design: Dynamic Pricing & Surge (Marketplace)

Design a production dynamic pricing system like Uber surge or DoorDash peak pricing — end-to-end. Covers the three problems most prep skips: pricing is a closed-loop control problem (price moves both supply and demand), the correct ML framing is contextual bandits or MDPs — not plain regression on historical fares — and guardrails (caps, fairness, emergency optics) that constrain the learned policy. Includes elasticity estimation, exploration budgets, and why naive demand forecasting without supply response mis-prices the marketplace.

58 min read · 3 sections · 1 interview question

Dynamic Pricing · Surge Pricing · Contextual Bandits · Marketplace ML · Two-Sided Platform · Price Elasticity · Thompson Sampling · LinUCB · Reinforcement Learning · Causal Inference · Guardrail Metrics · Uber · DoorDash · Lyft

Why Dynamic Pricing Is Not 'Predict the Multiplier'

Interview candidates often describe surge as: forecast open requests, divide by open drivers, run a regression for the surge multiplier. That framing misses the problem structure.

Price simultaneously affects both sides of the market. A higher fare depresses rider booking probability (demand) while increasing driver willingness to enter the area (supply). The marketplace clears where these curves meet. A model that predicts booking rate conditional on a price without modeling driver supply response to the same price optimizes a fiction: it assumes supply is exogenous. Production systems reason about elasticity and clearing — how much does demand drop and supply rise for a 1.1× vs 1.4× multiplier — often estimated from randomized price experiments and instrumental approaches, not from correlational ride logs alone.
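The clearing logic above can be made concrete with a toy model. This is an illustrative sketch, not production code: the constant-elasticity curves and the elasticity values are hypothetical placeholders standing in for experimentally estimated quantities.

```python
# Toy market-clearing model for one zone. All constants are illustrative
# assumptions; real systems estimate elasticities from randomized experiments.
BASE_DEMAND = 200.0        # requests per 10 min at 1.0x
BASE_SUPPLY = 120.0        # available drivers at 1.0x
DEMAND_ELASTICITY = -0.8   # % change in bookings per % change in price
SUPPLY_ELASTICITY = 0.5    # % change in drivers per % change in price

def demand(m):
    """Expected booked requests at multiplier m (constant elasticity)."""
    return BASE_DEMAND * m ** DEMAND_ELASTICITY

def supply(m):
    """Expected active drivers at multiplier m (constant elasticity)."""
    return BASE_SUPPLY * m ** SUPPLY_ELASTICITY

def clearing_multiplier(lo=1.0, hi=3.0, tol=1e-4):
    """Bisect for the multiplier where the demand and supply curves meet."""
    if demand(lo) <= supply(lo):
        return lo  # market already clears at base price
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if demand(mid) > supply(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With these numbers the market clears near 1.48x. The point of the exercise: a demand-only forecaster would keep raising price until bookings fell to 120, ignoring that supply also rises with the multiplier, so the true clearing price is lower than the demand-only answer.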

The decision is sequential and partially observed. The platform chooses a price before it sees the full counterfactual of what would have happened at other prices. This is a bandit/MDP: context (geography, time, weather, event signals) → action (multiplier or discrete price bucket) → reward (contribution margin, wait-time SLA, or a composite). Academic work on ride-hailing pricing frequently formalizes the problem as a Markov decision process; scalable RL approaches (e.g. off-policy learning from logged prices) appear in the transportation literature for zone-level policies.
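A minimal sketch of that context → action → reward loop, using LinUCB over discrete multiplier buckets (one of the algorithms named in this guide). The arm set, context features, and reward definition here are simplified assumptions, not any platform's actual configuration.

```python
import numpy as np

# Minimal LinUCB sketch for choosing a surge multiplier from discrete buckets.
ARMS = [1.0, 1.2, 1.5, 2.0]   # candidate multipliers (illustrative)
ALPHA = 1.0                    # exploration width
D = 4                          # context dim: e.g. hour, weather, requests, drivers

class LinUCB:
    def __init__(self, n_arms, d, alpha):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]    # per-arm ridge Gram matrix
        self.b = [np.zeros(d) for _ in range(n_arms)]  # reward-weighted contexts

    def choose(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                      # ridge estimate of reward weights
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed (context, reward) pair into the chosen arm's stats."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

bandit = LinUCB(len(ARMS), D, ALPHA)
x = np.array([0.7, 0.1, 0.9, 0.3])   # normalized zone context (illustrative)
arm = bandit.choose(x)
bandit.update(arm, x, reward=0.42)   # e.g. normalized contribution margin
```

In an interview, the key design points to narrate are the reward signal (margin vs. a composite), the action discretization, and how the confidence term implements a bounded exploration budget.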

Hard product and regulatory constraints dominate the objective. Even if uncapped revenue optimization favors 3.0×, caps, "upfront" fare transparency, and emergency or disaster optics force policy guardrails — the engineering system must output feasible prices, not raw argmax. Interviewers at Uber, Lyft, DoorDash, Instacart, and airline yield-management teams probe whether you know these are first-class design inputs, not afterthoughts.
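One way to make "feasible prices, not raw argmax" concrete is a guardrail layer that projects the policy's output into the allowed action set. The cap values and the emergency rule below are illustrative assumptions, not any platform's real policy.

```python
# Sketch of a guardrail layer applied AFTER the learned policy's argmax.
# All constants are hypothetical; real caps vary by city and regulation.
GLOBAL_CAP = 2.5
EMERGENCY_CAP = 1.0   # freeze surge during declared emergencies

def feasible_multiplier(raw, zone_cap=GLOBAL_CAP, emergency=False):
    """Project the policy's raw multiplier into the feasible action set."""
    if emergency:
        return EMERGENCY_CAP              # regulatory/optics override wins outright
    m = max(1.0, min(raw, zone_cap))      # floor at base fare, clip at the cap
    return round(m * 10) / 10             # expose only 0.1x increments to riders
```

Structuring guardrails as a separate deterministic layer (rather than baking them into the reward) keeps them auditable and lets legal or ops change a cap without retraining the policy.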

Production references: peer-reviewed work on driver response to surge (Chen & Sheldon, 2016, on Uber driver behavior) establishes that surge materially affects supply; ride-hailing MDP formulations (e.g. Lei & Ukkusuri, 2023) discuss deterministic optimal pricing policies in stylized settings. For marketplace intuition, Uber Engineering and Lyft's pricing research publish at a high level; treat exact proprietary multipliers as confidential and use typical public ranges in discussion.

IMPORTANT

What Interviewers Are Evaluating

Mid-level: States supply/demand imbalance causes surge. Proposes features (time, weather, open drivers). Suggests regression for multiplier.

Senior-level: Frames bandit/MDP. Separates prediction (elasticity, ETAs) from decision (price). Mentions exploration and guardrails. Names Kafka/Flink for real-time state, feature store for zone aggregates, and A/B or switchback tests for price experiments.

Staff-level: Explains why historical fare data is off-policy for learning optimal prices. Proposes randomization (geo-holdouts, time-rotating buckets) to estimate elasticities. Designs multi-objective reward: GMV, take rate, rider wait p95, driver earnings floor. Maps fairness (protected geography) and cap constraints into the action space. Describes v1 rules → v2 ML elasticity → v3 contextual bandit with safety layers.
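The staff-level multi-objective reward can be sketched as a scalarization. The weights, SLA threshold, and earnings floor below are assumptions an interviewer would expect you to justify and tune, not established values.

```python
# Hypothetical composite reward for the pricing policy. Every constant here
# is an illustrative assumption, not a real platform parameter.
W_GMV, W_WAIT, W_EARN = 1.0, 0.5, 0.7
WAIT_SLA_P95 = 8.0       # minutes: rider wait p95 target
EARNINGS_FLOOR = 25.0    # $/hour target for active drivers

def reward(gmv_norm, wait_p95_min, driver_hourly):
    """Scalarized reward: normalized GMV minus penalties for SLA and
    earnings-floor breaches (hinge-style, zero when the constraint holds)."""
    wait_penalty = max(0.0, wait_p95_min - WAIT_SLA_P95) / WAIT_SLA_P95
    earn_penalty = max(0.0, EARNINGS_FLOOR - driver_hourly) / EARNINGS_FLOOR
    return W_GMV * gmv_norm - W_WAIT * wait_penalty - W_EARN * earn_penalty
```

Hinge penalties make the trade-off explicit: the policy is free to maximize GMV only while the wait SLA and driver earnings floor hold, which is easier to defend to stakeholders than a single opaque blended metric.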
