Design the kind of ETA prediction system used by Uber (DeepETA), Lyft, and DoorDash, end to end. Covers the two architectural insights that define this problem: the physics-first hybrid approach (ML refines the routing engine's estimate rather than replacing it) and the online-offline feature split that makes sub-10ms inference possible. Includes H3 hexagonal indexing for spatial features, the Linear Transformer trick for latency, quantile regression for uncertainty, and the compounding error problem in multi-stop predictions.

50 min read, 3 sections, 1 interview question

ETA Prediction, Spatiotemporal Features, Online Features, Routing Engine, H3 Geohashing, DeepETA, Linear Transformer, Quantile Regression, Real-Time ML, Feature Freshness, Residual Learning

Why ETA Prediction Is More Than a Regression Problem

ETA prediction looks deceptively simple: given origin, destination, and current conditions, predict how many minutes a trip will take. But in production at Uber's scale (tens of millions of trips per day, global coverage, <10ms per prediction), it is among the highest-QPS ML inference workloads in industry.

The scale is extreme. Uber's ETA model is the highest-QPS (queries per second) ML model the company runs. It is called for every price estimate, every driver-rider matching decision, every ETD (Estimated Time of Dropoff) calculation, and every "when will my order arrive?" query. One model architecture serves mobility (rides), Eats (food delivery), Freight, and other business lines simultaneously, at millions of calls per minute.

The physics is complex and only partially observable. Road networks, traffic signals, speed limits, and historical traffic patterns are captured, incompletely, in map data. This is physical knowledge that a pure ML model ignores at its peril. The correct architecture is not "replace the routing engine with ML" but "use ML to correct the routing engine's systematic errors."
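A minimal sketch of this residual setup, assuming synthetic data and a plain least-squares model standing in for whatever learner a production system would use. The feature names, the 15% rush-hour bias, and the helper functions are all hypothetical and chosen only to make the idea concrete: the model is trained on the gap between observed trip time and the routing engine's estimate, and the final ETA is the routing estimate plus the predicted correction.

```python
import numpy as np

def fit_residual_model(features, routing_eta, actual_time):
    """Least-squares fit of (actual - routing ETA) on the given features."""
    X = np.column_stack([np.ones(len(features)), features])  # add intercept column
    residual = actual_time - routing_eta                      # what the router got wrong
    weights, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return weights

def predict_eta(weights, features, routing_eta):
    """Final ETA = routing engine estimate + learned correction."""
    X = np.column_stack([np.ones(len(features)), features])
    return routing_eta + X @ weights

# Synthetic trips (assumption): the router underestimates by 15% in rush hour.
rng = np.random.default_rng(0)
n = 2000
is_rush_hour = rng.integers(0, 2, size=n).astype(float)
routing_eta = rng.uniform(5.0, 40.0, size=n)  # minutes
actual = routing_eta * (1.0 + 0.15 * is_rush_hour) + rng.normal(0.0, 0.5, size=n)

# Hypothetical features: a rush-hour flag and its interaction with the routing ETA.
features = np.column_stack([is_rush_hour, is_rush_hour * routing_eta])
weights = fit_residual_model(features, routing_eta, actual)
hybrid_eta = predict_eta(weights, features, routing_eta)

mae_routing = float(np.mean(np.abs(actual - routing_eta)))
mae_hybrid = float(np.mean(np.abs(actual - hybrid_eta)))
print(f"routing-only MAE: {mae_routing:.2f} min, hybrid MAE: {mae_hybrid:.2f} min")
```

Because the learner only has to model the correction, it starts from a physically sensible answer and falls back to the routing estimate when the learned correction is zero.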

The features are spatiotemporally non-stationary. A feature that was predictive at this road segment at 8 AM on a Tuesday is not predictive at 8 PM on a Friday. A feature useful in San Francisco is not useful in São Paulo. The model must learn spatial and temporal patterns simultaneously without memorizing region-specific noise.
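One common way to expose temporal patterns to a model is a cyclical encoding of time of day and time of week. The sketch below is an illustration under that assumption, not a description of any particular production system, and the function names are hypothetical. It maps a timestamp to sine and cosine components so that 23:59 on Sunday sits next to 00:00 on Monday, while 8 AM on Tuesday and 8 PM on Friday land far apart.

```python
import math

MINUTES_PER_DAY = 24 * 60
MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY

def cyclical_time_features(day_of_week, hour, minute):
    """Encode a local time as points on a daily and a weekly circle.

    day_of_week: 0 = Monday ... 6 = Sunday.
    Returns (sin_day, cos_day, sin_week, cos_week).
    """
    minute_of_day = hour * 60 + minute
    minute_of_week = day_of_week * MINUTES_PER_DAY + minute_of_day
    day_angle = 2 * math.pi * minute_of_day / MINUTES_PER_DAY
    week_angle = 2 * math.pi * minute_of_week / MINUTES_PER_WEEK
    return (
        math.sin(day_angle),
        math.cos(day_angle),
        math.sin(week_angle),
        math.cos(week_angle),
    )

def feature_distance(a, b):
    """Euclidean distance between two encoded timestamps."""
    return math.dist(a, b)

sunday_late = cyclical_time_features(6, 23, 59)
monday_early = cyclical_time_features(0, 0, 0)
tuesday_am = cyclical_time_features(1, 8, 0)
friday_pm = cyclical_time_features(4, 20, 0)

print(feature_distance(sunday_late, monday_early))  # small: adjacent minutes
print(feature_distance(tuesday_am, friday_pm))      # large: different regimes
```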

The latency budget is brutal. ETA is called during interactive user sessions (user waiting to see ride price), during driver matching (latency adds to wait time), and during batch optimization (route planning). The budget is 5–15ms end-to-end for the ML inference step. This rules out any architecture that requires sequential network calls or expensive attention over large sequences.
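One way to keep attention inside such a budget is kernelized linear attention, which is assumed here to be what the overview calls the Linear Transformer trick. The sketch below uses NumPy and the `elu(x) + 1` feature map; the function names and toy shapes are illustrative assumptions. It shows that reassociating the matrix products gives the same output as the quadratic form without ever building the n x n similarity matrix, so cost grows linearly in sequence length n instead of quadratically. Note that this replaces softmax with a feature map, so it approximates rather than reproduces standard softmax attention.

```python
import numpy as np

def feature_map(x):
    """elu(x) + 1, a positive feature map used in place of softmax."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def quadratic_attention(Q, K, V):
    """Reference form: builds the full (n, n) similarity matrix, O(n^2 * d)."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    scores = phi_q @ phi_k.T                               # (n, n)
    weights = scores / scores.sum(axis=1, keepdims=True)   # row-normalize
    return weights @ V                                     # (n, d_v)

def linear_attention(Q, K, V):
    """Same output, reassociated so no (n, n) matrix is formed, O(n * d^2)."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V                 # (d, d_v): summary of all keys and values
    k_sum = phi_k.sum(axis=0)        # (d,): normalizer summary
    numerator = phi_q @ kv           # (n, d_v)
    denominator = phi_q @ k_sum      # (n,)
    return numerator / denominator[:, None]

rng = np.random.default_rng(0)
n, d, d_v = 64, 8, 8                 # toy sizes, chosen for illustration
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d_v))

out_quadratic = quadratic_attention(Q, K, V)
out_linear = linear_attention(Q, K, V)
print(np.allclose(out_quadratic, out_linear))
```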

This problem is asked at Uber, Lyft, DoorDash, Instacart, Amazon Logistics, FedEx, Google Maps, Apple Maps, and any company with a time-sensitive delivery or transportation product.

TIP

What Interviewers Are Evaluating

Mid-level: Knows ETA is a regression problem. Can list relevant features (distance, time of day, traffic). Knows you need temporal features (rush hour, day of week). Can describe a GBDT baseline.

Senior-level: Understands the physics-first hybrid architecture (routing engine + ML post-processor). Designs the online-offline feature split correctly (routing engine output is online; historical speed profiles are offline). Knows H3 hexagonal indexing for spatial feature encoding. Understands why self-attention must be linearized to meet the latency budget. Can design the latency budget for the ML step (<10ms).

Staff-level: Identifies the compounding error problem in multi-stop ETAs and proposes end-to-end training. Addresses uncertainty quantification (quantile regression for P10/P50/P90 instead of a point estimate). Designs the feedback loop: ETA prediction drives pricing → pricing affects driver supply → supply affects ETA → model must avoid learning from its own distortions. Proposes a cross-city generalization strategy.
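The P10/P50/P90 idea rests on the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically. The sketch below is a minimal illustration on synthetic trip times, with hypothetical function names and a brute-force search over constant predictions standing in for a trained model. It shows that minimizing the pinball loss at level q recovers the q-th quantile of the data.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss at quantile level q (0 < q < 1).

    Under-prediction (y_true > y_pred) costs q per unit;
    over-prediction costs (1 - q) per unit.
    """
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

def best_constant_prediction(y_true, q, candidates):
    """Brute-force the constant that minimizes pinball loss at level q."""
    losses = [pinball_loss(y_true, c, q) for c in candidates]
    return float(candidates[int(np.argmin(losses))])

# Synthetic, right-skewed trip durations in minutes (assumption for illustration).
rng = np.random.default_rng(0)
trip_minutes = rng.lognormal(mean=3.0, sigma=0.4, size=5000)
candidates = np.linspace(trip_minutes.min(), trip_minutes.max(), 2000)

p10, p50, p90 = (
    best_constant_prediction(trip_minutes, q, candidates) for q in (0.1, 0.5, 0.9)
)
print(f"P10={p10:.1f}  P50={p50:.1f}  P90={p90:.1f} minutes")
```

In a real model the same loss would be applied per example to a learned prediction, typically with one output head per quantile level.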
