Design Google Maps (Routing, ETA & Map Tile Serving)
Deep-dive system design for Google Maps covering map tile serving at CDN scale, Contraction Hierarchies-based routing on a 36M-node road graph, and real-time ETA prediction from 100M GPS probe devices — with production numbers, algorithm tradeoffs, and what senior engineers get wrong.
Why Google Maps Is Three Hard Problems, Not One
Most candidates treat Google Maps as a single system — it is not. It is three fundamentally different engineering problems bundled into one product, each with distinct scale characteristics, data models, and latency requirements. Conflating them in an interview signals you have not thought deeply about the system.

Problem 1: Map tile serving. The visual map is composed of millions of static image tiles. At zoom level 14, the world is divided into ~268M tiles. Serving these tiles to 1B+ monthly active users at negligible latency is a CDN caching problem, not a database problem — >99% of tile requests must be cache hits at zoom levels below 12.

Problem 2: Routing on a massive graph. The road network for the contiguous US (CONUS) alone has ~36M nodes and ~58M directed edges. Computing the shortest path from New York to Los Angeles on this graph naively takes minutes with Dijkstra. Google routes it in under 10ms — not by using a faster computer, but by running a fundamentally different algorithm: Contraction Hierarchies (CH), a preprocessing technique that computes shortcut edges offline so online queries skip irrelevant portions of the graph. This is the single most important thing most candidates don't know: routing is a preprocessing problem, not a real-time algorithm problem.

Problem 3: ETA from live traffic. A route's travel time depends on current road speeds, which change every few seconds. ~100M Android/Waze devices anonymously report GPS locations; Google aggregates these into per-segment speed estimates and re-weights the routing graph in near real-time. The ETA prediction then layers ML on top — accounting for historical patterns, day-of-week, weather, and events — to achieve ~3% median error on highways and ~10% in dense cities.

An interview that conflates these three problems or treats any one of them shallowly will not pass the senior bar.
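The tile pyramid arithmetic above is easy to reproduce. A sketch using the standard Web Mercator ("slippy map") tiling scheme, which is one common convention for map tile addressing (the exact scheme Google uses internally is an assumption here):

```python
import math

def lat_lon_to_tile(lat_deg, lon_deg, zoom):
    """Convert a WGS84 lat/lon to Web Mercator tile (x, y) at a zoom level.
    Zoom z splits the world into a 2**z by 2**z grid of tiles."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp boundary cases (lon = 180, extreme latitudes) into the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tiles_at_zoom(zoom):
    """Total tiles at one zoom level: 4**zoom."""
    return 4 ** zoom

print(tiles_at_zoom(14))                          # 268435456 — the ~268M figure
print(lat_lon_to_tile(40.7128, -74.0060, 14))     # the zoom-14 tile containing Manhattan
```

A tile URL is then typically just `/{zoom}/{x}/{y}.png`, which is why the serving problem reduces to caching a huge but fully static keyspace.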
What Interviewers Are Actually Testing
On Google Maps, interviewers at senior/staff level are testing three things: (1) Do you know that Dijkstra does not scale to 36M nodes at query time and can name a preprocessing-based alternative (Contraction Hierarchies, A* with landmarks, Hub Labeling)? (2) Do you understand the tile pyramid (zoom 0 = 1 tile, zoom 20 = 1 trillion tiles) and why CDN cache hit rate is the single most important metric, not origin server capacity? (3) Do you distinguish historical traffic (ML-predicted time-of-day patterns) from live traffic (probe GPS aggregated in near real-time)? Missing any one of these signals a surface-level answer. Naming Contraction Hierarchies by their inventors (Geisberger et al., 2008) and explaining the shortcut-edge mechanism in two sentences often ends the algorithm portion of the interview favorably.
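The shortcut-edge mechanism fits in a toy sketch. The graph, node ordering, and weights below are illustrative, and the preprocessing skips the witness search a production CH implementation would run (unconditional shortcuts are slower to build but still give correct query answers):

```python
import heapq
from collections import defaultdict

def preprocess_ch(edge_list, order):
    """Toy Contraction Hierarchies preprocessing on an undirected graph.
    Contract nodes in `order` (least important first); when contracting v,
    add a shortcut between every pair of not-yet-contracted neighbors so
    paths through v survive. Keep, per node, only edges to higher-rank nodes."""
    g = defaultdict(dict)
    for u, v, w in edge_list:
        g[u][v] = min(w, g[u].get(v, float("inf")))
        g[v][u] = g[u][v]
    rank = {node: r for r, node in enumerate(order)}
    upward = {}
    for v in order:
        higher = [(u, w) for u, w in g[v].items() if rank[u] > rank[v]]
        for i, (a, wa) in enumerate(higher):
            for b, wb in higher[i + 1:]:
                if wa + wb < g[a].get(b, float("inf")):   # shortcut a - v - b
                    g[a][b] = g[b][a] = wa + wb
        upward[v] = dict(higher)
    return upward

def ch_query(upward, s, t):
    """Bidirectional Dijkstra that only relaxes upward edges; the shortest
    distance is the best sum over nodes reached by both searches."""
    def upward_dists(src):
        dist, pq = {src: 0}, [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue
            for v, w in upward.get(u, {}).items():
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        return dist
    df, db = upward_dists(s), upward_dists(t)
    return min((df[v] + db[v] for v in df if v in db), default=float("inf"))

# Chain A-B-C-D (weight 1 each) plus a slow direct edge A-D.
edges = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1), ("A", "D", 10)]
upward = preprocess_ch(edges, order=["B", "C", "A", "D"])
print(ch_query(upward, "A", "D"))   # 3 — found via shortcuts, never descending
```

The point to make in the interview: each search climbs only toward higher-rank nodes, so it settles a tiny fraction of the graph; the expensive work (choosing the order, inserting shortcuts) happened offline.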
Clarifying Questions to Ask Before Designing
What scope — routing only or full maps product?
Routing only (shortest path, ETA) is a much narrower problem than the full maps product (tile serving, Places API, Street View, transit). Confirm scope before drawing any boxes. This case study covers routing, ETA, and tile serving — not Places, Street View, or ride-sharing dispatch.
What navigation modes are required?
Turn-by-turn driving navigation? Walking? Cycling? Public transit? Each mode uses a different graph and different edge weights. Driving requires road-speed data. Transit requires GTFS schedule data and a completely different graph model. Confirm at minimum driving + walking.
Is real-time traffic required?
A static routing system (shortest distance, fixed speed limits) is 10x simpler than a live-traffic system. If real-time traffic is required, confirm whether the expectation is historical patterns (day/time ML) or actual probe GPS data. Assume both for a senior design.
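The probe-GPS half of that answer can be sketched concretely. The snippet below assumes probes have already been map-matched to segment IDs (a real pipeline's hardest step) and folds speed reports into a per-segment estimate with an exponentially weighted moving average; the segment ID, lengths, alpha, and fallback speed are all illustrative assumptions:

```python
class SegmentSpeedAggregator:
    """Sketch: fold anonymized probe speed reports into a live per-segment
    speed estimate (EWMA), then re-weight the routing edge as length / speed."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight of the newest report
        self.speeds = {}        # segment_id -> estimated speed (m/s)

    def report(self, segment_id, speed_mps):
        prev = self.speeds.get(segment_id)
        if prev is None:
            self.speeds[segment_id] = speed_mps
        else:
            self.speeds[segment_id] = self.alpha * speed_mps + (1 - self.alpha) * prev

    def travel_time(self, segment_id, length_m, fallback_mps=13.9):
        # No recent probes: fall back to a historical / speed-limit value
        # (~50 km/h here). Guard against near-zero speeds.
        speed = self.speeds.get(segment_id, fallback_mps)
        return length_m / max(speed, 0.1)

agg = SegmentSpeedAggregator()
for s in (25.0, 24.0, 5.0):     # congestion arriving on segment 42
    agg.report(42, s)
print(agg.travel_time(42, length_m=500))   # seconds for a 500 m segment
```

The edge weights the router reads are these travel times, so "live traffic" is just a continuously rewritten weight overlay on the static graph; the historical/ML layer replaces the crude fallback above.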
Offline maps support?
Offline routing requires pre-downloading a compressed graph and tile set to the device. This adds a graph compression and distribution pipeline. Confirm if this is in scope — Google Maps downloads ~300MB per region for offline use.
API scale — consumer app only or third-party API too?
Google Maps Platform serves 5M+ API calls/day from third-party developers, separate from the consumer app. Third-party routing APIs add rate limiting, billing, and abuse detection complexity. Start with consumer app; mention API tier as an extension.
What is the target query latency for routing?
Turn-by-turn navigation expects route computation in under 200ms end-to-end. Cross-continent queries (New York to LA) must complete in the same window. This SLO is what rules out naive Dijkstra and forces preprocessing.
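The "rules out naive Dijkstra" claim is worth backing with back-of-envelope arithmetic; the constants below are rough estimation assumptions, not measurements:

```python
import math

# Why plain binary-heap Dijkstra misses a ~200 ms budget on CONUS.
nodes = 36_000_000
edges = 58_000_000
heap_factor = math.log2(nodes)        # ~25 comparisons per heap operation
ops = (nodes + edges) * heap_factor   # Dijkstra ~ O((V + E) log V)
ops_per_sec = 1e8                     # pessimistic: pointer-chasing, cache misses
seconds = ops / ops_per_sec
print(f"~{ops:.1e} ops -> ~{seconds:.0f} s per query")  # tens of seconds, not ms
```

Even granting an optimistic 1e9 effective ops/sec, a worst-case query is still seconds; the only way to hit a 200ms end-to-end SLO on cross-continent queries is to move the work offline, which is exactly what Contraction Hierarchies do.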