
Design Search Autocomplete / Typeahead (Google-scale)

System design deep dive for a search typeahead service delivering sub-100ms top-K suggestions at 10B queries/day, built on sharded tries with precomputed top-K at each node, multi-tier caching, real-time trending pipelines, and personalization.

50 min read · 3 sections · 1 interview question
Google Search Typeahead · Autocomplete Design · Radix Trie · Elasticsearch Prefix Query · Kafka · Flink Streaming · Redis Sorted Set · CDN Caching · Search Personalization · Trending Queries · Consistent Hashing Sharding · SymSpell BK-Tree

Why Typeahead Is a Latency-Obsessed Read System

Search autocomplete is the inverse of a payment system: availability and latency dominate correctness. A stale suggestion is a mild annoyance; a 300ms suggestion is a product failure, because users type faster than that and the UI feels laggy. Google's public latency bar for autocomplete is sub-100ms p99 end-to-end, including network, and the service itself aims for <30ms server-side. At 10B queries/day (~115K QPS average, ~500K peak), this latency budget changes which data structures are even candidates: no disk reads, no network hops between services, no database lookups per keystroke. The system has to return top-K suggestions for any prefix from tens of millions of candidate queries, and it must do so while also handling personalization, trending, spell correction, and multi-language support.

The core insight senior candidates must surface: the entire working set lives in memory, the top-K is precomputed at every prefix node, and the trie is rebuilt offline, never mutated online. Candidates who propose 'query Elasticsearch on every keystroke' fail immediately; at 500K QPS, Elasticsearch would need thousands of nodes and would still miss the latency bar. Candidates who say 'use a trie' without addressing how to update it without downtime, how to shard it, and how to attach top-K at each node are incomplete. The interview is testing whether you can hold in your head that typeahead is a read-only hot path with an asynchronous rebuild architecture.
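To make the precomputed-top-K idea concrete, here is a minimal in-memory sketch in Python. The names (`TrieNode`, `build_trie`, `suggest`) and the scoring scheme are illustrative, not taken from any production system: the trie is built once offline from a query-to-score table, and the serve path is a pure pointer walk that returns a frozen list.

```python
import heapq

class TrieNode:
    __slots__ = ("children", "heap", "top_k")

    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.heap = []      # build-time min-heap of (score, query), bounded to k
        self.top_k = []     # frozen descending top-k list, read at serve time

def build_trie(query_scores, k=10):
    """Offline build: attach the top-k completions to every prefix node,
    so a lookup never traverses below the prefix node at serve time."""
    root = TrieNode()
    for query, score in query_scores.items():
        node = root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            if len(node.heap) < k:
                heapq.heappush(node.heap, (score, query))
            elif score > node.heap[0][0]:
                heapq.heapreplace(node.heap, (score, query))
    # Freeze every node's heap into a descending top-k list.
    stack = [root]
    while stack:
        node = stack.pop()
        node.top_k = [q for _, q in sorted(node.heap, reverse=True)]
        node.heap = []
        stack.extend(node.children.values())
    return root

def suggest(root, prefix):
    """Serve path: O(len(prefix)) pointer walk, then return the frozen list."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.top_k

trie = build_trie({"coffee": 90, "code": 80, "covid": 100, "car": 50}, k=3)
print(suggest(trie, "co"))  # ['covid', 'coffee', 'code']
```

Per-node top-K trades memory for serve-time work: lookup cost depends only on the prefix length, not on how many queries share the prefix, which is what makes a <30ms server-side budget plausible.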

Clarifying Questions to Ask First

01

What is the search corpus?

Web search (Google-style, open-ended queries), product search (Amazon, bounded catalog), internal search (Slack messages), or navigational (URL/app switcher)? The corpus size drives whether a pure trie fits in memory or needs a hybrid with an inverted index.

02

What is the scale?

Queries per day, distinct queries, suggestions per prefix? Default assumption: 10B queries/day, 100M distinct queries, top-10 per prefix, p99 < 100ms end-to-end.
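A quick back-of-envelope pass over these defaults, in Python; the peak-to-average ratio and the average query length are assumptions, not givens:

```python
# Back-of-envelope for the default assumptions (illustrative numbers).
queries_per_day  = 10_000_000_000
avg_qps          = queries_per_day / 86_400   # ~115,700 QPS average
peak_qps         = avg_qps * 4.3              # assumed peak/avg ratio -> ~500K QPS

distinct_queries = 100_000_000
avg_query_bytes  = 30                         # assumed average query length
raw_corpus_gb    = distinct_queries * avg_query_bytes / 1e9  # ~3 GB of raw strings

print(f"avg QPS  ~{avg_qps:,.0f}")             # avg QPS  ~115,741
print(f"peak QPS ~{peak_qps:,.0f}")            # peak QPS ~497,685
print(f"raw strings ~{raw_corpus_gb:.0f} GB")  # trie overhead on top, still RAM-sized
```

Even with generous trie overhead on top of ~3 GB of raw strings, the working set fits in RAM across a handful of shards, which is what justifies the all-in-memory design.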

03

Personalized or global?

Global only is simpler. Per-user personalization requires a second layer and user-keyed storage. Typical mix: 7 global + 3 personal.
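One plausible way to blend the two layers at serve time, assuming the 7 + 3 slot split above (`merge_suggestions` and its de-dup policy are illustrative):

```python
def merge_suggestions(global_top, personal_top, n_global=7, n_personal=3):
    """Fill personal slots first, then global slots, collapsing duplicates
    so a query from the user's history doesn't also occupy a global slot."""
    personal = personal_top[:n_personal]
    seen = set(personal)
    global_part = [q for q in global_top if q not in seen][:n_global]
    return personal + global_part

print(merge_suggestions(
    global_top=["covid", "coffee", "code", "costco", "coldplay",
                "courier", "cosmos", "cobra"],
    personal_top=["coworking space sf", "coffee"],
))
# ['coworking space sf', 'coffee', 'covid', 'code', 'costco',
#  'coldplay', 'courier', 'cosmos', 'cobra']
```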

04

Multi-language and multi-region?

Each language needs its own trie (tokenization, stopwords, and normalization all differ). Regional variance matters too: in 2020, 'covid' surfaced different trending candidates in the US than in India.

05

Trending / real-time?

Does the system reflect 'trending now' queries (breaking news, viral topics) with sub-minute freshness, or is daily/hourly rebuild acceptable? Real-time trending is a separate pipeline.
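If sub-minute freshness is required, the detection step can be sketched as windowed counting with a spike test. The real pipeline would be Kafka feeding a Flink job with windowed aggregation; the class name and thresholds below are made-up placeholders:

```python
import time
from collections import Counter

class TrendingCounter:
    """Tumbling-window spike detector: a query 'trends' when its count in
    the current window spikes relative to the previous window."""

    def __init__(self, window_secs=60):
        self.window_secs = window_secs
        self.current = Counter()
        self.baseline = Counter()
        self.window_start = time.time()

    def record(self, query):
        now = time.time()
        if now - self.window_start >= self.window_secs:
            self.baseline = self.current   # rotate: current becomes baseline
            self.current = Counter()
            self.window_start = now
        self.current[query] += 1

    def trending(self, min_count=100, spike_ratio=3.0):
        """Queries with enough volume and a >= spike_ratio jump vs. baseline."""
        return [q for q, c in self.current.items()
                if c >= min_count and c >= spike_ratio * max(self.baseline[q], 1)]
```

Trending results are typically merged in as a boost layer at serve time rather than by mutating the serving trie, which stays immutable between rebuilds.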

06

Spell correction?

Fuzzy matching for typos (Levenshtein distance <= 2 is typical) adds a separate fallback path built on a BK-tree or SymSpell.
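A minimal BK-tree sketch for this fallback path, assuming the distance-2 bound above (a production system might instead use SymSpell's precomputed-deletes approach, which trades memory for much faster lookups):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

class BKTree:
    """BK-tree keyed on edit distance: a search within max_dist only visits
    children whose edge label is within max_dist of the current distance
    (triangle inequality), pruning most of the corpus."""

    def __init__(self):
        self.root = None  # (word, {edge_distance: child})

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # already present
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist=2):
        if self.root is None:
            return []
        results, stack = [], [self.root]
        while stack:
            node_word, children = stack.pop()
            d = levenshtein(word, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)

tree = BKTree()
for w in ["coffee", "covid", "code", "copper"]:
    tree.add(w)
print(tree.search("covfe", max_dist=2))
# [(2, 'code'), (2, 'coffee'), (2, 'covid')]
```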

07

Safety and privacy?

Suggestions leaking PII (a user's own past query shown to others) or violating policy (adult terms, self-harm) are a launch blocker. The blocklist is applied at query time, not just at ingest.
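A query-time filter can be as simple as the sketch below; the substring rule and the blocklist contents are placeholders, and real systems use normalized matching plus classifiers. The payoff of filtering at serve time is that a newly blocked term disappears immediately instead of waiting for the next trie rebuild:

```python
BLOCKED_SUBSTRINGS = {"banned-term"}  # hypothetical policy list

def filter_suggestions(suggestions, blocked=BLOCKED_SUBSTRINGS):
    """Drop any suggestion containing a blocked substring (case-insensitive)."""
    return [s for s in suggestions
            if not any(term in s.lower() for term in blocked)]

print(filter_suggestions(["coffee near me", "banned-term merchandise"]))
# ['coffee near me']
```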
