ML System Design: Query Understanding — Rewriting, Expansion, Classification, and Spell Correction
Design the query understanding stack behind web search, e-commerce search, and internal enterprise retrieval: tokenization, spelling correction, intent classification, synonym expansion, PII redaction, and safe query rewriting for hybrid vector and lexical retrieval. The guide covers how Amazon-style search decomposes the problem into cascaded lightweight models that run under single-digit-millisecond budgets before heavy ranking.
Query Understanding Sits Upstream of Retrieval Quality
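Bad queries waste the approximate-nearest-neighbor (ANN) search budget, for example in a FAISS index, and poison BM25 signals. The query understanding (QU) stack normalizes the input: Unicode normalization, case folding, spelling correction, language detection, intent routing (for example, product search vs. support documents), expansion with a controlled vocabulary, and PII stripping before the query is logged or sent to third-party LLMs.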
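As a rough illustration of the cheapest steps in that list, the sketch below applies Unicode normalization, case folding, whitespace cleanup, and a naive PII redaction pass using only the Python standard library. The function names, the placeholder tokens, and both regular expressions are assumptions made for this example; production PII detection usually needs locale-aware, audited detectors rather than two regexes.

```python
import re
import unicodedata

# Hypothetical, deliberately simple patterns (assumptions for illustration).
# The phone pattern will also match long numeric IDs, so a real system
# would need a more careful detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
WHITESPACE_RE = re.compile(r"\s+")


def normalize_query(raw: str) -> str:
    """Apply NFKC normalization, case folding, and whitespace collapsing."""
    text = unicodedata.normalize("NFKC", raw)  # e.g. full-width letters -> ASCII
    text = text.casefold()                     # aggressive, locale-independent lowercasing
    return WHITESPACE_RE.sub(" ", text).strip()


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("<email>", text)
    return PHONE_RE.sub("<phone>", text)


def prepare_for_logging(raw: str) -> str:
    """Normalize first, then redact, so the log never sees the raw query."""
    return redact_pii(normalize_query(raw))
```

Calling `prepare_for_logging("Refund for Jane.Doe@Example.com")` under these assumptions yields `refund for <email>`. Running redaction before logging, rather than as a later batch cleanup, keeps raw PII out of storage entirely.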
Interviewers expect you to describe a latency cascade: microseconds for rules, sub-millisecond for small classifiers, and 1–3 ms for compact transformers on CPU, all before retrieval, which typically takes about 10–30 ms for hybrid stacks at moderate QPS.
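One way to make the cascade concrete is a budget-aware runner that executes stages from cheapest to most expensive and skips any stage whose estimated cost no longer fits in the remaining budget. This is a minimal sketch, not a reference design: the `Stage` type, the per-stage cost estimates, and the three toy stage functions are all hypothetical stand-ins for real rules, a small classifier or speller, and a compact transformer rewriter.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    est_cost_ms: float          # assumed, pre-measured worst-case cost of the stage
    fn: Callable[[str], str]    # takes a query string, returns a rewritten one


def run_cascade(query: str, stages: List[Stage], budget_ms: float) -> Tuple[str, List[str]]:
    """Run stages in order, skipping any stage that would exceed the budget."""
    start = time.perf_counter()
    applied: List[str] = []
    for stage in stages:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if elapsed_ms + stage.est_cost_ms > budget_ms:
            continue  # degrade gracefully: keep the query produced so far
        query = stage.fn(query)
        applied.append(stage.name)
    return query, applied


# Toy stages (assumptions for illustration only).
SPELL_FIXES = {"hedphones": "headphones"}

STAGES = [
    Stage("rules", 0.01, lambda q: " ".join(q.lower().split())),
    Stage("spell", 0.5, lambda q: " ".join(SPELL_FIXES.get(t, t) for t in q.split())),
    Stage("rewrite", 3.0, lambda q: q.replace("wireless", "wireless bluetooth")),
]
```

With a generous budget, `run_cascade("Wireless  Hedphones", STAGES, budget_ms=1000.0)` applies all three stages; with `budget_ms=2.5` the 3 ms `rewrite` stage is skipped and the spell-corrected query goes to retrieval unchanged. Skipping rather than failing reflects the idea that a slightly less polished query is usually better than a blown latency budget.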