ML System Design·Advanced

ML System Design: Query Understanding — Rewriting, Expansion, Classification, and Spell Correction

Design the query understanding stack behind web search, e-commerce search, and internal enterprise retrieval — tokenization, spelling, intent classification, synonym expansion, PII redaction, and safe query rewriting for vector + lexical hybrid retrieval. Covers how Amazon-style search decomposes the problem into cascaded lightweight models under single-digit millisecond budgets before heavy ranking.

52 min read · 2 sections · 1 interview question

Query Understanding · Search Ranking · Query Rewriting · Spell Correction · Intent Classification · Synonym Expansion · BERT · ElasticSearch · Retrieval · PII Redaction · Latency Budget · E-commerce Search

Query Understanding Sits Upstream of Retrieval Quality

Bad queries waste the FAISS ANN search budget and poison BM25 signals. The query understanding (QU) stack normalizes the query before retrieval: Unicode normalization, casing, spelling correction, language detection, intent routing (for example, product vs. support doc), expansion with a controlled vocabulary, and PII stripping before the query is logged or sent to third-party LLMs.
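A minimal sketch of the first and last steps in that list (Unicode and casing normalization, then PII stripping), using only the Python standard library. The function names `normalize_query` and `redact_pii`, the placeholder tokens, and both regular expressions are hypothetical and chosen for illustration; real PII detection needs far broader coverage than two patterns.

```python
import re
import unicodedata

# Hypothetical patterns for illustration only. The phone pattern is loose and
# can also match long digit runs such as order numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def normalize_query(raw: str) -> str:
    """Apply Unicode normalization, case folding, and whitespace cleanup."""
    text = unicodedata.normalize("NFKC", raw)  # fold compatibility forms, e.g. full-width letters
    text = text.casefold()                     # aggressive, locale-independent lowercasing
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


def redact_pii(text: str) -> str:
    """Replace email- and phone-like spans before logging or calling an external LLM."""
    text = EMAIL_RE.sub("<email>", text)
    return PHONE_RE.sub("<phone>", text)


if __name__ == "__main__":
    query = "  Refund for JANE.DOE@Example.com,  call 415-555-0100 "
    print(redact_pii(normalize_query(query)))
```

Redaction runs after normalization here so the patterns see a single canonical form of the text; whether the redacted or the original query goes to retrieval is a separate design decision this sketch does not make.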

Interviewers expect a latency cascade: microseconds for rules, sub-millisecond for small classifiers, and 1–3 ms for compact transformers on CPU, all before retrieval, which typically takes ~10–30 ms for hybrid stacks at moderate QPS.
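One way to picture the cascade is as an ordered list of stages, each with an expected cost, where a stage is skipped if it would not fit in the remaining query-understanding budget. The sketch below assumes this design; the `Stage` and `run_cascade` names, the specific per-stage costs, and the placeholder stage functions are all hypothetical, with costs picked only to match the rough tiers above.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    fn: Callable[[str], str]   # takes a query, returns a (possibly rewritten) query
    expected_ms: float         # assumed cost, not a measured value


def run_cascade(
    query: str,
    stages: List[Stage],
    total_budget_ms: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[str, List[str]]:
    """Run stages in order, skipping any whose expected cost exceeds the remaining budget."""
    start = clock()
    applied: List[str] = []
    for stage in stages:
        elapsed_ms = (clock() - start) * 1000.0
        if elapsed_ms + stage.expected_ms > total_budget_ms:
            continue  # degrade gracefully: keep the query as rewritten so far
        query = stage.fn(query)
        applied.append(stage.name)
    return query, applied


# Placeholder stages; real ones would be rules, a small classifier, and a compact transformer.
STAGES = [
    Stage("rules", lambda q: q.strip().lower(), expected_ms=0.01),
    Stage("classifier", lambda q: q, expected_ms=0.5),
    Stage("transformer", lambda q: q, expected_ms=3.0),
]

if __name__ == "__main__":
    print(run_cascade("  Running SHOES ", STAGES, total_budget_ms=2.0))
```

With a 2 ms budget the transformer stage is skipped and the query still reaches retrieval with the cheaper rewrites applied. The injectable `clock` parameter is there so the budget logic can be tested without depending on wall-clock time.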
