Agentic RAG: ReAct, Self-RAG, and Multi-Step Retrieval
Single-shot RAG fails on multi-hop questions and cannot self-correct. Master agentic RAG patterns — ReAct loops, Self-RAG reflection tokens, FLARE, and tool-augmented retrieval — with latency budgets and failure modes every FAANG candidate must know.
Why Single-Shot RAG Fails at Scale
Single-shot RAG embeds the query, retrieves top-k chunks, and hands them to the LLM. For simple factual questions this works. But it breaks in three distinct ways that interviewers specifically test.
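To make the failure modes concrete, here is a minimal sketch of that pipeline. The `embed`, `retrieve`, and `generate` callables are placeholders for whatever embedding model, vector store, and LLM client you actually use, not any specific library's API:

```python
from typing import Callable, Sequence

def single_shot_rag(
    query: str,
    embed: Callable[[str], Sequence[float]],                # text -> embedding vector
    retrieve: Callable[[Sequence[float], int], list[str]],  # vector, k -> chunks
    generate: Callable[[str], str],                         # prompt -> completion
    k: int = 5,
) -> str:
    """One embedding, one retrieval, one generation. If the retrieved
    chunks are wrong, there is no second chance to re-query."""
    query_vec = embed(query)
    chunks = retrieve(query_vec, k)
    context = "\n\n".join(chunks)
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```

Everything downstream of `retrieve` is committed to whatever that single call returned, which is exactly where the three failure modes below originate.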
Multi-hop reasoning failures: "Who was the CEO of the company that acquired Slack?" requires first retrieving which company acquired Slack (Salesforce), then retrieving who Salesforce's CEO is (Marc Benioff). A single retrieval step can't answer this — it would need to embed "the company that acquired Slack" and hope the resulting vector happens to land near a chunk about Salesforce leadership. It won't.
Irrelevant context blindness: Standard RAG retrieves based on query-chunk similarity, but the LLM has no way to signal that the retrieved context is wrong and trigger a new retrieval. If your question about "Python GIL" retrieves chunks about "Python snake biology" due to an unfortunate embedding collision, the system hallucinates instead of re-querying.
Query-answer gap for exploratory tasks: When a user asks "Help me understand the tradeoffs of B-tree vs LSM-tree indexes," the right answer requires synthesizing multiple sub-topics. A single retrieval step either over-retrieves (diluting context) or under-retrieves (missing key sub-topics).
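For the exploratory case, one common mitigation is query decomposition: have the model propose focused sub-queries, retrieve for each, and merge the results before generation. A minimal sketch, using the same placeholder callables as above; the prompt wording and the 3-5 sub-query count are illustrative assumptions, not a fixed rule:

```python
from typing import Callable

def decompose_and_retrieve(
    question: str,
    retrieve: Callable[[str], list[str]],   # text query -> retrieved chunks
    llm: Callable[[str], str],              # prompt -> completion
) -> list[str]:
    """Fan-out retrieval for broad questions: ask the model for focused
    sub-queries, retrieve for each, and deduplicate before generation."""
    sub_queries = llm(
        "Break this question into 3-5 focused search queries, one per line:\n"
        + question
    ).splitlines()
    chunks: list[str] = []
    for q in (s.strip() for s in sub_queries if s.strip()):
        chunks.extend(retrieve(q))
    # Deduplicate while preserving order so the context window is not diluted
    return list(dict.fromkeys(chunks))
```

Decomposition helps with breadth, but it still commits to a fixed set of retrievals up front; it does not let the model react to what it finds.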
Agentic RAG solves this by turning retrieval into a dynamic decision process: the agent decides when to retrieve, what to query for, and whether retrieved context is sufficient before proceeding.
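Here is a sketch of that decision loop in the ReAct style, again with placeholder `retrieve` and `llm` callables. The stopping criterion, prompt wording, and step cap are illustrative assumptions rather than a fixed recipe:

```python
from typing import Callable

def agentic_rag(
    question: str,
    retrieve: Callable[[str], list[str]],   # text query -> retrieved chunks
    llm: Callable[[str], str],              # prompt -> completion
    max_steps: int = 4,
) -> str:
    """ReAct-style loop: the model alternates between issuing a focused
    search query and judging whether the evidence gathered so far suffices.
    For the Slack question it can first search "who acquired Slack", see
    Salesforce in the results, then search "Salesforce CEO"."""
    evidence: list[str] = []
    for _ in range(max_steps):
        decision = llm(
            "Question: " + question + "\n"
            "Evidence so far:\n" + "\n".join(evidence) + "\n"
            "If the evidence is sufficient, reply ANSWER: <your answer>.\n"
            "Otherwise reply SEARCH: <one focused search query>."
        )
        if decision.startswith("ANSWER:"):
            return decision[len("ANSWER:"):].strip()
        query = decision.removeprefix("SEARCH:").strip()
        evidence.extend(retrieve(query))     # each hop can build on the last
    # Step budget exhausted: answer with whatever evidence was gathered
    return llm("Question: " + question + "\nEvidence:\n" + "\n".join(evidence)
               + "\nAnswer as best you can from the evidence.")
```

Each extra hop costs one additional LLM call plus one retrieval, which is the latency budget you are trading for multi-hop correctness and self-correction.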
What Interviewers Are Testing
When you're asked to design a RAG system for a complex knowledge base, the interviewer wants to know if you'll default to naive single-shot retrieval or recognize when the query distribution demands something more sophisticated. The non-obvious signal: propose agentic RAG only when the query complexity justifies the latency cost. Defaulting to agentic RAG for a simple FAQ bot is an over-engineering red flag. Defaulting to single-shot RAG for a legal research assistant is an under-engineering red flag.