Agentic RAG: ReAct, Self-RAG, and Multi-Step Retrieval
Single-shot RAG fails on multi-hop questions and cannot self-correct. Master agentic RAG patterns — ReAct loops, Self-RAG reflection tokens, FLARE, and tool-augmented retrieval — with latency budgets and failure modes every FAANG candidate must know.
Why Single-Shot RAG Fails at Scale
Single-shot RAG embeds the query, retrieves top-k chunks, and hands them to the LLM. For simple factual questions this works. But it breaks in three distinct ways that interviewers specifically test.
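To make the failure modes concrete, here is a minimal sketch of that pipeline. The `embed`, `retrieve`, and `generate` callables are placeholders for whatever embedding model, vector store, and LLM client you actually use, not any specific library's API:

```python
from typing import Callable, Sequence

def single_shot_rag(
    query: str,
    embed: Callable[[str], Sequence[float]],                # text -> embedding vector
    retrieve: Callable[[Sequence[float], int], list[str]],  # vector, k -> chunks
    generate: Callable[[str], str],                         # prompt -> completion
    k: int = 5,
) -> str:
    """One embedding, one retrieval, one generation. If the retrieved
    chunks are wrong, there is no second chance to re-query."""
    query_vec = embed(query)
    chunks = retrieve(query_vec, k)
    context = "\n\n".join(chunks)
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```

Everything downstream of `retrieve` is committed to whatever that single call returned, which is exactly where the three failure modes below originate.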
Multi-hop reasoning failures: "Who was the CEO of the company that acquired Slack?" requires first retrieving which company acquired Slack (Salesforce), then retrieving who Salesforce's CEO is (Marc Benioff). A single retrieval step can't answer this — it would need to embed "the company that acquired Slack" and hope the resulting vector happens to land near a chunk about Salesforce leadership. It won't.
Irrelevant context blindness: Standard RAG retrieves based on query-chunk similarity, but the LLM has no way to signal that the retrieved context is wrong and trigger a new retrieval. If your question about "Python GIL" retrieves chunks about "Python snake biology" due to an unfortunate embedding collision, the system hallucinates instead of re-querying.
Query-answer gap for exploratory tasks: When a user asks "Help me understand the tradeoffs of B-tree vs LSM-tree indexes," the right answer requires synthesizing multiple sub-topics. A single retrieval step either over-retrieves (diluting context) or under-retrieves (missing key sub-topics).
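For the exploratory case, one common mitigation is query decomposition: have the model propose focused sub-queries, retrieve for each, and merge the results before generation. A minimal sketch, using the same placeholder callables as above; the prompt wording and the 3-5 sub-query count are illustrative assumptions, not a fixed rule:

```python
from typing import Callable

def decompose_and_retrieve(
    question: str,
    retrieve: Callable[[str], list[str]],   # text query -> retrieved chunks
    llm: Callable[[str], str],              # prompt -> completion
) -> list[str]:
    """Fan-out retrieval for broad questions: ask the model for focused
    sub-queries, retrieve for each, and deduplicate before generation."""
    sub_queries = llm(
        "Break this question into 3-5 focused search queries, one per line:\n"
        + question
    ).splitlines()
    chunks: list[str] = []
    for q in (s.strip() for s in sub_queries if s.strip()):
        chunks.extend(retrieve(q))
    # Deduplicate while preserving order so the context window is not diluted
    return list(dict.fromkeys(chunks))
```

Decomposition helps with breadth, but it still commits to a fixed set of retrievals up front; it does not let the model react to what it finds.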
Agentic RAG solves this by turning retrieval into a dynamic decision process: the agent decides when to retrieve, what to query for, and whether retrieved context is sufficient before proceeding.
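Here is a sketch of that decision loop in the ReAct style, again with placeholder `retrieve` and `llm` callables. The stopping criterion, prompt wording, and step cap are illustrative assumptions rather than a fixed recipe:

```python
from typing import Callable

def agentic_rag(
    question: str,
    retrieve: Callable[[str], list[str]],   # text query -> retrieved chunks
    llm: Callable[[str], str],              # prompt -> completion
    max_steps: int = 4,
) -> str:
    """ReAct-style loop: the model alternates between issuing a focused
    search query and judging whether the evidence gathered so far suffices.
    For the Slack question it can first search "who acquired Slack", see
    Salesforce in the results, then search "Salesforce CEO"."""
    evidence: list[str] = []
    for _ in range(max_steps):
        decision = llm(
            "Question: " + question + "\n"
            "Evidence so far:\n" + "\n".join(evidence) + "\n"
            "If the evidence is sufficient, reply ANSWER: <your answer>.\n"
            "Otherwise reply SEARCH: <one focused search query>."
        )
        if decision.startswith("ANSWER:"):
            return decision[len("ANSWER:"):].strip()
        query = decision.removeprefix("SEARCH:").strip()
        evidence.extend(retrieve(query))     # each hop can build on the last
    # Step budget exhausted: answer with whatever evidence was gathered
    return llm("Question: " + question + "\nEvidence:\n" + "\n".join(evidence)
               + "\nAnswer as best you can from the evidence.")
```

Each extra hop costs one additional LLM call plus one retrieval, which is the latency budget you are trading for multi-hop correctness and self-correction.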
What Interviewers Are Testing
When you're asked to design a RAG system for a complex knowledge base, the interviewer wants to know if you'll default to naive single-shot retrieval or recognize when the query distribution demands something more sophisticated. The non-obvious signal: propose agentic RAG only when the query complexity justifies the latency cost. Defaulting to agentic RAG for a simple FAQ bot is an over-engineering red flag. Defaulting to single-shot RAG for a legal research assistant is an under-engineering red flag.