
AI Agents & Agentic Systems Framework

Comprehensive guide to building production agentic AI systems — from ReAct patterns and tool design to multi-agent orchestration, memory, and evaluation. The fastest-growing area in AI engineering.

Tags: agents, agentic systems, LLM, tool use, ReAct, multi-agent, memory, orchestration

What is an AI Agent?

An AI agent is a system in which an LLM takes actions (tool calls) based on observations, iterating until a task is complete. Unlike a simple LLM call, agents can: (1) use external tools (search, code execution, APIs); (2) maintain state across multiple steps; (3) make decisions dynamically based on intermediate results; (4) complete open-ended, multi-step tasks autonomously. Key distinction: chains (fixed sequences) vs agents (dynamic decision-making). An agent decides WHICH tool to call next based on the current state.

The ReAct Pattern (Reason + Act)

01

Thought

LLM reasons about the current state and what to do next. Internal monologue not shown to user.

02

Action

LLM selects a tool and provides arguments as structured JSON.

03

Observation

Tool executes and returns results. Results are added to the context.

04

Repeat

LLM sees the observation and decides next step. Continues until task is complete (no more tool calls needed).

ReAct Agent Loop — Reason, Act, Observe, Repeat

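The loop above can be captured in a few lines. The sketch below assumes a hypothetical llm_step() that returns either a tool call ({"thought", "action", "args"}) or a final answer ({"final"}), plus a dict of tool callables; production loops add schema validation, retries, tracing, and cost accounting.

react_loop.py
# Minimal ReAct loop: Thought -> Action -> Observation, repeated until done.
# llm_step() and tools are placeholders for your model call and tool registry.

MAX_STEPS = 10  # hard guard against runaway loops

def run_agent(task: str, llm_step, tools: dict) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_STEPS):
        step = llm_step(messages)            # Thought + Action chosen by the LLM
        if "final" in step:
            return step["final"]             # task complete, no more tool calls needed
        tool = tools[step["action"]]
        observation = tool(**step["args"])   # Act: execute the selected tool
        messages.append({                    # Observe: feed the result back into context
            "role": "tool",
            "content": f"{step['action']} -> {observation}",
        })
    return "Stopped: max steps reached without completion."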

Tool Design Principles

Tools are the actuators of an agent — the quality of tool design determines reliability. Each tool needs: (1) a clear, unambiguous name; (2) a precise description of when and how to use it; (3) minimal, typed parameters; (4) predictable, structured output. Common tool types: read_file, write_file, run_shell, web_search, database_query, send_email, call_api, run_tests, code_execution. Critical: make tools idempotent when possible, so retrying a tool call is safe; avoid side effects in read tools; and scope write tools to prevent catastrophic actions.
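
As a rough sketch of these principles, here is one way a read-only tool could be specified and implemented. The schema shape follows the common function-calling convention (exact field names vary by provider), and read_file, list_files, and the error fields are illustrative, not a specific library's API.

read_file_tool.py
# Sketch of a well-specified tool: unambiguous name, usage guidance in the
# description, minimal typed parameters, and a predictable structured result.

read_file_tool = {
    "name": "read_file",
    "description": "Read a UTF-8 text file inside the project workspace. "
                   "Use after list_files() to get an exact path. Read-only, no side effects.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Workspace-relative file path"},
            "max_bytes": {"type": "integer", "default": 65536},
        },
        "required": ["path"],
    },
}

def read_file(path: str, max_bytes: int = 65536) -> dict:
    """Idempotent read tool: the same arguments always return the same shape."""
    try:
        with open(path, "rb") as f:
            data = f.read(max_bytes)
        return {"ok": True, "path": path, "content": data.decode("utf-8", errors="replace")}
    except FileNotFoundError:
        # Structured, actionable error instead of a bare string.
        return {"ok": False, "error_code": "FILE_NOT_FOUND", "recoverable": True,
                "hint": "Use list_files() to discover valid paths."}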

Tool Error Contract — Recoverable vs Terminal

Most agent failures are not reasoning failures — they are tool contract failures: ambiguous names, weak schemas, and useless errors. A production tool must return a structured error contract that lets the agent reason about whether to retry, repair, or escalate. Required fields: error_code (machine-readable), error_message (human-readable cause), recoverable (boolean — can the agent fix this and retry?), and retry_after_sec (for transient failures). "Error: file not found" leaves an agent looping; "FILE_NOT_FOUND, recoverable=true, hint: use list_files() to discover paths" gives the agent a path forward.

Minimal Tool Error Contract

tool_error.json
{
  "error_code": "RATE_LIMITED",
  "error_message": "Too many requests for this tenant",
  "recoverable": true,
  "retry_after_sec": 2
}

Retry Classification — Three Error Classes

01

Transient (retry with backoff)

Timeout, rate-limit, network error → bounded exponential backoff with jitter. Cap at 3 retries; longer cap risks runaway cost.

02

Correctable (let the model repair args)

Schema mismatch, missing required field, invalid argument → return validation error and let the agent try once or twice with corrected arguments.

03

Terminal (no retry, escalate)

Permission denied, unknown tool, policy blocked → no retry; force alternate plan, return diagnostic, or escalate to human. Retrying terminal errors wastes cost and can amplify side effects.

04

Loop detection

Hash (tool, normalized_args). If the same failing call repeats 3+ times, abort with diagnostic. Same-args repetition is the signature of a stuck agent.

05

Idempotency for side-effects

Every mutating call carries a client-generated operation ID. The backend deduplicates repeated submissions, so retries cannot duplicate emails, charges, or deletes.
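
A minimal sketch of this policy, combining the three error classes with loop detection and idempotency keys. The error codes, the tool result shape ({"ok", "error_code", "retry_after_sec"}), and the mutating flag are assumptions matching the contract above, not a specific framework's API.

retry_policy.py
import hashlib, json, random, time, uuid

TRANSIENT   = {"RATE_LIMITED", "TIMEOUT", "NETWORK_ERROR"}   # retry with backoff
CORRECTABLE = {"VALIDATION_ERROR", "MISSING_FIELD"}          # let the model repair args
# anything else (PERMISSION_DENIED, UNKNOWN_TOOL, POLICY_BLOCKED) is terminal

def call_with_policy(tool, args: dict, seen_failures: dict,
                     mutating: bool = False, max_retries: int = 3) -> dict:
    # Loop detection: hash (tool, normalized args); abort after repeated failures.
    key = hashlib.sha256(
        f"{tool.__name__}:{json.dumps(args, sort_keys=True)}".encode()
    ).hexdigest()
    if seen_failures.get(key, 0) >= 3:
        return {"ok": False, "error_code": "STUCK_LOOP", "recoverable": False}

    # Idempotency: mutating calls carry one client-generated operation ID that is
    # reused across retries, so backend deduplication prevents duplicate side effects.
    if mutating:
        args = {**args, "operation_id": str(uuid.uuid4())}

    result = {"ok": False, "error_code": "NOT_CALLED"}
    for attempt in range(max_retries):
        result = tool(**args)
        if result.get("ok"):
            return result
        seen_failures[key] = seen_failures.get(key, 0) + 1
        code = result.get("error_code", "")
        if code in TRANSIENT:
            # Bounded exponential backoff with jitter, honoring retry_after_sec if given.
            time.sleep(result.get("retry_after_sec", 2 ** attempt) + random.random())
            continue
        # Correctable: surface the validation error so the agent can fix its arguments.
        # Terminal: no retry; force an alternate plan or escalate.
        return result
    return result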

IMPORTANT

Four Non-Negotiables for Production Agents

Every production agent needs four control mechanisms, not just one or two: (1) Termination semantics — explicit done action plus hard max-steps guard; (2) Tool contract enforcement — strict schema validation before execution, machine-readable errors after; (3) State discipline — scoped memory boundary (working vs persisted) with deterministic summarization; (4) Observability + policy controls — per-step tracing, token/cost accounting, permission boundaries, HITL approval for destructive actions. Teams that skip any one of these ship fragile demos, not production systems.
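
A small sketch of the fourth item, per-step tracing plus token/cost budgets, combined with the hard termination guard from the first. The field names and limits are illustrative assumptions, not a standard schema.

run_budget.py
import time
from dataclasses import dataclass, field, asdict

@dataclass
class StepTrace:
    step: int
    tool: str
    args: dict
    ok: bool
    latency_ms: float
    tokens_in: int
    tokens_out: int

@dataclass
class RunBudget:
    max_steps: int = 20
    max_tokens: int = 200_000
    steps: int = 0
    tokens: int = 0
    traces: list = field(default_factory=list)

    def record(self, trace: StepTrace) -> None:
        # Per-step accounting; in production, also emit to your tracing backend.
        self.steps += 1
        self.tokens += trace.tokens_in + trace.tokens_out
        self.traces.append(asdict(trace))

    def exhausted(self) -> bool:
        # Hard guards: the run ends even if the agent never emits an explicit "done".
        return self.steps >= self.max_steps or self.tokens >= self.max_tokens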

Agent Patterns Comparison

| Pattern | Description | When to Use | Drawback |
| --- | --- | --- | --- |
| Sequential Chain | Fixed pipeline A→B→C | Dependent steps with known flow | No parallelism |
| Parallel Fan-Out | Multiple agents run concurrently | Independent subtasks | Coordination overhead |
| Supervisor/Worker | Orchestrator delegates to specialists | Complex tasks needing expertise routing | More complex debugging |
| Dynamic Graph (LangGraph) | Conditional routing based on state | Complex workflows with loops | Hardest to reason about |
| Critic/Refinement | Generator + evaluator loop | Quality-sensitive outputs | Risk of infinite loops |

Multi-Agent Architecture — Supervisor + Specialist Workers

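A minimal sketch of the supervisor/worker shape: the orchestrator decomposes the task, routes each subtask to a specialist, and synthesizes the results. The worker names, plan_fn, and synthesize_fn are hypothetical stand-ins for LLM calls, not a specific framework's API.

supervisor.py
# Supervisor/worker routing sketch with placeholder specialists.
WORKERS = {
    "research": lambda task: f"[research notes for: {task}]",
    "code":     lambda task: f"[patch for: {task}]",
    "review":   lambda task: f"[review of: {task}]",
}

def supervisor(task: str, plan_fn, synthesize_fn) -> str:
    # plan_fn: LLM call returning [(worker_name, subtask), ...]
    # synthesize_fn: LLM call merging worker outputs into a final answer
    results = []
    for worker_name, subtask in plan_fn(task):
        worker = WORKERS.get(worker_name)
        if worker is None:
            results.append((worker_name, "UNKNOWN_WORKER"))  # terminal error: replan or escalate
            continue
        results.append((worker_name, worker(subtask)))
    return synthesize_fn(task, results)
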
TIP

When to Use Multi-Agent

Use multi-agent only when: (1) Task is too large for one context window, (2) Subtasks genuinely benefit from specialization, (3) Parallel execution provides significant wall-clock speedup. Don't add agents just because you can. A single well-prompted agent with good tools is often simpler, cheaper, and more reliable than a multi-agent system.

The Four Types of Agent Memory

01

Working Memory

The current conversation context window. Fast, but bounded by the model's context limit (often on the order of 128K tokens). Gone after the session.

02

Episodic Memory

Past conversations stored in vector DB. Retrieved by semantic similarity + recency. Persists weeks/months.

03

Semantic Memory

Extracted facts and preferences about users/context. Structured key-value store. Persists indefinitely.

04

Procedural Memory

Learned patterns about HOW to respond. Stored as prompt templates or fine-tuned adapters. Most durable.
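
One way to picture the four types as distinct stores with different lifetimes is sketched below. The in-memory containers stand in for real backends (a vector DB for episodic memory, a key-value store for semantic memory); class and method names are illustrative.

agent_memory.py
from collections import deque

class AgentMemory:
    """Sketch of the four memory types with different lifetimes."""

    def __init__(self) -> None:
        self.working = deque()     # 1. working: current context window, gone after the session
        self.episodic = []         # 2. episodic: past conversations, vector-indexed in production
        self.semantic = {}         # 3. semantic: extracted facts/preferences, key-value
        self.procedural = []       # 4. procedural: learned response patterns / prompt templates

    def end_session(self, summary: str, facts: dict) -> None:
        # Persist a summary episodically, merge extracted facts semantically,
        # then drop the working memory.
        self.episodic.append(summary)
        self.semantic.update(facts)
        self.working.clear()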

Context Window Management

The core engineering challenge in agentic systems: codebases, documents, and histories are far larger than any context window, so you must be surgical about what goes in. Hierarchy: (1) always-in-context: system prompt, repo map, user facts; (2) retrieved-on-query: semantically relevant files/chunks (~20K tokens); (3) on-demand: exact file contents when tool calls read them. Avoid putting everything in context, never summarizing, or letting context grow unbounded: these lead to "lost in the middle" failures, where the LLM ignores critical information because it is buried in a huge context.
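
A rough sketch of this three-tier assembly with an explicit token budget. retrieve() and count_tokens() are placeholders for your retriever and tokenizer; the budget numbers are illustrative.

context_builder.py
def build_context(system_prompt, repo_map, user_facts, query,
                  retrieve, count_tokens,
                  budget_tokens=100_000, retrieval_budget=20_000) -> str:
    context = []

    # Tier 1: always in context.
    context.extend([system_prompt, repo_map, user_facts])

    # Tier 2: retrieved on query, capped so retrieval cannot crowd out everything else.
    used = 0
    for chunk in retrieve(query):               # assumed ordered by relevance
        cost = count_tokens(chunk)
        if used + cost > retrieval_budget:
            break
        context.append(chunk)
        used += cost

    # Tier 3 (on-demand file reads) is added later by tool calls, not here.
    assert sum(count_tokens(c) for c in context) <= budget_tokens  # sanity check
    return "\n\n".join(context)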

Agent Failure Modes & Fixes

| Failure Mode | Symptom | Root Cause | Fix |
| --- | --- | --- | --- |
| Lost Context | Re-reads same files | Working memory overflow | Structured scratchpad + summarization |
| Tool Hallucination | Wrong tool arguments | LLM invents parameters | Strict JSON schema validation with retries |
| Infinite Loop | Same action repeatedly | No progress detection | Track attempted approaches; add loop detector |
| Over-confidence | Doesn't ask for clarification | Missing uncertainty modeling | Add "confidence" to tool outputs |
| Scope creep | Modifies unrelated files | Weak scoping | Explicit allowed-path restrictions |
| Stale Context | Acts on outdated observations | Too many steps between updates | Re-read key resources periodically |

LLM Evaluation for Agents

Evaluating agents is harder than evaluating single LLM calls because: (1) tasks are open-ended, (2) multiple valid paths exist, (3) behavior is non-deterministic. Key metrics: task completion rate (e.g., do the tests pass?), turn efficiency (tool calls per task), tool error rate (% of tool calls that fail), and quality (human or LLM-as-judge rating). Standard benchmark: SWE-bench (resolve real GitHub issues so the associated tests pass). State-of-the-art: ~50-60% pass rate as of 2026.
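
A small sketch of how these metrics might be aggregated over a batch of evaluation runs. The per-run record shape ({"passed", "tool_calls", "tool_errors", "judge_score"}) is an assumption for illustration.

agent_eval.py
def summarize_runs(runs: list[dict]) -> dict:
    """Aggregate agent metrics over evaluation runs."""
    n = len(runs)
    total_calls = sum(r["tool_calls"] for r in runs)
    return {
        "task_completion_rate": sum(r["passed"] for r in runs) / n,
        "avg_tool_calls_per_task": total_calls / n,                      # turn efficiency
        "tool_error_rate": sum(r["tool_errors"] for r in runs) / max(total_calls, 1),
        "avg_judge_score": sum(r["judge_score"] for r in runs) / n,      # human or LLM-as-judge
    }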

Safety Guardrails for Production Agents

01

Input validation

Check for prompt injection attempts. Sanitize user inputs before they reach tool calls.

02

Scope limiting

Restrict tool access to allowed paths, domains, APIs. Never allow unrestricted filesystem or network access.

03

Destructive action confirmation

For irreversible actions (delete, send email, push code), require explicit human confirmation.

04

Output validation

Check agent outputs before surfacing to users. Use separate safety classifier if needed.

05

Audit logging

Log every tool call with inputs/outputs. Essential for debugging and compliance.

06

Rate limiting

Limit tool calls per session to prevent runaway agents consuming excessive resources.
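
Three of the guardrails above (scope limiting, destructive-action confirmation, rate limiting) can be enforced in a single pre-execution check. The tool names, path root, and limits below are illustrative assumptions; confirm_fn stands in for whatever human-approval channel you use.

guardrails.py
from pathlib import Path

ALLOWED_ROOT = Path("/workspace/project").resolve()
DESTRUCTIVE_TOOLS = {"delete_file", "send_email", "push_code"}
MAX_CALLS_PER_SESSION = 100

def check_call(tool_name: str, args: dict, calls_so_far: int, confirm_fn) -> None:
    # Rate limiting: stop runaway agents before they burn budget.
    if calls_so_far >= MAX_CALLS_PER_SESSION:
        raise PermissionError("RATE_LIMIT: session tool-call budget exhausted")

    # Scope limiting: any path argument must resolve inside the allowed root.
    if "path" in args:
        resolved = (ALLOWED_ROOT / args["path"]).resolve()
        if not resolved.is_relative_to(ALLOWED_ROOT):
            raise PermissionError(f"SCOPE_VIOLATION: {resolved} is outside {ALLOWED_ROOT}")

    # Destructive action confirmation: irreversible actions need an explicit human yes.
    if tool_name in DESTRUCTIVE_TOOLS and not confirm_fn(tool_name, args):
        raise PermissionError(f"HITL_REJECTED: {tool_name} not approved by a human")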
