Preview — Pro guide
You are seeing a portion of this guide. Sign in and upgrade to unlock the full article, quizzes, and interview answers.
Sections
Related Guides
AI Agents & Agentic Systems Framework
GenAI & Agents
HITL and Durable Agent Execution: Interrupt, Approve, Resume Safely
GenAI & Agents
Agent Governance Control Plane: Runtime Policy for LLM Tool Execution
GenAI & Agents
LLM Guardrails and Safety: Input/Output Filters, Red-Teaming, and Constitutional AI
GenAI & Agents
Multi-Agent Systems: Orchestration, LangGraph, and Production Patterns
GenAI & Agents
MCP Security and Tool Trust Boundaries for LLM Agent Systems
Design secure MCP tool integrations for LLM agents — tool poisoning, cross-server injection, TOFU pinning, AttestMCP attestation, and least-privilege tool design. Covers the threat model, layered defenses, and staff-level interview framing for production agent governance.
Why MCP Security Is a Distinct Problem from Prompt-Level Safety
The Model Context Protocol gives LLM agents a standardized way to call external tools, read resources, and execute actions across servers. That power introduces an attack surface that prompt-level guardrails cannot cover. Prompt injection defends the model's reasoning; MCP security defends the execution boundary between the model and the external world.
The core issue is architectural: MCP clients trust the servers they connect to, and servers trust the execution environment they run in. The official MCP SECURITY.md states this explicitly — it places server selection responsibility on users and administrators, not on the protocol. That means there is no built-in capability attestation, no cryptographic proof that a server's tool descriptions match its actual behavior, and no origin tagging on sampling requests. Every tool description, every response, and every resource enters the LLM context window as undifferentiated text.
This matters because tool descriptions are prompt injection vectors. Invariant Labs disclosed tool poisoning in April 2025, demonstrating SSH key exfiltration through a calculator tool's hidden description. The MCPTox benchmark (arXiv 2508.14925) tested 45 live MCP servers and found that the most capable frontier models had higher attack success rates — ~72.8% for o1-mini — because strong instruction-following makes them more compliant with poisoned metadata. A separate study found ~5.5% of publicly listed MCP servers contained tool poisoning vulnerabilities.
In interviews, the failure mode is treating MCP security as "just another prompt injection problem." It is not. MCP creates a trust propagation chain: user trusts client, client trusts server, server trusts environment. A single poisoned server contaminates every other server connected to the same client session. That cross-server trust propagation is the architectural vulnerability that protocol-level remediation must address.
What Interviewers Test on Agent Security and MCP Trust
Interviewers are testing whether you understand that MCP security is an architectural trust-boundary problem, not a prompt-filtering problem. A 9/10 answer defines the trust chain (user → client → server → environment), identifies where attestation is missing, names specific attacks (tool poisoning, cross-server injection, sampling hijack), and proposes layered defenses (TOFU pinning, response scanning, capability attestation, least-privilege tool design). A 6/10 answer says 'validate tool outputs' with no threat model.