Design secure MCP tool integrations for LLM agents — tool poisoning, cross-server injection, TOFU pinning, AttestMCP attestation, and least-privilege tool design. Covers the threat model, layered defenses, and staff-level interview framing for production agent governance.

28 min read 3 sections 1 interview questions

MCPTOFU PinningATTESTMCPTrail of Bits mcp-context-protectorTool PoisoningCross-Server InjectionMicrosoft AGTCapability AttestationMCPTox BenchmarkOWASP Agentic Top 10LLM Agent SecurityOPA Rego Policy EngineLlamaFirewallInvariant Labs

Why MCP Security Is a Distinct Problem from Prompt-Level Safety

The Model Context Protocol gives LLM agents a standardized way to call external tools, read resources, and execute actions across servers. That power introduces an attack surface that prompt-level guardrails cannot cover. Prompt injection defends the model's reasoning; MCP security defends the execution boundary between the model and the external world.

The core issue is architectural: MCP clients trust the servers they connect to, and servers trust the execution environment they run in. The official MCP SECURITY.md states this explicitly — it places server selection responsibility on users and administrators, not on the protocol. That means there is no built-in capability attestation, no cryptographic proof that a server's tool descriptions match its actual behavior, and no origin tagging on sampling requests. Every tool description, every response, and every resource enters the LLM context window as undifferentiated text.

This matters because tool descriptions are prompt injection vectors. Invariant Labs disclosed tool poisoning in April 2025, demonstrating SSH key exfiltration through a calculator tool's hidden description. The MCPTox benchmark (arXiv 2508.14925) tested 45 live MCP servers and found that the most capable frontier models had higher attack success rates — ~72.8% for o1-mini — because strong instruction-following makes them more compliant with poisoned metadata. A separate study found ~5.5% of publicly listed MCP servers contained tool poisoning vulnerabilities.

In interviews, the failure mode is treating MCP security as "just another prompt injection problem." It is not. MCP creates a trust propagation chain: user trusts client, client trusts server, server trusts environment. A single poisoned server contaminates every other server connected to the same client session. That cross-server trust propagation is the architectural vulnerability that protocol-level remediation must address.

IMPORTANT

What Interviewers Test on Agent Security and MCP Trust

Interviewers are testing whether you understand that MCP security is an architectural trust-boundary problem, not a prompt-filtering problem. A 9/10 answer defines the trust chain (user → client → server → environment), identifies where attestation is missing, names specific attacks (tool poisoning, cross-server injection, sampling hijack), and proposes layered defenses (TOFU pinning, response scanning, capability attestation, least-privilege tool design). A 6/10 answer says 'validate tool outputs' with no threat model.

IMPORTANT

Premium content locked

This guide is premium content. Upgrade to Pro to unlock the full guide, quizzes, and interview Q&A.

Upgrade to Pro Sign in to upgrade