Open Research · v1.0 · 2026-04-14

The MCP Attack Atlas

An open catalogue of attack patterns against AI agents using the Model Context Protocol. 40+ verified patterns across 14 attack families. Each pattern names the attack, shows the fixture, and names the detection angle.

40+
Published patterns
14
Attack families
1
Live CVE cited
MIT
Research license
TL;DR

AI agents that use MCP have a trust boundary with their tools, their memory, and their approval loop. Attackers exploit that boundary. The Atlas catalogues 40+ distinct ways they do it — from emoji homoglyph policy evasion to memory eviction rehydration poisoning to tool docstring directive bleed. Grouped into 14 families so defenders can think in classes, not tricks. Two patterns (state replay poisoning, tool metadata smuggling) match a real published CVE: CVE-2026-40159 / GHSA-pj2r-f9mw-vrcq. The rest are observed in internal fixture corpora or flagged as hypothesis. No benchmark theater.

✦ Honesty Notice

Every pattern in this Atlas has been fact-checked by a multi-agent audit. Claims that failed verification were removed. Patterns cite either a live external reference (CVE / GHSA / paper) or an internal fixture corpus — never uncited effectiveness numbers. When a pattern is a hypothesis without evidence, we say so. This is the anti-benchmark-theater stance we hold competitors to, applied to ourselves.

Verified external — cited CVE/GHSA/paper confirmed live
Observed internally — fixture corpus + FP/FN tested
Hypothesis — theoretical, pending validation

🔥 Top 10 Featured Patterns

Ten patterns picked by the audit for clarity, novelty, and immediate applicability. Each is publish-ready.

🛡️ Live CVE Confirmed: MCP Subprocess Env Exposure

Two patterns in the Atlas correspond to a live published vulnerability, not just internal research:

GHSA-pj2r-f9mw-vrcq · CVE-2026-40159

PraisonAI — Sensitive Env Exposure via Untrusted MCP Subprocess Execution

The MCP subprocess execution path in PraisonAI exposed sensitive environment variables when launching untrusted tool subprocesses. This real, published advisory maps cleanly to two Atlas patterns: STATE_REPLAY_POISONING and TOOL_METADATA_SMUGGLING, both of which require the subprocess isolation boundary to hold. Verified live in the GitHub Advisory Database.

View advisory on GitHub →

Verified external

🗂️ The 14 Attack Families

Each family groups patterns that share a root mechanic. Implementers should defend classes, not instances.

🎭 Identity & Role Confusion5 patterns

Exploits that confuse the agent about who it is, what role it's playing, or what sandbox it's in.

  • SIMULATION_MODE_CONFUSION · roleplay framing override
  • WORKING_DIRECTORY_CONTEXT_CONFUSION · cwd assumption drift
  • SANDBOX_BOUNDARY_CONFUSION · runtime trust-tier conflation
  • MULTI_AGENT_ROLE_BINDING_DESYNC · role-to-agent drift
  • MODEL_CAPABILITY_CLAIM_FORGERY · forged capability assertion
🛡️ Policy & Guardrail Bypass4 patterns

Attacks that invert, shadow, or bypass the agent's enforcement layer without breaking it visibly.

  • VERIFICATION_GATE_BYPASS · gate-to-deploy atomicity miss
  • ABSTENTION_SUPPRESSION_COERCION · "don't say no" framing
  • PERMISSION_SCOPE_ALIASING · canonical-vs-alias scope drift
  • POLICY_EVALUATOR_POISONING · evaluator input tampering
🔐 Evidence & Provenance4 patterns

Attacks that forge, alias, or fracture the evidence trail the agent uses to decide what's trustworthy.

  • PROVENANCE_CHAIN_FRACTURE · trust-chain inheritance break
  • EVIDENCE_HASH_COLLISION · domain-separator binding miss
  • TRUST_SIGNAL_SPOOFING · forged verifier signature
  • FRESHNESS_BADGE_SPOOFING · stale-as-new
⚖️ Decision Gating & HITL4 patterns

Exploits that target the human-in-the-loop approval layer and the post-decision audit trail.

  • APPROVAL_HASH_COLLISION · approval-vs-execution divergence
  • DECISION_TRACE_FORGERY · post-hoc rationale poisoning
  • APPROVAL_CHANNEL_DESYNC · summary ↔ payload mismatch
  • CONSENT_TOKEN_CONFUSION · scope-binding drift
🧠 Memory & Context Manipulation5 patterns

Attacks that plant, evict, compress, or replay memory to carry context across trust boundaries.

  • CONTEXT_WINDOW_RESET_POISONING · session-reset framing abuse
  • MEMORY_EVICTION_REHYDRATION_POISONING · plant-now trigger-later
  • MEMORY_POLICY_GRAFTING · persistence via policy blend
  • PROMPT_CACHE_POISONING · cache-layer injection
  • TRANSCRIPT_SUMMARIZER_AUTHORITY_FLIP · summarizer-as-author
🔧 Tool & Schema Abuse5 patterns

Attacks that ride on tool metadata, docstrings, result ordering, and schema aliasing.

  • TOOL_DOCSTRING_DIRECTIVE_BLEED · metadata-as-instructions
  • TOOL_METADATA_SMUGGLING · concealed directive in manifest (CVE-2026-40159 adjacent)
  • TOOL_OUTPUT_SHADOWING · output overriding same-name prior
  • TOOL_RESULT_PROVENANCE_FORGERY · source-tag fabrication
  • STRUCTURED_OUTPUT_SCHEMA_TRAPDOORS · schema-valid payload attack
🎛️ Control Plane & Orchestration3 patterns

Attacks on the agent's orchestrator, routing, delegation, and hand-off layer.

  • DELEGATION_ORACLE_ABUSE · sub-agent privilege inflation
  • CAPABILITY_NEGOTIATION_DOWNGRADE · forced-weak-feature pick
  • CAPABILITY_DISCOVERY_SIDECHANNELS · recon-then-exploit chain
📊 Observability / Telemetry2 patterns

Attacks that poison or suppress the monitoring layer so the attack itself goes unseen.

  • TRUST_SIGNAL_SPOOFING · green-when-red
  • PROMPT_CACHE_POISONING · telemetry inheritance from poisoned entry
🔤 Encoding / Canonicalization4 patterns

Attacks that ride on encoding mismatches between scanner and consumer.

  • EMOJI_HOMOGLYPH_POLICY_EVASION · Unicode lookalike bypass
  • MULTI_STAGE_ENCODING_CAMOUFLAGE · decode-ladder smuggling
  • REPRESENTATION_BOUNDARY_POLYGLOTS · JSON/YAML/shell polyglot
  • STREAM_REASSEMBLY_DESYNC · split-payload per-chunk bypass
🧮 Baseline / Eval Integrity1 pattern

Attacks on the calibration layer that determines "normal" vs "anomalous".

  • NEGATIVE_CONTROL_CONTAMINATION · clean-sample taint
💰 Resource & Budget Abuse1 pattern

Attacks that manipulate quota/rate/budget signals to trigger permissive fallbacks.

  • ZERO_VALUE_COERCION · numeric-coercion edge bypass
🎬 Cross-Modal / Multimodal1 pattern

Attacks that cross the image/audio/OCR boundary into the text planner channel.

  • CROSS_MODAL_BRIDGE_ABUSE · OCR-as-instructions
⏱️ Temporal / Race3 patterns

Attacks that exploit timing, staleness, and idempotency assumptions.

  • IDEMPOTENCY_REPLAY_ABUSE · safety feature weaponised
  • CANARY_ROTATION_RACE · rotation-window timing window
  • TIME_OF_CHECK_TIME_OF_USE_DESYNC · classic TOCTOU re-surfaced
🧩 State, Session & Misc3 patterns

Patterns that don't cleanly fit other buckets — state handling, session resumption, and persistence.

  • STATE_REPLAY_POISONING · session-state attack (CVE-2026-40159 adjacent)
  • SESSION_RESUMPTION_AUTHORITY_CONFUSION · resume-as-fresh-approval
  • WORKFLOW_VERSION_PINNING_ABUSE · pinned-to-vulnerable

How to use this Atlas

If you build or run an AI agent that uses MCP, read one family per week and ask: does my agent defend against this class? Patterns listed here map to detection rules in the Sunglasses scanner — install once, update regularly, and new patterns land automatically.

Install Sunglasses Source on GitHub

What's next on the Atlas

This is v1.0. The full internal research library has more pattern candidates under validation. A new Atlas entry is promoted after it passes a multi-agent fact-check audit and has at least one verifiable internal fixture or external reference. Patterns that fail verification are held, not published.

Audit history for v1.0: 5 parallel Sonnet agents scanned 169 candidate files. One claimed citation was initially flagged as hallucinated but turned out to be real (the CVE above). The audit agent's retraction is logged publicly as a matter of process. Honest > clean.