Attack Surface Manual · 1103 Patterns · 65 Categories · Open Source

AI Agent Attack Surface Manual

AI agent attacks are not one thing. They show up as prompt injection, tool-handoff abuse, callback drift, memory poisoning, scope redefinition, outbound control, and dozens of other workflow failures. This manual groups those patterns by attack family so operators can understand the shape of the problem before it becomes unsafe action.

Sunglasses uses this page to explain a simple truth: the dangerous moment is often not the first prompt. It is the later workflow step where a tool result, callback, redirect, memory artifact, package endpoint, or state sync quietly gains authority over the next action.

1103 detection patterns · 65 attack categories · 11 authored chapters · 23 additional categories

Why this page exists

Most attack catalogs tell you that patterns exist. This page is here to show how those patterns cluster into the real trust failures teams hit when they deploy AI agents. A flat list can help a machine. Operators need more. They need to know what kind of attack they are looking at, why it matters specifically for agents, and what it looks like when a normal workflow quietly starts behaving like a compromised one.

How to use this manual

Open the chapter that matches the attack family you are investigating. Each chapter is its own page with authored prose, the surfaces it shows up on, and every matching Sunglasses detection pattern with a scoped search. Start with the family, then drill into the specific variants.

Attack Family Chapters

11 authored chapters grouping 938 of the 1103 patterns by attack family. Each chapter is its own page.

Chapter 01 · 133 patterns

Prompt Injection

Prompt injection is the attack family where untrusted text gets treated as instructions, causing an AI agent to change what it believes, what it prioritizes, or what it does next.

Chapter 02 · 141 patterns

MCP / Tool-Handoff Abuse

MCP and tool-handoff abuse happens when tool descriptions, schemas, server metadata, or handoff fields become authority-bearing instructions that steer an agent before or during tool use.

Chapter 03 · 17 patterns

Callback / Redirect Trust Drift

Callback and redirect trust drift happens when an approved workflow quietly extends trust to a new destination, service hop, or retry path that never earned the same authority as the original action.
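As a rough illustration of the drift described above, the sketch below flags a redirect whose target leaves the origin that was originally approved. The function names `origin` and `redirect_drifts` are hypothetical, and comparing scheme, host, and port is only one possible definition of "the same destination". It is not taken from the Sunglasses pattern set.

```python
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return (scheme, host, port) for a URL, with default ports filled in."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    default_port = {"http": 80, "https": 443}.get(scheme)
    return scheme, (parts.hostname or "").lower(), parts.port or default_port


def redirect_drifts(approved_url: str, location: str) -> bool:
    """True when a redirect target leaves the originally approved origin."""
    # Resolve relative Location values against the approved URL first.
    target = urljoin(approved_url, location)
    return origin(target) != origin(approved_url)
```

A relative retry path on the same host would pass this check, while a hop to a different host or a downgrade from https to http would be flagged for review.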

Chapter 04 · 60 patterns

Outbound Endpoint Control / C2-Style Drift

Outbound endpoint control failures happen when an AI agent keeps its permissions but starts sending data, requests, or follow-up actions toward destinations that quietly reshape the workflow or leak sensitive context.

Chapter 05 · 47 patterns

Policy Scope Redefinition

Policy scope redefinition is the attack family where mandatory controls get reframed as optional guidance, appendix material, or lower-priority context so the workflow remains nominally compliant while its real authority expands.

Chapter 06 · 113 patterns

State Sync Poisoning

State sync poisoning happens when shared state, synchronized context, or cross-system memory carries unsafe assumptions into later decisions that the receiving agent did not independently verify.

Chapter 07 · 15 patterns

Memory / Persistence Poisoning

Memory and persistence poisoning happens when saved prompts, retained instructions, sticky context, or durable workflow artifacts turn a one-time manipulation into a recurring control surface.

Chapter 08 · 30 patterns

Package / Dependency / Registry Trust Abuse

Package, dependency, and registry trust abuse happens when an AI agent treats a package source, skill, connector, or update path as trustworthy operational truth and later executes that trust with high privilege.

Chapter 09 · 27 patterns

Browser-Agent Navigation / Link Safety Abuse

Browser-agent navigation abuse happens when links, redirects, forms, or page cues quietly reshape what the agent trusts next, even though the browsing flow still looks normal to the operator.

Chapter 10 · 213 patterns

Agent Workflow / Publish-Path Abuse

Agent workflow and publish-path abuse happens when the glue between planning, approval, scheduling, execution, review, or publishing becomes the attack surface and unsafe actions start to look like normal operations.

Chapter 11 · 142 patterns

Discovery File Poisoning

Discovery file poisoning is the attack family where hostile instructions are hidden inside files agents read during site, documentation, security, app, or feed discovery, causing public metadata to behave like policy.

Additional Indexed Categories

23 categories not yet assigned to a chapter. Compact index only — these categories have no dedicated page.

Category · identifier · patterns

Model Routing Confusion · model_routing_confusion · 28

Structured Metadata Poisoning · structured_metadata_poisoning · 17

Command Injection · command_injection · 16

Sandbox Escape · sandbox_escape · 14

API Descriptor Poisoning · api_descriptor_poisoning · 12

Build Metadata Poisoning · build_metadata_poisoning · 10

Token Smuggling · token_smuggling · 9

CI/CD Metadata Poisoning · cicd_metadata_poisoning · 8

Supply Chain Attestation Poisoning · supply_chain_attestation_poisoning · 8

Repo Metadata Poisoning · repo_metadata_poisoning · 6

Agent Persona Drift · agent_persona_drift · 5

Code Switching · code_switching · 4

Identity Federation · identity_federation · 4

Invisible Unicode · invisible_unicode · 4

Error Message Leakage · error_message_leakage · 3

Multi-Stage Encoding · multi_stage_encoding · 3

RTL Obfuscation · rtl_obfuscation · 3

Deserialization · deserialization · 2

Encoded Payload · encoded_payload · 2

Encoding Evasion · encoding_evasion · 2

Path Traversal · path_traversal · 2

Unicode Evasion · unicode_evasion · 2

MCP Tool Injection · mcp_tool_injection · 1
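Category names such as invisible_unicode and rtl_obfuscation suggest text that contains characters a human reviewer cannot see. The sketch below is a minimal way to surface such characters, assuming that is what those categories cover. The function name `hidden_characters` is hypothetical, and the approach (Unicode general category Cf plus an explicit set of bidirectional controls) is a generic technique, not the Sunglasses detection logic.

```python
import unicodedata
from typing import List, Tuple

# Bidirectional controls: embeddings and overrides (U+202A..U+202E),
# isolates (U+2066..U+2069), and the LRM, RLM, and ALM marks.
BIDI_CONTROLS = (
    set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A)) | {0x200E, 0x200F, 0x061C}
)


def hidden_characters(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, 'U+XXXX', kind) for each bidi control or format character."""
    findings = []
    for index, char in enumerate(text):
        code = ord(char)
        if code in BIDI_CONTROLS:
            kind = "bidi-control"
        elif unicodedata.category(char) == "Cf":
            kind = "format"  # e.g. zero-width space, zero-width joiner, BOM
        else:
            continue
        findings.append((index, f"U+{code:04X}", kind))
    return findings
```

A finding is a signal rather than a verdict: zero-width joiners appear in ordinary emoji sequences, and bidirectional marks are legitimate in right-to-left text.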

JSON Data Endpoint

The full pattern database is available as machine-readable JSON: a meta object plus a patterns array. Every pattern exposes its id, name, category, severity, channel, keywords, and description. The file is regenerated with every release.

Endpoint: /patterns.json

GET https://sunglasses.dev/patterns.json

// Response shape
{
  "meta": {
    "total": 1103,
    "categories": 65,
    "generated": "2026-07-04",
    "source": "https://github.com/sunglasses-dev/sunglasses",
    "endpoint": "https://sunglasses.dev/patterns.json"
  },
  "patterns": [ ... ]  // 1103 objects, each with id/name/category/severity/channel/keywords/description
}
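One way to consume the endpoint, sketched under the assumption that the live response matches the shape shown above (a top-level object with meta and patterns). The helper names `fetch_patterns`, `count_by_category`, and `check_total` are hypothetical. Only Python standard-library calls are used.

```python
import json
from collections import Counter
from urllib.request import urlopen

PATTERNS_URL = "https://sunglasses.dev/patterns.json"


def fetch_patterns(url: str = PATTERNS_URL) -> dict:
    """Download and parse the pattern database (performs a network request)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def count_by_category(database: dict) -> Counter:
    """Count patterns per category using each pattern's 'category' field."""
    return Counter(pattern["category"] for pattern in database["patterns"])


def check_total(database: dict) -> bool:
    """Compare meta.total with the number of pattern objects actually present."""
    return database["meta"]["total"] == len(database["patterns"])
```

Calling `count_by_category(fetch_patterns()).most_common(5)` would list the five largest categories, and `check_total` gives a quick sanity check that a downloaded copy is complete.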

Sunglasses is MIT-licensed. The pattern database is open for inspection, extension, and redistribution.

Frequently Asked

What attacks actually work against AI agents today?

The attacks that work best against AI agents are usually the ones that cross a trust boundary quietly, such as prompt injection, poisoned retrieval, tool-metadata abuse, forged workflow receipts, unsafe callbacks, and outbound actions that inherit more authority than they should.

Is prompt injection still the main AI agent risk?

Prompt injection is still one of the core AI agent risks, but it matters most as the starting layer beneath larger workflow failures like tool abuse, retrieval poisoning, callback drift, exfiltration, and unsafe publish paths.

What is the difference between prompt injection and retrieval poisoning?

Prompt injection is the broader instruction-channel attack, while retrieval poisoning is the case where hostile or strategically ranked external content gets pulled into the agent context and then influences reasoning, citations, or downstream action.

Why are authentication and guardrails not enough for agent security?

Authentication and guardrails are necessary but not sufficient because many AI agent failures happen after access is granted, when tool outputs, callbacks, workflow artifacts, or outbound destinations quietly reshape the next action.

What is MCP security really about?

MCP security is not only about who can call a tool server; it is also about whether tool descriptions, schemas, handoff metadata, outputs, and follow-on actions are being trusted more than they deserve.

What should security teams test first in an AI agent workflow?

Security teams should first test the surfaces where context becomes action: retrieved content, tool metadata, tool outputs, callbacks, handoffs, approvals, outbound destinations, and any memory or state that gets reused later.

What should vendors measure after blocking unsafe prompts?

After blocking unsafe prompts, vendors should measure whether the agent still makes risky tool calls, follows unsafe destinations, accepts forged workflow signals, leaks sensitive data, or widens authority through later workflow steps.