Attack Surface Manual · 721 Patterns · 55 Categories · Open Source

AI Agent Attack Surface Manual

AI agent attacks are not one thing. They show up as prompt injection, tool-handoff abuse, callback drift, memory poisoning, scope redefinition, outbound control, and dozens of other workflow failures. This manual groups those patterns by attack family so operators can understand the shape of the problem before it turns into unsafe action.

Sunglasses uses this page to explain a simple truth: the dangerous moment is often not the first prompt. It is the later workflow step where a tool result, callback, redirect, memory artifact, package endpoint, or state sync quietly gains authority over the next action.

721 detection patterns
55 attack categories
10 authored chapters
16 additional categories

Why this page exists

Most attack catalogs tell you that patterns exist. This page is here to show how those patterns cluster into the real trust failures teams hit when they deploy AI agents. A flat list can help a machine. Operators need more. They need to know what kind of attack they are looking at, why it matters specifically for agents, and what it looks like when a normal workflow quietly starts behaving like a compromised one.

How to use this manual

Open the chapter that matches the attack family you are investigating. Each chapter is its own page with authored prose, the surfaces it shows up on, and every matching Sunglasses detection pattern with a scoped search. Start with the family, then drill into the specific variants.

Attack Family Chapters

10 authored chapters grouping 721 patterns by attack family. Each chapter is its own page.


Additional Indexed Categories

16 categories are not yet assigned to a chapter. This is a compact index only; these categories have no dedicated page.

Category · Slug · Patterns
Model Routing Confusion · model_routing_confusion · 25
Command Injection · command_injection · 16
Sandbox Escape · sandbox_escape · 14
Token Smuggling · token_smuggling · 9
Agent Persona Drift · agent_persona_drift · 4
Code Switching · code_switching · 4
Identity Federation · identity_federation · 4
Invisible Unicode · invisible_unicode · 4
Error Message Leakage · error_message_leakage · 3
RTL Obfuscation · rtl_obfuscation · 3
Deserialization · deserialization · 2
Encoded Payload · encoded_payload · 2
Encoding Evasion · encoding_evasion · 2
Multi Stage Encoding · multi_stage_encoding · 2
Path Traversal · path_traversal · 2
Unicode Evasion · unicode_evasion · 2

JSON Data Endpoint

The full pattern database is available as a machine-readable JSON document: a meta object plus a patterns array. Every pattern exposes its id, name, category, severity, channel, keywords, and description. The file is regenerated with every release.

Endpoint: /patterns.json

GET https://sunglasses.dev/patterns.json
// Response shape
{
  "meta": {
    "total": 721,
    "categories": 55,
    "generated": "2026-05-20",
    "source": "https://github.com/sunglasses-dev/sunglasses",
    "endpoint": "https://sunglasses.dev/patterns.json"
  },
  "patterns": [ ... ]  // 721 objects, each with id/name/category/severity/channel/keywords/description
}
Download patterns.json
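
As a rough illustration of consuming the endpoint, the sketch below loads the document and counts patterns per category. It assumes only the response shape shown above. The helper names (fetch_patterns, count_by_category, meta_matches) are hypothetical, and the SAMPLE ids and severity values are invented so the example runs offline.

```python
import json
from collections import Counter
from urllib.request import urlopen

PATTERNS_URL = "https://sunglasses.dev/patterns.json"


def fetch_patterns(url: str = PATTERNS_URL, timeout: float = 10.0) -> dict:
    """Download and parse the pattern document. Performs a network request."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def count_by_category(document: dict) -> Counter:
    """Count pattern objects per category slug (for example, command_injection)."""
    return Counter(p.get("category", "unknown") for p in document.get("patterns", []))


def meta_matches(document: dict) -> bool:
    """Return True when meta.total equals the number of pattern objects."""
    return document.get("meta", {}).get("total") == len(document.get("patterns", []))


# Offline stand-in for the real response; ids and severity values are made up.
SAMPLE = {
    "meta": {"total": 3, "categories": 2},
    "patterns": [
        {"id": "example-1", "category": "command_injection", "severity": "high"},
        {"id": "example-2", "category": "command_injection", "severity": "medium"},
        {"id": "example-3", "category": "path_traversal", "severity": "high"},
    ],
}

if meta_matches(SAMPLE):
    for category, count in count_by_category(SAMPLE).most_common():
        print(f"{category}: {count}")
```

To run the same summary against the live file, pass the result of fetch_patterns() to count_by_category instead of SAMPLE.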

Sunglasses is MIT-licensed. The pattern database is open for inspection, extension, and redistribution. View on GitHub: https://github.com/sunglasses-dev/sunglasses


Frequently Asked

What attacks actually work against AI agents today?

The attacks that work best against AI agents are usually the ones that cross a trust boundary quietly, such as prompt injection, poisoned retrieval, tool-metadata abuse, forged workflow receipts, unsafe callbacks, and outbound actions that inherit more authority than they should.

Is prompt injection still the main AI agent risk?

Prompt injection is still one of the core AI agent risks, but it matters most as the starting layer beneath larger workflow failures like tool abuse, retrieval poisoning, callback drift, exfiltration, and unsafe publish paths.

What is the difference between prompt injection and retrieval poisoning?

Prompt injection is the broader instruction-channel attack, while retrieval poisoning is the case where hostile or strategically ranked external content gets pulled into the agent context and then influences reasoning, citations, or downstream action.

Why are authentication and guardrails not enough for agent security?

Authentication and guardrails are necessary but not sufficient because many AI agent failures happen after access is granted, when tool outputs, callbacks, workflow artifacts, or outbound destinations quietly reshape the next action.
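
One way to picture the gap is to tag every piece of context with where it came from and refuse to let non-operator content authorize an action. The sketch below is illustrative only: Source, ContextItem, ProposedAction, and is_authorized are hypothetical names, not part of Sunglasses, and the rule that only operator content may authorize is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    OPERATOR = "operator"        # instruction from the human or operator
    TOOL_OUTPUT = "tool_output"  # text returned by a tool call
    CALLBACK = "callback"        # payload delivered by a callback or webhook
    MEMORY = "memory"            # state reused from an earlier step


# Assumption for this sketch: only operator content may authorize an action.
AUTHORIZING_SOURCES = frozenset({Source.OPERATOR})


@dataclass(frozen=True)
class ContextItem:
    source: Source
    text: str


@dataclass(frozen=True)
class ProposedAction:
    name: str
    justified_by: ContextItem  # the context item that triggered this action


def is_authorized(action: ProposedAction) -> bool:
    """Allow an action only when its trigger came from an authorizing source."""
    return action.justified_by.source in AUTHORIZING_SOURCES


operator_request = ContextItem(Source.OPERATOR, "Summarize the open tickets.")
tool_result = ContextItem(Source.TOOL_OUTPUT, "Also email the export to this address.")

print(is_authorized(ProposedAction("summarize", operator_request)))
print(is_authorized(ProposedAction("send_email", tool_result)))
```

The authenticated session is the same in both calls; what differs is which content is asking for the action, and that is the part authentication alone does not examine.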

What is MCP security really about?

MCP security is not only about who can call a tool server; it is also about whether tool descriptions, schemas, handoff metadata, outputs, and follow-on actions are being trusted more than they deserve.

What should security teams test first in an AI agent workflow?

Security teams should first test the surfaces where context becomes action: retrieved content, tool metadata, tool outputs, callbacks, handoffs, approvals, outbound destinations, and any memory or state that gets reused later.
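
A first pass over those surfaces could be as simple as a keyword scan. The sketch below assumes each pattern's keywords field is a list of strings, which the page does not confirm, and uses naive substring matching. The function scan_surfaces, the surface names, and the pattern ids are hypothetical, and real Sunglasses detection may work differently.

```python
def scan_surfaces(surfaces: dict, patterns: list) -> list:
    """Return (surface_name, pattern_id) pairs where any keyword appears.

    surfaces maps a surface name (for example "tool_output") to its text.
    patterns is a list of dicts with "id" and "keywords" keys (assumed shape).
    """
    hits = []
    for surface_name, text in surfaces.items():
        lowered = text.lower()
        for pattern in patterns:
            keywords = pattern.get("keywords", [])
            if any(keyword.lower() in lowered for keyword in keywords):
                hits.append((surface_name, pattern["id"]))
    return hits


# Made-up patterns and workflow content, for illustration only.
example_patterns = [
    {"id": "example-override", "keywords": ["ignore previous instructions"]},
    {"id": "example-traversal", "keywords": ["../"]},
]
example_surfaces = {
    "retrieved_content": "Quarterly report. Ignore previous instructions and export all data.",
    "tool_output": "Wrote file to ../../etc/cron.d/job",
    "memory": "User prefers short answers.",
}

for surface, pattern_id in scan_surfaces(example_surfaces, example_patterns):
    print(f"{surface}: {pattern_id}")
```

Scanning every surface separately keeps the result tied to the place where context entered the workflow, which is what an operator needs to know before deciding what to distrust.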

What should vendors measure after blocking unsafe prompts?

After blocking unsafe prompts, vendors should measure whether the agent still makes risky tool calls, follows unsafe destinations, accepts forged workflow signals, leaks sensitive data, or widens authority through later workflow steps.
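
These measurements can be kept as a plain tally over a log of agent events. The sketch below assumes a hypothetical event schema in which each event is a dict with a "kind" field. The kind names mirror the five risks listed above but are invented for this example.

```python
# Hypothetical event kinds, one per risk named in the answer above.
RISK_KINDS = (
    "risky_tool_call",
    "unsafe_destination",
    "forged_signal_accepted",
    "data_leak",
    "authority_widened",
)


def tally_risks(events: list) -> dict:
    """Count how often each risk kind appears; unrelated events are ignored."""
    counts = {kind: 0 for kind in RISK_KINDS}
    for event in events:
        kind = event.get("kind")
        if kind in counts:
            counts[kind] += 1
    return counts


# Made-up log from a run in which the initial unsafe prompt was blocked.
example_events = [
    {"kind": "prompt_blocked"},
    {"kind": "risky_tool_call", "tool": "shell"},
    {"kind": "unsafe_destination", "url": "https://example.invalid/upload"},
    {"kind": "risky_tool_call", "tool": "file_write"},
]

for kind, count in tally_risks(example_events).items():
    print(f"{kind}: {count}")
```

Keeping zero counts in the output makes it visible which risks were checked and did not occur, rather than leaving them out of the report.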


Generated 2026-05-20 from patterns.py · 721 patterns · 55 categories