AI Agent Attack Surface Manual
AI agent attacks are not one thing. They show up as prompt injection, tool-handoff abuse, callback drift, memory poisoning, scope redefinition, outbound control, and dozens of other workflow failures. This manual groups those patterns by attack family so operators can understand the shape of the problem before it becomes unsafe action.
Sunglasses uses this page to explain a simple truth: the dangerous moment is often not the first prompt. It is the later workflow step where a tool result, callback, redirect, memory artifact, package endpoint, or state sync quietly gains authority over the next action.
Why this page exists
Most attack catalogs tell you that patterns exist. This page is here to show how those patterns cluster into the real trust failures teams hit when they deploy AI agents. A flat list can help a machine. Operators need more. They need to know what kind of attack they are looking at, why it matters specifically for agents, and what it looks like when a normal workflow quietly starts behaving like a compromised one.
How to use this manual
Open the chapter that matches the attack family you are investigating. Each chapter is its own page with authored prose, the surfaces it shows up on, and every matching Sunglasses detection pattern with a scoped search. Start with the family, then drill into the specific variants.
Attack Family Chapters
10 authored chapters grouping 721 patterns by attack family. Each chapter is its own page.
Additional Indexed Categories
16 categories are not yet assigned to a chapter. They appear in a compact index only and have no dedicated page.
JSON Data Endpoint
The full pattern database is available as a machine-readable JSON array. Every pattern exposes its id, name, category, severity, channel, keywords, and description. The file is regenerated with every release.
Endpoint: /patterns.json
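A minimal sketch of how that array might be consumed, assuming each element is a JSON object carrying the seven fields listed above. The host in BASE_URL is a placeholder, the helper names are hypothetical, and the sample records (ids, category names, severity values) are invented for illustration; only the /patterns.json path and the field names come from this page.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: placeholder host. Only the /patterns.json path comes from this page.
BASE_URL = "https://example.invalid"


def load_patterns(base_url=BASE_URL):
    """Fetch and parse the pattern array. Performs a network request."""
    with urlopen(f"{base_url}/patterns.json", timeout=10) as resp:
        return json.load(resp)


def count_by_category(patterns):
    """Count how many patterns fall under each category."""
    return Counter(p.get("category", "uncategorized") for p in patterns)


def select(patterns, category=None, severity=None):
    """Return patterns matching the given category and/or severity."""
    return [
        p for p in patterns
        if (category is None or p.get("category") == category)
        and (severity is None or p.get("severity") == severity)
    ]


# Hypothetical records, invented for illustration only.
# Real ids, category names, and severity values may differ.
SAMPLE = [
    {"id": "EX-001", "name": "Example injected instruction",
     "category": "prompt_injection", "severity": "high", "channel": "message",
     "keywords": ["ignore previous"], "description": "Illustrative only."},
    {"id": "EX-002", "name": "Example hidden directive",
     "category": "prompt_injection", "severity": "medium", "channel": "file",
     "keywords": ["system note"], "description": "Illustrative only."},
    {"id": "EX-003", "name": "Example poisoned memory entry",
     "category": "memory_poisoning", "severity": "high", "channel": "memory",
     "keywords": ["remember that"], "description": "Illustrative only."},
]

if __name__ == "__main__":
    # Swap SAMPLE for load_patterns() once BASE_URL points at a real host.
    print(count_by_category(SAMPLE))
    print([p["id"] for p in select(SAMPLE, severity="high")])
```

Keeping the fetch separate from the filtering helpers means the same code can run against a saved copy of the file, which is useful when comparing two releases of the database.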
Sunglasses is MIT-licensed. The pattern database is open for inspection, extension, and redistribution.
Frequently Asked
What attacks actually work against AI agents today?
The attacks that work best against AI agents are usually the ones that cross a trust boundary quietly, such as prompt injection, poisoned retrieval, tool-metadata abuse, forged workflow receipts, unsafe callbacks, and outbound actions that inherit more authority than they should.
Is prompt injection still the main AI agent risk?
Prompt injection is still one of the core AI agent risks, but it matters most as the starting layer beneath larger workflow failures like tool abuse, retrieval poisoning, callback drift, exfiltration, and unsafe publish paths.
What is the difference between prompt injection and retrieval poisoning?
Prompt injection is the broader instruction-channel attack, while retrieval poisoning is the case where hostile or strategically ranked external content gets pulled into the agent context and then influences reasoning, citations, or downstream action.
Why are authentication and guardrails not enough for agent security?
Authentication and guardrails are necessary but not sufficient because many AI agent failures happen after access is granted, when tool outputs, callbacks, workflow artifacts, or outbound destinations quietly reshape the next action.
What is MCP security really about?
MCP security is not only about who can call a tool server; it is also about whether tool descriptions, schemas, handoff metadata, outputs, and follow-on actions are being trusted more than they deserve.
What should security teams test first in an AI agent workflow?
Security teams should first test the surfaces where context becomes action: retrieved content, tool metadata, tool outputs, callbacks, handoffs, approvals, outbound destinations, and any memory or state that gets reused later.
What should vendors measure after blocking unsafe prompts?
After blocking unsafe prompts, vendors should measure whether the agent still makes risky tool calls, follows unsafe destinations, accepts forged workflow signals, leaks sensitive data, or widens authority through later workflow steps.
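One way to track those post-filter outcomes is a simple per-run tally. The sketch below is not part of Sunglasses; the RunOutcome type and its field names are hypothetical, chosen to mirror the five failure modes named in the answer above, and it assumes each agent run has already been labeled by a reviewer or an evaluation harness.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunOutcome:
    """Labels for one agent run that already passed the prompt filter.

    Hypothetical structure: one boolean per failure mode.
    """
    risky_tool_call: bool = False
    unsafe_destination_followed: bool = False
    forged_signal_accepted: bool = False
    sensitive_data_leaked: bool = False
    authority_widened: bool = False


def failure_rates(outcomes):
    """Return the fraction of runs that showed each failure mode."""
    names = [f.name for f in fields(RunOutcome)]
    if not outcomes:
        return {name: 0.0 for name in names}
    total = len(outcomes)
    return {
        name: sum(getattr(o, name) for o in outcomes) / total
        for name in names
    }


if __name__ == "__main__":
    # Invented labels for four runs, for illustration only.
    runs = [
        RunOutcome(risky_tool_call=True),
        RunOutcome(),
        RunOutcome(risky_tool_call=True, sensitive_data_leaked=True),
        RunOutcome(),
    ]
    for name, rate in failure_rates(runs).items():
        print(f"{name}: {rate:.2f}")
```

Reporting each failure mode separately, rather than a single pass or fail score, keeps a drop in one rate from hiding a rise in another.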
Generated 2026-05-20 from patterns.py · 721 patterns · 55 categories