Operator-Grade · Ingestion-First · Free · Independent Research

The AI Agent Hardening Manual

An operator-grade AI agent hardening manual built around one rule: scan and enforce at the content-ingestion trust boundary before unsafe text becomes action.

Eight chapters. Each maps a threat to a control, a test, and a remediation. Updated as research lands. No sign-up. No paywall.

Last updated: April 11, 2026 · Maintained by Sunglasses

The 8 Chapters

Status: 1 chapter live, 1 in progress, 6 planned. Roadmap shifts with the research.

CHAPTER 01 Live

Foundation: AI Agent Security 101

The introductory chapter. What an AI agent is, why it can be attacked through content, and the four trust boundaries every deployment must enforce.

Read chapter →
CHAPTER 02 In Progress

The Hardening Checklist

The shipping checklist for production AI agents. Threat → control matrix, implementation steps, three real case studies, validation tests. Drops within 24 hours.

CHAPTER 03 Planned

Coding Agent Security

The unique threat model of agents that read repos, run commands, and write code. MCP boundary failures, terminal-as-attack-surface, supply-chain trust.

CHAPTER 04 Planned

Pre-Ingestion Scanning

The architecture for scanning untrusted content before it enters an agent's context window. Pattern coverage, false-positive budgets, performance trade-offs.

CHAPTER 05 Planned

Supply Chain & MCP Security

How agent toolchains get compromised. Skill registries, MCP server poisoning, prompt-injection in dependencies. Detection and isolation patterns.

CHAPTER 06 Planned

Memory & Session Boundaries

Where one user's context ends and the next begins. Cross-session leakage, persistent memory poisoning, retrieval-time injection.

CHAPTER 07 Planned

Red-Team Test Cases

The fixture suite. Reusable test cases for prompt injection, exfiltration, tool abuse, and evasion. Pattern-DB-grounded, regression-tested.

CHAPTER 08 Planned

Incident Response Runbook

What to do when a control fires. Triage, containment, evidence preservation, post-mortem template. The runbook nobody else publishes.

Preview Chapters Already Published

Existing reports and research that map directly into the manual structure.

Threat Intelligence
28K+ Requests in 9 Days
WordPress bot probes against a non-WordPress site. Maps to Chapter 04 (pre-ingestion) and Chapter 05 (supply chain).
Incident Report
Claude Code Supply Chain
Real GHSA cycle pool. Maps to Chapter 05 (supply chain & MCP security).
Malware Analysis
Axios RAT Scan
BlueNoroff/Lazarus malware caught in 3.67ms. Maps to Chapter 04 (pre-ingestion scanning).

Get notified when the next chapter ships

One email per chapter. No newsletter spam. Unsubscribe in one click.

Frequently Asked

If your question is not here, message the team via /contact.

What is the AI Agent Hardening Manual?

An operator-grade hardening manual for AI agents, built around one rule: scan and enforce at the content-ingestion trust boundary before unsafe text becomes action. Each chapter maps a threat to a control, a test, and a remediation. Independent research. Free.

Who is this manual for?

Engineers, security teams, and founders shipping AI agents to production. If you call an LLM with content you did not write, this manual is for you.

How is this different from OWASP, MITRE, NIST?

OWASP, MITRE, and NIST publish taxonomies and policy frameworks. This manual focuses on operator runbooks. We tell you what to scan, where to gate, how to test, and what to do when a control fires. We are not affiliated with any of those organizations and we encourage reading their materials alongside ours.

Is the content original and verifiable?

Yes. Every chapter is based on our own pattern database, our own scanner output, and publicly available security advisories (with citations). We do not copy proprietary content. Every quantitative claim links to its source. Comparisons to other projects reflect publicly available materials at the time of writing.

Are you affiliated with Anthropic, OpenAI, Google DeepMind, Microsoft, OWASP, MITRE, NIST, HiddenLayer, Lakera, or Protect AI?

No. Sunglasses is independent. No partnership, sponsorship, or endorsement from any of those organizations. References to their work are commentary on publicly available materials.

Is this connected to Anthropic's Mythos model or Project Glasswing?

No. We are independent. But our wedge is the same problem space: AI systems doing security work. Mythos finds vulnerabilities. Sunglasses keeps the AI agent itself from becoming the vulnerability.

How often is the manual updated?

Continuously. Every chapter has a last-updated date. New chapters ship as research lands. Pattern updates ship into the open-source scanner.

Can I contribute?

Yes. The scanner is open source on GitHub. The manual takes pull requests for new chapters, case studies, and pattern fixtures. Reach out via /contact.