THE CASCADE SCANNER
How Sunglasses
actually works
A documented multi-layer cascade for AI agent input scanning. Normalize first, match second, classify third, judge fourth. Sub-millisecond on the common case, fully local, MIT.
17 normalization techniques
227 attack patterns
1,273 detection keywords
33 attack categories
0.26 ms avg scan
Release state: L0-L1 shipped now · L2-L3 planned for v0.3.0
// 01 · OVERVIEW
One input. Four layers. Every miss teaches the next.
Most AI input scanners are single-layer: one ML classifier, one guardrail, or one keyword list. Sunglasses runs a cascade: each layer handles what it’s best at, and only escalates what it can’t resolve.
0
Normalization preprocessor
17 techniques that neutralize obfuscation before matching: Unicode fold, homoglyph map, base64 / URL / HTML / hex / ROT13 decode, leetspeak undo, spaced-letter collapse, zero-width character strip, reverse enrichment, shape confusion.
Shipped
↓
1
Pattern engine (Aho-Corasick)
227 attack patterns across 33 categories. 1,273 detection keywords with multilingual pattern coverage. Runs on the normalized input, not the raw input.
Shipped
↓
2
ML classifier (distilBERT-class)
Trained on adversarial prompt-injection corpora. Catches semantic paraphrases with no keyword overlap. Invoked only on ambiguous inputs.
v0.3.0
↓
3
LLM judge (local small model)
Lightweight local LLM (Llama 3.2 1B class) as final adjudicator. Fires only on high-risk contexts the classifier cannot resolve confidently. Optional — the fully-local promise is preserved.
v0.3.0
Layers 0 and 1 are shipped in the current release line. Layers 2 and 3 are planned for v0.3.0. The cascade routing rule is simple: if an earlier layer is confident, skip the rest. The 0.26 ms figure shown on this page comes from our internal Apr 11 benchmark corpus, not an external benchmark suite.
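The routing rule can be sketched in a few lines. This is an illustrative sketch, not the shipped API: the `Verdict` type, layer functions, and the 0.9 threshold are all hypothetical stand-ins for whatever the real cascade uses.

```python
# Minimal sketch of confidence-gated cascade routing.
# All names and thresholds here are illustrative, not the shipped API.
from typing import Callable, NamedTuple

class Verdict(NamedTuple):
    label: str         # "attack" | "benign" | "ambiguous"
    confidence: float  # 0.0 - 1.0
    layer: int         # which layer produced the verdict

def run_cascade(text: str,
                layers: list[Callable[[str], Verdict]],
                threshold: float = 0.9) -> Verdict:
    verdict = Verdict("benign", 0.0, -1)
    for layer_fn in layers:
        verdict = layer_fn(text)
        if verdict.confidence >= threshold:
            return verdict  # earlier layer is confident: skip the rest
    return verdict          # fell through: last layer's best effort

# Toy layers: a keyword matcher, confident only on hits, and a fallback judge.
def pattern_layer(text: str) -> Verdict:
    if "ignore previous instructions" in text.lower():
        return Verdict("attack", 0.99, 1)
    return Verdict("benign", 0.5, 1)   # not confident -> escalate

def judge_layer(text: str) -> Verdict:
    return Verdict("benign", 0.95, 3)  # stand-in for the LLM judge
```

An obvious attack resolves at Layer 1 and never touches later layers; an ambiguous input falls through to the judge. That asymmetry is what keeps the average latency at pattern-matching speed.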
// 02 · LAYER 0 — NORMALIZATION PREPROCESSOR
Neutralize the tricks before you start matching
Signature scanners get called “2005 antivirus” because attackers rewrite payloads faster than keyword lists can grow. The fix isn’t more keywords. The fix is deleting the rewrites before the keyword list ever sees the input.
Every scan runs through a deterministic normalization pipeline first. Homoglyphs become ASCII. Base64 gets decoded. Zero-width Unicode gets stripped. Spaced letters collapse. A ROT13-decoded copy is appended as enrichment. Shape confusion (lowercase l looking like capital I) gets its own variant. All of it runs in under a millisecond.
TECH 01
Unicode NFKC fold
Ignore → ignore
TECH 02
Homoglyph map
Іgnore (Ukrainian І) → Ignore
TECH 03
Zero-width stripping
ig[U+200B]no[U+200B]re → ignore
TECH 04
Leetspeak decoding
1gn0r3 → ignore
TECH 05
Base64 inline decode
SWdub3Jl → Ignore
TECH 06
URL percent decode
%49%67%6E%6F%72%65 → Ignore
TECH 07
HTML entity decode
&#73;gn → Ign
TECH 08
Hex escape decode
\x49\x67\x6e → Ign
TECH 09
ROT13 enrichment
Vtaber → Ignore (appended)
TECH 10
Reverse enrichment
erongi → ignore (appended)
TECH 11
Delimiter padding
i.g.n.o.r.e → ignore
TECH 12
Spaced-letter collapse
i g n o r e → ignore
TECH 13
Whitespace collapse
ignore\t\tthis → ignore this
TECH 14
Enclosed alphanumerics
ⒾⒼⓃⓄⓇⒺ → IGNORE
TECH 15
Shape confusion
lgnore (lowercase L) → ignore
TECH 16
BOM / Unicode tag strip
U+FEFF, U+E0001–E007F → removed
TECH 17
Lowercase fold
IGNORE → ignore
The whole pipeline is <200 lines of Python, runs synchronously, has no external dependencies beyond the standard library, and is readable in one sitting. Here’s the entry point:
# preprocessor.py
import re

def normalize(text: str) -> str:
    # 17-technique pipeline. Deterministic. Sub-millisecond.
    text = strip_invisible(text)          # zero-width chars
    text = normalize_unicode(text)        # NFKC fold
    text = replace_homoglyphs(text)       # Cyrillic, Greek, Ukrainian, fullwidth Latin
    text = decode_html_entities(text)     # &#73; -> I
    text = decode_url_encoding(text)      # %49 -> I
    text = decode_hex_escapes(text)       # \x49 -> I
    text = decode_base64_segments(text)   # SWdub3Jl -> Ignore
    text = decode_leetspeak(text)         # 1gn0r3 -> ignore
    text = strip_delimiter_padding(text)  # i.g.n.o.r.e -> ignore
    text = collapse_whitespace(text)      # tabs, multi-space -> single space
    rot = decode_rot13(text)              # enrichment only (append)
    if rot != text:
        text = text + " " + rot
    text = text + " " + text[::-1]        # reverse enrichment
    text = text.lower()
    shape = re.sub(r'\bl(?=[a-z])', 'i', text)  # shape confusion: leading l -> i
    if shape != text:
        text = text + " " + shape
    return text
Full source: sunglasses/preprocessor.py. MIT licensed. Copy it, fork it, improve it, PR it back.
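To make one of the trickier techniques concrete, here is a sketch of how an inline base64 decode can work: find plausible base64 runs, decode the ones that yield printable ASCII, and append the result so the pattern engine sees both forms. This is an illustrative sketch under those assumptions, not the shipped `decode_base64_segments`.

```python
import base64
import re

# Candidate runs: 8+ base64-alphabet chars, optional padding.
B64_RUN = re.compile(r'[A-Za-z0-9+/]{8,}={0,2}')

def decode_base64_segments(text: str) -> str:
    # Illustrative enrichment decode: append decoded segments, never replace,
    # so the original text still reaches the pattern engine too.
    decoded_parts = []
    for match in B64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            decoded = raw.decode("ascii")
        except (ValueError, UnicodeDecodeError):
            continue  # not valid base64, or not ASCII text
        if decoded.isprintable():
            decoded_parts.append(decoded)
    if decoded_parts:
        return text + " " + " ".join(decoded_parts)
    return text
```

Appending rather than replacing matters: a false-positive decode of an ordinary long word adds noise, but it never destroys the original input.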
// 03 · LAYER 1 — PATTERN ENGINE
Aho-Corasick on normalized text
Once Layer 0 has neutralized the tricks, Layer 1 does the fast work: a compiled Aho-Corasick automaton scans the normalized string for 1,273 keywords across 227 patterns in 33 attack categories. O(n) in the length of the input regardless of keyword count.
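The O(n) claim follows from how an Aho-Corasick automaton works: one pass over the input, with failure links handling overlapping keywords. A minimal pure-Python version, shown only to illustrate the idea (the shipped engine is a compiled automaton, not this sketch):

```python
from collections import deque

class AhoCorasick:
    """Minimal Aho-Corasick automaton: one O(n) pass finds all keywords."""

    def __init__(self, keywords):
        self.goto = [{}]      # trie edges per node
        self.fail = [0]       # failure links
        self.out = [set()]    # keywords ending at each node
        for kw in keywords:
            self._insert(kw)
        self._build_failure_links()

    def _insert(self, kw):
        node = 0
        for ch in kw:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append(set())
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].add(kw)

    def _build_failure_links(self):
        # BFS from the root; each node's failure link points at the longest
        # proper suffix of its path that is also a prefix in the trie.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                self.out[child] |= self.out[self.fail[child]]

    def scan(self, text):
        node, hits = 0, []
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for kw in self.out[node]:
                hits.append((i - len(kw) + 1, kw))  # (start offset, keyword)
        return hits
```

Because the automaton advances at most one state per character (plus amortized failure hops), scan time depends on input length, not on whether the dictionary holds 10 keywords or 1,273.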
Categories include: prompt injection (direct, indirect, and multilingual variants), persona override / DAN-family jailbreaks, system prompt exfiltration (direct + soft framings), tool poisoning (MCP manifest, README, hidden notes), paraphrased directive overrides, hypothetical / roleplay framings, authority impersonation, social engineering, credential exfiltration, SSRF in agent tools, memory poisoning, supply chain detections (from GHSA advisories), and more.
Every pattern is a plain Python dict with id, name, category, severity, channel, keywords or regex, and description. Adding a new attack class takes five minutes and a pull request. The community can contribute.
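A pattern entry looks roughly like this. The field values below are made up for illustration (the ID, keywords, and matching helper are hypothetical, not copied from the shipped patterns.py); only the dict shape follows the description above.

```python
# Illustrative pattern entry in the documented shape: a plain dict with
# id, name, category, severity, channel, keywords or regex, and description.
# All values here are hypothetical examples.
EXAMPLE_PATTERN = {
    "id": "GLS-PI-999",
    "name": "Directive override (example)",
    "category": "prompt_injection",
    "severity": "high",
    "channel": ["text", "tool_description"],
    "keywords": ["ignore previous instructions", "disregard all prior"],
    "description": "Example entry showing the documented dict shape.",
}

def matches(pattern: dict, normalized_text: str) -> bool:
    # A keyword pattern hits if any keyword occurs in the normalized input.
    return any(kw in normalized_text for kw in pattern.get("keywords", []))
```

Keeping patterns as plain data is the point: a contributor edits a dict and a test, not an engine.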
// 04 · BENCHMARK
76 tests. 12 categories. Real numbers.
We built an internal adversarial corpus covering twelve attack classes (AgentDojo canonical templates, paraphrase variants with no keyword overlap, roleplay persona overrides, low-resource language translations, obfuscation bypasses, advanced encodings, social engineering, system override framings, soft exfiltration, multi-step chained payloads, tool poisoning, and benign controls for false positive checking).
On April 11, 2026 we ran this internal corpus against the baseline engine, documented every miss, applied targeted fixes on a local patch branch, and reran the same corpus. The delta below reflects that internal regression cycle.
Internal corpus — before / after the Apr 11 normalization & pattern expansion
Method: 64 attack prompts + 12 benign controls, authored internally to expose known misses. Results shown are internal regression outcomes on the same corpus. External benchmark runs (AgentDojo, TensorTrust, Open-Prompt-Injection) are pending publication.
100%
internal corpus recall after
| Category | Before | After | Delta |
|---|---|---|---|
| Paraphrase (hard, no keyword overlap) | 0/5 | 5/5 | +100 |
| Paraphrase (medium) | 1/4 | 4/4 | +75 |
| Roleplay / persona (DAN, STAN, “act as”) | 0/5 | 5/5 | +100 |
| Subtle roleplay (hypothetical, creative writing) | 0/3 | 3/3 | +100 |
| Low-resource language injection (7 languages) | 0/7 | 7/7 | +100 |
| Obfuscation (Unicode, homoglyph, leet, base64, etc.) | 8/9 | 9/9 | +11 |
| Advanced encoding (URL, HTML, hex, ROT13, reverse) | 0/5 | 5/5 | +100 |
| Soft system prompt exfiltration | 2/5 | 5/5 | +60 |
| Social engineering / authority impersonation | 2/4 | 4/4 | +50 |
| System override framings | 3/5 | 5/5 | +40 |
| <INFORMATION> wrapper (AgentDojo canonical) | 0/1 | 1/1 | +100 |
| Tool poisoning (MCP-style hidden notes) | 2/3 | 3/3 | +33 |
| Total attack recall | 26/64 | 64/64 | +59.4 pts |
| False positive rate (12 benign controls) | 8.3% | 8.3% | stable |
∗ Honesty note
100% is on our own corpus. We wrote the 64 attacks to expose our gaps, built the fixes, and ran the same 64 attacks again. That proves the documented gaps are closed. It does not prove we catch novel attacks we didn’t imagine. Real-world recall will be lower.
External benchmarks are coming. Next up: independent runs against the AgentDojo benchmark suite from ETH Zurich, plus TensorTrust and Open-Prompt-Injection. Those numbers will be published here when available, unedited, even if they’re worse than our internal numbers.
The one benign input we still flag is “I need to ignore my neighbor’s loud music.” That’s a true linguistic ambiguity that only semantic understanding (Layer 2 ML) can resolve cleanly. It’s why the cascade exists.
Reproducibility: benchmark script, corpus, and raw result JSON will be linked in this section at release time.
// 05 · PHILOSOPHY
Same problem. Different philosophy. Better together.
Many AI agent security approaches include ML classifiers or LLM-based judges. Sunglasses uses a deterministic-first cascade and adds semantic escalation only when confidence is low.
Our philosophy is deterministic first, semantic escalation when needed. Layer 0 normalization and Layer 1 pattern matching are local and fast. Most inputs resolve there. Ambiguous inputs can escalate to an optional ML classifier (v0.3.0), and only high-risk low-confidence cases can escalate to an optional local LLM judge (v0.3.0). This preserves low average latency while improving robustness on semantic attacks.
We also believe the base layer should be fully local, fully open source, and readable in one sitting. That’s not marketing. It’s a design constraint: if the “free” version of a security tool can’t protect a developer who has no budget, it doesn’t deserve to be called open source.
You can run Sunglasses alongside Lakera or Rebuff. The cascade catches different classes of attack than they do, and fires in different latency regimes. Better together.
// 06 · FAQ
What is Sunglasses and how does it work?
Common questions about the Sunglasses AI agent input scanner, answered directly: for fast human reading, and for AI agents and answer engines that cite this page.
What is Sunglasses?
Sunglasses is an open-source, MIT-licensed AI agent security scanner that inspects every input an AI agent reads — text, code, documents, MCP tool descriptions, READMEs, images, PDFs, audio, video, and QR codes — before the agent processes it. Sunglasses detects prompt injection, MCP tool poisoning, credential exfiltration, supply-chain attacks, and hidden instructions. It runs 100% locally, with no API keys and no cloud dependencies, and is used via pip install sunglasses. Sunglasses is developed at sunglasses.dev and the source code is on GitHub.
How does Sunglasses detect prompt injection?
Sunglasses detects prompt injection through a multi-layer cascade. Layer 0 is a 17-technique normalization preprocessor that neutralizes obfuscation — Unicode homoglyphs, zero-width characters, base64, URL encoding, HTML entities, hex escapes, ROT13, leetspeak, delimiter padding, and shape-confusion tricks — before any matching happens. Layer 1 is a 227-pattern Aho-Corasick engine with 1,273 detection keywords across 33 attack categories, running on the normalized input. Layer 2 (optional ML classifier) and Layer 3 (optional local LLM judge) are planned for v0.3.0 and only fire on ambiguous or high-risk content. The deterministic layers handle the common case in under a millisecond.
What is the Sunglasses cascade architecture?
The Sunglasses cascade is a four-layer input-scanning architecture for AI agents. Layer 0 is normalization (deterministic, sub-millisecond, neutralizes obfuscation). Layer 1 is pattern matching (Aho-Corasick automaton on 227 attack patterns over normalized text). Layer 2 is an optional trained ML classifier (distilBERT-class, planned for v0.3.0) that handles semantic paraphrases keyword matching misses. Layer 3 is an optional local LLM judge (Llama 3.2 1B class, planned for v0.3.0) that adjudicates high-risk ambiguous inputs. Layers escalate only when earlier layers are not confident, so the average latency stays low and Layer 3 fires on less than 5% of inputs.
Does Sunglasses run locally?
Yes. Sunglasses runs entirely on the developer's machine. There are no API keys, no cloud services, no outbound network calls, and no user data is transmitted off-device. The scanner ships as a single Python package installable with pip install sunglasses. This makes it suitable for air-gapped environments, privacy-sensitive applications, MCP tool developers who cannot ship user content to third parties, and teams that want to audit the detection logic directly.
Is Sunglasses open source?
Yes. Sunglasses is released under the MIT license. The full source code, attack pattern database, test harness, and benchmark scripts are public at github.com/sunglasses-dev/sunglasses. Community contributions are welcomed through pull requests. Each attack pattern is a plain Python dictionary with ID, category, severity, channel, keywords or regex, and a description citing the source — making it straightforward to audit, fork, extend, or re-use in other projects.
How does Sunglasses handle Unicode tricks and obfuscation?
Sunglasses neutralizes obfuscation before pattern matching runs, through a 17-technique normalization preprocessor. Techniques include NFKC Unicode fold (collapses fullwidth and compatibility forms), a homoglyph map covering Cyrillic, Greek, Fullwidth Latin, Ukrainian, Georgian, and Armenian lookalikes, zero-width character stripping (U+200B through U+200F, BOM, Unicode tag characters), leetspeak decoding, base64 inline decoding, URL percent-decoding, HTML entity decoding, hex escape decoding, ROT13 enrichment, reverse-string enrichment, delimiter-padding collapse, spaced-letter collapse, whitespace collapse, shape-confusion disambiguation (lowercase l versus capital I), and enclosed alphanumerics handling. The full pipeline is documented in the open-source preprocessor.py file.
What languages does Sunglasses support for multilingual prompt injection detection?
Sunglasses includes multilingual detection keywords for prompt-injection phrasings across 24 languages, including English, Russian, Ukrainian, Czech, Hungarian, Hebrew, Turkish, Hindi, Azerbaijani, Arabic, Polish, Vietnamese, Indonesian, Thai, Korean, Japanese, German, French, Spanish, Italian, Portuguese, Dutch, Romanian, and Greek. Multilingual coverage is important because attackers can bypass English-only scanners by translating injection phrases into low-resource languages. Language coverage expands through community pull requests and is documented in the GLS-I18N-001 pattern pack.
How does Sunglasses compare to other AI agent security tools?
Many AI agent security tools focus on a single defense layer. Sunglasses runs a multi-stage cascade: normalization first, then pattern matching on the normalized text, with optional ML classifier and optional local LLM judge as later stages for ambiguous cases. Sunglasses is fully local, MIT-licensed, and multimodal (text, images, PDFs, audio, video, QR codes). It is intended to complement, not replace, other tools in the AI security ecosystem. You can run Sunglasses alongside any other guardrail or classifier that fits your architecture.
Can I contribute new attack patterns to Sunglasses?
Yes. Sunglasses welcomes community pattern contributions through GitHub pull requests. Each pattern is defined as a Python dictionary in patterns.py with an ID (for example GLS-PI-001), a category, a severity level, a list of channels, and either a list of keywords or a list of regular expressions, plus a description citing the threat source. Patterns should be accompanied by a test case showing the pattern detects the intended attack and does not trigger on a benign control input. The contribution process is documented in CONTRIBUTING.md in the repository.
What makes Sunglasses different from a single ML classifier?
A single ML classifier treats every input the same way: the model runs on every prompt, and latency and cost are paid every time. A cascade handles most inputs with fast deterministic layers (normalization plus pattern matching) and only escalates to ML or an LLM judge when earlier layers are not confident. This keeps the average latency low, keeps a large fraction of inputs fully local and auditable, and lets developers inspect and extend every stage. Sunglasses is deterministic-first with semantic escalation when needed, not ML-only.
Read the code. Fork the cascade.
The core layers are intentionally compact, readable Python modules. Patterns are plain data definitions, and decisions are traceable in commits. If you see a gap, open an issue or PR.