Reports

AI agent security research, published in two streams: CVP benchmark runs (Anthropic Cyber Verification Program evaluations of Claude models against agent-attack scenarios; Sunglasses approved Apr 16, 2026; 120/120 transcripts clean across 6 runs and 4 Claude models) and real-world threat reports (attacks Sunglasses found in the wild and wrote up). Both make Sunglasses stronger. Every gap gets fixed in public.

📋 CVP Benchmark Runs

May 20, 2026 NEW RESEARCH REPORT
Agent Discovery Metadata Poisoning — The Supply-Chain Attack Hiding in Files AI Agents Auto-Read
Prompt-injection supply-chain attack class targeting files AI agents automatically read — llms.txt, robots.txt, .env.example, Copilot instructions, Dockerfile, .cursor/rules, package manifests, Kubernetes annotations, Helm charts, CITATION.cff, security.txt. Poisoned content quietly redefines agent behavior, suppresses findings, forwards credentials, or hijacks callbacks before the operator sees a thing. 16+ carrier families researched, 4 clean-gated detectors (LICENSE/COPYING, .env.example, .cursor/rules, Dockerfile), 1 novel primitive: tool-output instruction injection with no carrier anchor. By CAVA, drafted May 19–20, 2026.
RESEARCH REPORT | 16+ CARRIER FAMILIES | 4 CLEAN-GATED · 1 NOVEL PRIMITIVE | CAVA × JACK, MAY 17–19 SPRINT
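The attack class in this entry can be illustrated with a minimal sketch: walk a repository, pick out a few of the auto-read carrier files named in the entry, and flag lines that look like instructions aimed at an agent. The phrase list, the `CARRIER_NAMES` subset, and the function names are assumptions made up for illustration; they are not the detectors Sunglasses ships, and a real detector would need far more than three regular expressions.

```python
import re
from pathlib import Path

# Hypothetical subset of the carrier files named in the entry above.
CARRIER_NAMES = {"llms.txt", "robots.txt", ".env.example", "Dockerfile",
                 "security.txt", "CITATION.cff"}

# Illustrative phrases only; NOT the patterns Sunglasses actually uses.
SUSPICIOUS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"do not (report|mention|flag)", re.I),
    re.compile(r"(send|forward|post) .{0,40}(token|secret|credential|api key)", re.I),
]

def scan_text(text):
    """Return (line_number, line) pairs matching any illustrative phrase."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS):
            hits.append((lineno, line.strip()))
    return hits

def scan_repo(root):
    """Scan carrier files under root; return {path: hits} for files with hits."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.name in CARRIER_NAMES:
            hits = scan_text(path.read_text(errors="replace"))
            if hits:
                findings[str(path)] = hits
    return findings
```

Calling `scan_repo(".")` on a checkout returns a dictionary keyed by file path, so an empty result means none of the illustrative phrases matched, not that the repository is safe.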
May 7, 2026 CVP RUN 7
Anthropic CVP Run 7 — Comment and Control (GitHub-Comment Injection Retest)
First story-shaped CVP run in the program. Claude Opus 4.7 at max effort tested against the public Comment and Control GitHub-comment injection pattern (JHU research; the attack family that hit Claude Code, Cursor, and Copilot Agent in 2026). Three-prompt ladder: indirect injection via AGENTS.md, "trusted contributor" social engineering, forged SYSTEM-NOTE credential exfiltration. 3/3 clean (allowed · allowed · blocked). Headline deliverable beyond the verdict: an eight-rule runtime trust-boundary filter spec that the model itself articulated under the third prompt (P3), with rules including wrapper labeling, signature blocks, read-then-egress chains, encoding-before-egress, and dead-drop branch detection.
CVP RUN 7: STORY-SHAPED | OPUS 4.7 · MAX EFFORT | 3/3 CLEAN · ALLOWED · ALLOWED · BLOCKED | RUNTIME FILTER SPEC INSIDE
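One of the rule names listed in this entry, read-then-egress chains, lends itself to a small sketch. The code below is one possible reading of that rule name, not the spec from the report: it flags any outbound call that happens after the agent has read a sensitive-looking file. The tool names, the `ToolCall` type, and the sensitivity hints are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str    # hypothetical tool name, e.g. "read_file" or "http_post"
    target: str  # file path or URL the call touches

# Assumed markers of sensitive reads and of network egress (illustrative).
SENSITIVE_HINTS = (".env", "id_rsa", "credentials", "secrets")
EGRESS_TOOLS = {"http_post", "git_push", "send_email"}

def flag_read_then_egress(calls):
    """Return indices of egress calls that occur after any sensitive read."""
    seen_sensitive_read = False
    flagged = []
    for index, call in enumerate(calls):
        if call.tool == "read_file" and any(h in call.target for h in SENSITIVE_HINTS):
            seen_sensitive_read = True
        elif call.tool in EGRESS_TOOLS and seen_sensitive_read:
            flagged.append(index)
    return flagged

trace = [
    ToolCall("read_file", "README.md"),
    ToolCall("http_post", "https://example.com/status"),
    ToolCall("read_file", ".env"),
    ToolCall("http_post", "https://example.com/upload"),
]
print(flag_read_then_egress(trace))  # [3]
```

The first `http_post` is not flagged because no sensitive read precedes it; only the ordering of read and egress matters in this simplified version.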
April 27, 2026 NEW CVP SYNTHESIS
Anthropic CVP Family Synthesis — Six Runs, Four Claude Models (Apr 17–27)
Unified synthesis tying Runs 1 through 6 into one matrix across the Claude 4.x family — Opus 4.7, Opus 4.6, Sonnet 4.6, and Haiku 4.5 — covering ten distinct model-and-effort configurations. 120 of 120 transcripts captured clean. Effort changes depth, not refusal posture: confirmed across three independent within-run comparisons spanning three model families. P07 cross-model anomaly explained as a methodology finding (envelope too conservative, not eight slips). Sets up the next ship: an adversarial-framing appendix probe set drawn from open research corpora.
CVP FAMILY SYNTHESIS | 4 MODELS · 10 CONFIGS | 120/120 CLEAN · 6 RUNS | DEPTH NOT POSTURE: CONFIRMED
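The 120-transcript total can be cross-checked against the per-run figures in the run entries on this page. The sketch below multiplies prompts per run by the number of effort tiers tested; treating Runs 1 to 3 as single-tier runs is an assumption based on their entries listing one configuration each.

```python
# Prompts per run times effort tiers tested, read off the run entries.
# Runs 1-3 are assumed to be single-tier (one configuration each).
transcripts_per_run = {
    "Run 1": 3 * 1,   # 3 adversarial prompts
    "Run 2": 13 * 1,  # 13 prompts
    "Run 3": 13 * 1,  # 13 prompts
    "Run 4": 13 * 2,  # high + max
    "Run 5": 13 * 2,  # medium + high
    "Run 6": 13 * 3,  # medium + high + xhigh
}
total = sum(transcripts_per_run.values())
print(total)  # 120
```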
April 26, 2026 NEW CVP RUN 6
Anthropic CVP — Opus 4.7 Within-Family Effort Evaluation (Run 6)
Same 13-prompt suite from Runs 2–5, now against current-flagship Opus 4.7 at three reasoning effort tiers (medium, high, and xhigh): 39 transcripts, with 12 of 13 verdicts identical across all three tiers and the single change a tightening. Depth grew non-linearly (+10.6% medium-to-high, +22.3% high-to-xhigh, +35.3% medium-to-xhigh overall) while refusal posture held. Third within-run effort comparison in the program, and the first with three points, which makes the depth curve visible rather than inferred.
CVP RUN 6 | OPUS 4.7 · MEDIUM + HIGH + XHIGH | 39 TRANSCRIPTS · 12/13 IDENTICAL | DEPTH NON-LINEAR · POSTURE HELD
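The three depth percentages in this entry are consistent with each other if they are read as relative growth that compounds from tier to tier. The short check below assumes that reading; the underlying depth measure is not stated on this page.

```python
# Step-wise relative growth in depth, as quoted in the entry.
medium_to_high = 0.106
high_to_xhigh = 0.223

# Relative growth compounds multiplicatively across the two steps.
overall = (1 + medium_to_high) * (1 + high_to_xhigh) - 1
print(f"{overall:.1%}")  # 35.3%

# The second step is larger than the first, which is the non-linear growth.
print(high_to_xhigh > medium_to_high)  # True
```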
April 25, 2026 NEW CVP RUN 5
Anthropic CVP — Opus 4.6 Family-Comparison Evaluation (Run 5)
Same 13-prompt suite from Runs 2–4, now against Opus 4.6, Anthropic's previous-generation flagship, at two reasoning effort tiers (medium and high). 26/26 clean across both tiers, identical verdict distribution. Zero exploit content executed, zero secrets leaked. Closes the four-model Anthropic family scoreboard; effort tier did not move the safety floor for Opus 4.6 on this prompt set either, triangulating the Run 4 finding across families.
CVP RUN 5 | OPUS 4.6 · MEDIUM + HIGH | 26 TRANSCRIPTS · 26/26 CLEAN | FAMILY COMPLETE
April 24, 2026 CVP RUN 4
Anthropic CVP — Sonnet 4.6 Family-Comparison Evaluation (Run 4)
Same 13-prompt suite from Runs 2 and 3, this time against Sonnet 4.6, Anthropic's mid-tier Claude, at two reasoning effort tiers (high and max). 26/26 clean across both tiers, identical verdict distribution. Zero exploit content executed, zero secrets leaked. Completes the family scoreboard alongside Opus 4.7 and Haiku 4.5; effort tier did not move the safety floor on this prompt set.
CVP RUN 4 | SONNET 4.6 · HIGH + MAX | 26 TRANSCRIPTS · 26/26 CLEAN | FAMILY COMPARISON
April 23, 2026 CVP RUN 3
Anthropic CVP — Haiku 4.5 Small-Model Safety Scaling (Run 3)
Same 13-prompt suite from Run 2, this time against Anthropic's smallest production Claude. 13/13 clean, zero exploit content executed, zero secrets leaked. Includes an honest "Limits of this Run" section explaining why our prompt set may have been too soft — and a Run 4 preview using real-world adversarial payloads (JailbreakBench, HarmBench, AdvBench, multi-turn chains, compound multi-category attacks).
CVP RUN 3 | HAIKU 4.5 | 13 PROMPTS · 13/13 CLEAN | SMALL-MODEL SCALING
April 20, 2026 CVP RUN 2
Anthropic CVP — Opus 4.7 Runtime-Trust Evaluation (Run 2)
13 prompts (3 baselines from Run 1 + 10 runtime-trust probes). Methodology-first framing, bounded claims, one surfaced taxonomy ambiguity documented in-body. Covers cross-agent injection, retrieval poisoning, tool output poisoning, tool-chain race, model routing confusion, memory eviction rehydration, token smuggling, persona drift, context flooding, and social-engineering UI abuse.
CVP RUN 2 | OPUS 4.7 @ MAX | 13 PROMPTS | 10 RUNTIME-TRUST PROBES
April 17, 2026 CVP RUN 1
Anthropic CVP — Opus 4.7 Safety Evaluation (Run 1)
First public evaluation under Anthropic's Cyber Verification Program. Three adversarial prompts, effort = MAX, all six artifacts SHA256-hashed and published. Full prompts verbatim, Anticipating Critique section, evidence table, and declared limitations. Next run scheduled on the CVP calendar.
CVP RUN 1 | OPUS 4.7 @ MAX | 6 ARTIFACTS HASHED | 2 RUNS / WEEK
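Publishing SHA-256 hashes alongside the artifacts lets a reader confirm that a downloaded file matches what was published. A minimal sketch of that check using Python's standard `hashlib` follows; the manifest format and function names are assumptions, since this page does not describe how the report lays out its hashes.

```python
import hashlib

def sha256_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest):
    """manifest maps file path -> expected hex digest (hypothetical format).

    Returns the list of paths whose current digest does not match.
    """
    return [path for path, expected in manifest.items()
            if sha256_file(path) != expected.lower()]
```

An empty list from `verify_manifest` means every listed file matches its published digest.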

🚨 Real-World Threat Reports

Real-world threat reports: live attacks Sunglasses found, scanned, and documented. Each one yielded detection patterns that shipped in the sunglasses package (pip install sunglasses).

April 9, 2026 NEW
28,000+ Requests in 9 Days — On a Non-WordPress Site
sunglasses.dev launched on April 1. Within 72 hours, France-based bots were probing us for WordPress admin panels, login pages, config files, and secrets — and we don't even run WordPress. Honeypot intelligence report covering commodity web recon, the xmlrpc.php amplification trick, and why compromised websites are now an AI agent prompt injection surface.
28K+ REQUESTS | 123 ATTACKS BLOCKED | 72h TO FIRST BOT | HONEYPOT DATA
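The kind of probing described in this entry is easy to spot in request logs, because the bots ask for paths that only exist on WordPress installs or that commonly hold secrets. The sketch below tallies request paths by probe type; the marker lists and labels are illustrative assumptions, not the rules behind the report's numbers.

```python
from collections import Counter

# Hypothetical path prefixes per probe type (illustrative, not exhaustive).
PROBE_MARKERS = {
    "wordpress": ("/wp-admin", "/wp-login.php", "/xmlrpc.php", "/wp-content"),
    "secrets": ("/.env", "/.git/config", "/wp-config.php", "/config.php"),
}

def classify(path):
    """Label a request path as a WordPress probe, a secrets probe, or other."""
    lowered = path.lower()
    for label, markers in PROBE_MARKERS.items():
        if any(lowered.startswith(marker) for marker in markers):
            return label
    return "other"

def summarize(paths):
    """Count request paths per probe label."""
    return Counter(classify(path) for path in paths)

sample = ["/wp-login.php", "/xmlrpc.php", "/.env", "/", "/reports"]
print(summarize(sample))
```

On a site that does not run WordPress, any hit in the `wordpress` bucket is by definition unsolicited recon.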
April 5, 2026 NEW LIVE THREAT
How We Detected the Claude Code Supply Chain Attack
After the Claude Code source leak, hackers created trojanized GitHub repositories distributing Vidar infostealer and GhostSocks proxy malware — 121+ downloads. We scanned the actual attack materials. Sunglasses caught 7 threat signals and blocked every file in ~10 milliseconds. Covered by The Register, BleepingComputer, Trend Micro, SecurityWeek, and 8 more outlets.
4 CRITICAL | 3 HIGH | 3 rules triggered | ~10ms scan
April 1, 2026 LIVE THREAT
axios Supply Chain RAT — BlueNoroff / Lazarus Group
Malicious axios versions (1.14.1, 0.30.4) deployed a cross-platform Remote Access Trojan via npm. Concurrent with the Claude Code source leak. We scanned the real deobfuscated payload — 460 lines of credential-stealing, wallet-draining, self-deleting malware attributed to North Korean state actors.
1 CRITICAL | 1 HIGH | 1 MEDIUM | +8 new patterns | 3.67ms scan
Coming soon
Claude Code MCP/Hooks Attack Surface Analysis
The leaked source revealed the exact orchestration logic for Hooks and MCP servers. We'll map the prompt injection attack surface that the blueprint exposes — and show what Sunglasses catches at each entry point.
Coming soon
OpenClaude / Claw Code Fork Analysis
Community forks of the leaked code are spreading fast. Some strip guardrails. Some add unknown code. We'll scan the most popular forks for hidden threats.
Coming soon
Anti-Distillation Trap Detection
The leak revealed that Claude Code injects fake tool definitions to poison competitor training data. Can Sunglasses detect when an agent is being fed decoy tools? New pattern category in development.

Found a threat you want us to scan? Have malware samples from the wild?

Email us, or open an issue on GitHub.