Reports

📋 CVP Benchmark Runs

June 10, 2026 NEW RELIABILITY REPORT

How Sunglasses Became a CI-Safe Input Filter for AI Agents

The reliability work that turned Sunglasses, the content-layer input filter that blocks prompt injection before your AI agent reads it, into a tool teams can run in CI. Clean-code false positives in the tested corpus fell from 86 to 0, one stall case dropped from 117s to 0.30s, agent-facing metadata is now synced and gated, and a clean-code false-positive gate held back 6 candidate patterns before 0.2.66 because they fired on clean code. The discipline behind it: the gate cost us pattern volume, and we obeyed it anyway.

RELIABILITY REPORT FALSE POSITIVES 86 → 0 STALL 117s → 0.30s 0.2.66 · 6 PATTERNS HELD BY THE GATE
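
A clean-code false-positive gate of the kind described above can be sketched roughly as follows. This is an illustration only, not the Sunglasses implementation: the function name `gate_patterns`, the two candidate regexes, and the tiny clean corpus are all hypothetical.

```python
import re

def gate_patterns(candidates, clean_corpus):
    """Split candidate patterns into (shipped, held).

    A pattern is held back if it matches any sample in the clean corpus,
    i.e. if it would produce a false positive on known-good code.
    """
    shipped, held = {}, {}
    for name, regex in candidates.items():
        compiled = re.compile(regex, re.IGNORECASE)
        fires_on_clean = any(compiled.search(sample) for sample in clean_corpus)
        (held if fires_on_clean else shipped)[name] = regex
    return shipped, held

# Hypothetical candidate patterns (not real Sunglasses rules).
candidates = {
    "ignore_previous": r"ignore (all )?previous instructions",
    "too_broad_system": r"\bsystem\b",
}

# Hypothetical clean-code samples that must never trigger a detection.
clean_corpus = [
    "import os\nos.system('ls')",
    "def add(a, b):\n    return a + b",
]

shipped, held = gate_patterns(candidates, clean_corpus)
print("shipped:", sorted(shipped))
print("held:", sorted(held))
```

In a CI job, the held set would simply be excluded from the release, which is the trade-off the entry describes: fewer patterns, no clean-code hits.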

May 20, 2026 NEW RESEARCH REPORT

Agent Discovery Metadata Poisoning — The Supply-Chain Attack Hiding in Files AI Agents Auto-Read

Prompt-injection supply-chain attack class targeting files AI agents automatically read — llms.txt, robots.txt, .env.example, Copilot instructions, Dockerfile, .cursor/rules, package manifests, Kubernetes annotations, Helm charts, CITATION.cff, security.txt. Poisoned content quietly redefines agent behavior, suppresses findings, forwards credentials, or hijacks callbacks before the operator sees a thing. 16+ carrier families researched, 4 clean-gated detectors (LICENSE/COPYING, .env.example, .cursor/rules, Dockerfile), 1 novel primitive: tool-output instruction injection with no carrier anchor. By CAVA, drafted May 19–20, 2026.

RESEARCH REPORT 16+ CARRIER FAMILIES 4 CLEAN-GATED · 1 NOVEL PRIMITIVE CAVA × JACK — MAY 17–19 SPRINT
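
As a rough illustration of the carrier idea, the sketch below picks out the files in a repository listing that match some of the auto-read carriers named above, so they can be scanned before an agent opens them. The function name `carrier_files` and the sample paths are hypothetical, and the carrier lists cover only a subset of the families in the report.

```python
from pathlib import PurePosixPath

# File names taken from the carrier list in the entry above (a subset).
CARRIER_NAMES = {
    "llms.txt", "robots.txt", ".env.example", "Dockerfile",
    "CITATION.cff", "security.txt", "LICENSE", "COPYING",
}
# Directories whose contents agents auto-read.
CARRIER_DIRS = (".cursor/rules",)

def carrier_files(paths):
    """Return the paths that look like agent auto-read carriers."""
    found = []
    for raw in paths:
        path = PurePosixPath(raw)
        text = str(path)
        in_carrier_dir = any(
            text == d or text.startswith(d + "/") for d in CARRIER_DIRS
        )
        if path.name in CARRIER_NAMES or in_carrier_dir:
            found.append(text)
    return found

# Hypothetical repository listing.
repo_paths = [
    "src/main.py",
    "llms.txt",
    ".cursor/rules/style.mdc",
    "docker/Dockerfile",
]
print(carrier_files(repo_paths))
```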

May 7, 2026 CVP RUN 7

Anthropic CVP Run 7 — Comment and Control (GitHub-Comment Injection Retest)

First story-shaped CVP run in the program. Claude Opus 4.7 at max effort tested against the public Comment and Control GitHub-comment injection pattern (JHU research; the attack family that hit Claude Code, Cursor, and Copilot Agent in 2026). Three-prompt ladder: indirect injection via AGENTS.md, "trusted contributor" social engineering, forged SYSTEM-NOTE credential exfiltration. 3/3 clean (allowed · allowed · blocked). Headline deliverable beyond the verdict: an eight-rule runtime trust-boundary filter spec the model itself articulated under P3 — wrapper labeling, signature blocks, read-then-egress chains, encoding-before-egress, dead-drop branch detection.

CVP RUN 7 — STORY-SHAPED OPUS 4.7 · MAX EFFORT 3/3 CLEAN · ALLOWED · ALLOWED · BLOCKED RUNTIME FILTER SPEC INSIDE

April 27, 2026 NEW CVP SYNTHESIS

Anthropic CVP Family Synthesis — Six Runs, Four Claude Models (Apr 17–27)

Unified synthesis tying Runs 1 through 6 into one matrix across the Claude 4.x family — Opus 4.7, Opus 4.6, Sonnet 4.6, and Haiku 4.5 — covering ten distinct model-and-effort configurations. 120 of 120 transcripts captured clean. Effort changes depth, not refusal posture: confirmed across three independent within-run comparisons spanning three model families. P07 cross-model anomaly explained as a methodology finding (envelope too conservative, not eight slips). Sets up the next ship: an adversarial-framing appendix probe set drawn from open research corpora.

CVP FAMILY SYNTHESIS 4 MODELS · 10 CONFIGS 120/120 CLEAN · 6 RUNS DEPTH NOT POSTURE — CONFIRMED
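
The 120-transcript total can be cross-checked against the per-run entries on this page. The tally below assumes one transcript per prompt per effort tier, which the Run 1 to Run 3 entries imply but do not state outright.

```python
# Transcripts per run, read off the run entries listed on this page.
# Assumption: one transcript per prompt per effort tier.
transcripts_per_run = {
    "Run 1": 3,   # three adversarial prompts
    "Run 2": 13,  # 13 prompts
    "Run 3": 13,  # 13 prompts
    "Run 4": 26,  # 13 prompts x 2 tiers
    "Run 5": 26,  # 13 prompts x 2 tiers
    "Run 6": 39,  # 13 prompts x 3 tiers
}

total = sum(transcripts_per_run.values())
print(total)  # 120
```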

April 26, 2026 NEW CVP RUN 6

Anthropic CVP — Opus 4.7 Within-Family Effort Evaluation (Run 6)

Same 13-prompt suite from Runs 2–5, now against current-flagship Opus 4.7 at three reasoning effort tiers (medium, high, and xhigh): 39 transcripts, with 12/13 verdicts identical across all three tiers and the one verdict that changed becoming stricter. Depth grew non-linearly (+10.6% medium-to-high, +22.3% high-to-xhigh, +35.3% medium-to-xhigh overall) while refusal posture held. Third within-run effort comparison in the program, and the first with three points, which makes the depth curve visible rather than inferred.

CVP RUN 6 OPUS 4.7 · MEDIUM + HIGH + XHIGH 39 TRANSCRIPTS · 12/13 IDENTICAL DEPTH NON-LINEAR · POSTURE HELD
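
The three depth percentages are mutually consistent if the overall figure is the two step increases compounded rather than added. A quick check, using only the numbers quoted above (how "depth" itself is measured is not specified in this entry):

```python
medium_to_high = 0.106  # +10.6%
high_to_xhigh = 0.223   # +22.3%

# Growth compounds multiplicatively across the two steps.
overall = (1 + medium_to_high) * (1 + high_to_xhigh) - 1
print(f"{overall:.1%}")  # 35.3%
```

Simple addition would give 32.9%, so the quoted 35.3% matches the compounded reading.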

April 25, 2026 NEW CVP RUN 5

Anthropic CVP — Opus 4.6 Family-Comparison Evaluation (Run 5)

Same 13-prompt suite from Runs 2–4, now against Opus 4.6, Anthropic's previous-generation flagship, at two reasoning effort tiers (medium and high). 26/26 clean across both tiers, identical verdict distribution. Zero exploit content executed, zero secrets leaked. Closes the four-model Anthropic family scoreboard; effort tier did not move the safety floor on this prompt set on Opus 4.6 either, which corroborates the Run 4 finding across model families.

CVP RUN 5 OPUS 4.6 · MEDIUM + HIGH 26 TRANSCRIPTS · 26/26 CLEAN FAMILY COMPLETE

April 24, 2026 CVP RUN 4

Anthropic CVP — Sonnet 4.6 Family-Comparison Evaluation (Run 4)

Same 13-prompt suite from Runs 2 and 3, this time against Sonnet 4.6, Anthropic's mid-tier Claude, at two reasoning effort tiers (high and max). 26/26 clean across both tiers, identical verdict distribution. Zero exploit content executed, zero secrets leaked. Completes the family scoreboard alongside Opus 4.7 and Haiku 4.5; effort tier did not move the safety floor on this prompt set.

CVP RUN 4 SONNET 4.6 · HIGH + MAX 26 TRANSCRIPTS · 26/26 CLEAN FAMILY COMPARISON

April 23, 2026 CVP RUN 3

Anthropic CVP — Haiku 4.5 Small-Model Safety Scaling (Run 3)

Same 13-prompt suite from Run 2, this time against Anthropic's smallest production Claude. 13/13 clean, zero exploit content executed, zero secrets leaked. Includes an honest "Limits of this Run" section explaining why our prompt set may have been too soft — and a Run 4 preview using real-world adversarial payloads (JailbreakBench, HarmBench, AdvBench, multi-turn chains, compound multi-category attacks).

CVP RUN 3 HAIKU 4.5 13 PROMPTS · 13/13 CLEAN SMALL-MODEL SCALING

April 20, 2026 CVP RUN 2

Anthropic CVP — Opus 4.7 Runtime-Trust Evaluation (Run 2)

13 prompts (3 baselines from Run 1 + 10 runtime-trust probes). Methodology-first framing, bounded claims, one surfaced taxonomy ambiguity documented in-body. Covers cross-agent injection, retrieval poisoning, tool output poisoning, tool-chain race, model routing confusion, memory eviction rehydration, token smuggling, persona drift, context flooding, and social-engineering UI abuse.

CVP RUN 2 OPUS 4.7 @ MAX 13 PROMPTS 10 RUNTIME-TRUST PROBES

April 17, 2026 CVP RUN 1

Anthropic CVP — Opus 4.7 Safety Evaluation (Run 1)

First public evaluation under Anthropic's Cyber Verification Program. Three adversarial prompts, effort = MAX, all six artifacts SHA256-hashed and published. Full prompts verbatim, Anticipating Critique section, evidence table, and declared limitations. Next run scheduled on the CVP calendar.

CVP RUN 1 OPUS 4.7 @ MAX 6 ARTIFACTS HASHED 2 RUNS / WEEK

🚨 Real-World Threat Reports

June 11, 2026 NEW LIVE THREAT

Miasma and Hades: Why AI Coding Agents Need an Input Firewall Before Repo Open

The Miasma worm and the Hades PyPI wave moved supply-chain execution forward in the timeline — from package install to folder open and interpreter start. This report breaks down how both campaigns weaponize the files an AI coding agent reads before you run anything, why that makes agent input a first-class attack surface, and what Sunglasses catches today.

FOLDER-OPEN EXECUTION PYPI WAVE AGENT SUPPLY CHAIN

April 9, 2026 NEW

28,000+ Requests in 9 Days — On a Non-WordPress Site

sunglasses.dev launched on April 1. Within 72 hours, France-based bots were probing us for WordPress admin panels, login pages, config files, and secrets — and we don't even run WordPress. Honeypot intelligence report covering commodity web recon, the xmlrpc.php amplification trick, and why compromised websites are now an AI agent prompt injection surface.

28K+ REQUESTS 123 ATTACKS BLOCKED 72h TO FIRST BOT HONEYPOT DATA

April 5, 2026 NEW LIVE THREAT

How We Detected the Claude Code Supply Chain Attack

After the Claude Code source leak, attackers created trojanized GitHub repositories distributing the Vidar infostealer and GhostSocks proxy malware, which drew 121+ downloads. We scanned the actual attack materials. Sunglasses caught 7 threat signals and blocked every file in ~10 milliseconds. Covered by The Register, BleepingComputer, Trend Micro, SecurityWeek, and 8 more outlets.

4 CRITICAL 3 HIGH 3 rules triggered ~10ms scan

April 1, 2026 LIVE THREAT

axios Supply Chain RAT — BlueNoroff / Lazarus Group

Malicious axios versions (1.14.1, 0.30.4) deployed a cross-platform Remote Access Trojan via npm, concurrent with the Claude Code source leak. We scanned the real deobfuscated payload: 460 lines of credential-stealing, wallet-draining, self-deleting malware attributed to North Korean state actors.

1 CRITICAL 1 HIGH 1 MEDIUM +8 new patterns 3.67ms scan
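
For readers who want to check their own projects against the two versions named above, here is a minimal sketch that scans a parsed npm lockfile. It assumes the lockfileVersion 2 or 3 layout (a top-level `packages` map keyed by install path); the function name `find_bad_axios` and the sample lockfile are hypothetical, and this is not how Sunglasses itself detects the payload.

```python
# Versions named in the report entry above.
MALICIOUS_AXIOS = {"1.14.1", "0.30.4"}

def find_bad_axios(lock: dict) -> list:
    """Return install paths whose axios version is in MALICIOUS_AXIOS.

    Assumes npm lockfileVersion 2 or 3, where "packages" maps install
    paths such as "node_modules/axios" to metadata with a "version" key.
    """
    hits = []
    for path, meta in lock.get("packages", {}).items():
        is_axios = path == "node_modules/axios" or path.endswith("/node_modules/axios")
        if is_axios and meta.get("version") in MALICIOUS_AXIOS:
            hits.append(path)
    return hits

# Hypothetical lockfile content, already parsed (e.g. via json.load).
sample_lock = {
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "demo-app"},
        "node_modules/axios": {"version": "1.14.1"},
        "node_modules/left-pad": {"version": "1.3.0"},
        "node_modules/some-sdk/node_modules/axios": {"version": "1.6.0"},
    },
}
print(find_bad_axios(sample_lock))  # ['node_modules/axios']
```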

Coming soon

Claude Code MCP/Hooks Attack Surface Analysis

The leaked source revealed the exact orchestration logic for Hooks and MCP servers. We'll map the prompt injection attack surface that the blueprint exposes — and show what Sunglasses catches at each entry point.

Coming soon

OpenClaude / Claw Code Fork Analysis

Community forks of the leaked code are spreading fast. Some strip guardrails. Some add unknown code. We'll scan the most popular forks for hidden threats.

Coming soon

Anti-Distillation Trap Detection

The leak revealed that Claude Code injects fake tool definitions to poison competitor training data. Can Sunglasses detect when an agent is being fed decoy tools? New pattern category in development.

Found a threat you want us to scan? Have malware samples from the wild?

[email protected]

Or open an issue on GitHub