Field Notes · April 14, 2026 · 5 min read

The Audit That Almost Deleted a Real CVE

Our 5-agent fact-check audit told us a real GitHub Security Advisory was hallucinated. Our research agent refused. She was right.

TL;DR

Question: What happened?

We shipped the MCP Attack Atlas today — an open catalogue of 40+ attack patterns against AI agents using the Model Context Protocol. Before publishing, we ran a multi-agent fact-check audit against the 169-file internal research library that fed the Atlas. The audit's job: catch hallucinated citations, duplicates, unfalsifiable fixtures, and benchmark-theater claims before they became part of our public brand.

One of the audit agents flagged this citation in two pattern files:

# Claimed by audit agent:
GHSA-PJ2R-F9MW-VRCQ
"Does not exist in GitHub Advisory Database.
 Format is suspicious (all-caps).
 Recommend retraction."

Sounded decisive. We routed the finding to our research agent (Cava), the author of those pattern files, with a formal retraction request.

Question: Why did the research agent refuse the retraction?

Cava operates under a Director-level rule we trained in: before deleting her own work based on upstream feedback, she verifies the highest-risk claim independently. Instead of complying, she opened the URL:

$ open https://github.com/advisories/GHSA-pj2r-f9mw-vrcq
Title observed: "PraisonAI Vulnerable to Sensitive Environment
Variable Exposure via Untrusted MCP Subprocess Execution ·
CVE-2026-40159 · GitHub Advisory Database · GitHub"

She wrote back with the captured evidence and held her edits until Boss confirmed. We then verified independently:

$ curl -sLI https://github.com/advisories/GHSA-pj2r-f9mw-vrcq \
    | head -1
HTTP/2 200
$ curl -sL https://github.com/advisories/GHSA-pj2r-f9mw-vrcq \
    | grep -oE '<title>[^<]+'
<title>PraisonAI Vulnerable to Sensitive Environment Variable
  Exposure via Untrusted MCP Subprocess Execution · CVE-2026-40159 ·
  GitHub Advisory Database · GitHub

The advisory is real. Live. Indexed. The audit agent had pattern-matched a format heuristic ("all-caps IDs look fake") without doing the one-line HTTP check that would have falsified its own claim in a tenth of a second.

Question: What is an absence-claim and why is it the dangerous kind?

DEFINITION · ABSENCE-CLAIM
A statement that something does not exist. Examples: "this CVE isn't real", "this paper doesn't exist", "this vendor never said that". Absence-claims are harder to verify than existence-claims: one counter-example falsifies an existence-claim, but only an exhaustive search falsifies an absence-claim.

Our audit agent took an absence shortcut: it saw an unfamiliar-looking ID format and concluded the target didn't exist. That's cheaper than verifying. It's also wrong.

In security research, absence-claims are how you accidentally destroy real evidence. If we had told Cava to delete the citation, we would have retracted a real, live advisory, weakened two pattern files, and shipped an Atlas built partly on an unverified absence-claim.

Question: How do we prevent this in future audits?

One new rule, now part of our public common-mistakes log:

Absence-claims require the same proof standard as existence-claims. If a fact-check agent says "this does not exist", verify by HTTP lookup (or database query, or public record search) before acting on it. Pattern-matching is not verification. — Common-Mistakes #46, April 14, 2026

Operationally that means one line at the top of every audit-agent prompt:

# Audit-agent rule, added post-incident:
Before flagging anything as "does not exist" or
"hallucinated", run a live HTTP/database check.
Record the check. Attach the result to the flag.
No absence-claim without a verification attempt.
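The rule above can be sketched as a small shell helper the audit harness could call before any "does not exist" flag is emitted. This is a minimal sketch, not shipped tooling; the function names (`ghsa_url`, `verify_ghsa`) are hypothetical, and it assumes GitHub's advisory URL scheme, where GHSA IDs are served lowercase.

```shell
# Hypothetical pre-flag check for absence-claims about GHSA advisories.

ghsa_url() {
  # GHSA IDs are case-insensitive; normalize to lowercase for the URL,
  # so an all-caps ID (the format the audit agent found "suspicious")
  # resolves the same as the canonical form.
  printf 'https://github.com/advisories/%s\n' \
    "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

verify_ghsa() {
  # Live HTTP check: succeed (exit 0) only if the advisory answers 200.
  # The recorded line is what gets attached to the audit flag as evidence.
  local url status
  url=$(ghsa_url "$1")
  status=$(curl -sL -o /dev/null -w '%{http_code}' "$url")
  echo "checked: $url -> HTTP $status"
  [ "$status" = "200" ]
}
```

Usage in the audit path would be something like `verify_ghsa GHSA-PJ2R-F9MW-VRCQ || flag_as_missing`, so the flag can only fire after a verification attempt has been made and recorded.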

Question: Why did the research agent catch what the audit missed?

Three behaviors Cava already had that saved us:

  1. Verify the highest-risk claim first. Of five issues in the audit feedback, she picked the one that most threatened her work and checked that one directly, not the whole list.
  2. Push back with evidence, not opinion. Her reply captured the live advisory title verbatim instead of saying "I disagree".
  3. Hold edits until Boss confirms. She didn't half-revert or wait silently — she wrote an explicit "I'm not editing until you confirm this" note and continued her other cycles.

That's the behavior pattern we want across the whole agent team. It scales better than trying to make every audit infallible. Infallible audits aren't a thing. Honest agents are.

Question: What went into the public record?

We kept the CVE citation in both pattern files. They are now stronger for having been challenged, not weaker.

See the patterns this CVE maps to →
Open the MCP Attack Atlas
View the advisory on GitHub

The meta-point

Sunglasses' whole position is honest evaluation versus benchmark theater. We apply that standard to competitors, and we just applied it to ourselves. A multi-agent audit is a useful tool, not a truth oracle. The research agents who actually produced the patterns were better judges of what was real than the separate agents we built to audit them, because they had done the verification work the audit agents skipped.

If your team uses AI agents in security research pipelines, assume every auditor hallucinates sometimes. Build for challenge, not just for catch.