Our 5-agent fact-check audit told us a real GitHub Security Advisory was hallucinated. Our research agent refused. She was right.
An audit agent flagged GHSA-pj2r-f9mw-vrcq / CVE-2026-40159 as "hallucinated, doesn't exist" based on a format heuristic.

We shipped the MCP Attack Atlas today: an open catalogue of 40+ attack patterns against AI agents using the Model Context Protocol. Before publishing, we ran a multi-agent fact-check audit against the 169-file internal research library that fed the Atlas. The audit's job: catch hallucinated citations, duplicates, unfalsifiable fixtures, and benchmark-theater claims before they became part of our public brand.
One of the audit agents flagged this citation in two pattern files:
# Claimed by audit agent:
GHSA-PJ2R-F9MW-VRCQ: "Does not exist in GitHub Advisory Database.
Format is suspicious (all-caps). Recommend retraction."
Sounded decisive. We routed the finding to our research agent (Cava), the author of those pattern files, with a formal retraction request.
Cava has a Director-level behavior trained in: before deleting her own work based on upstream feedback, she verifies the highest-risk claim independently. Instead of complying, she opened the URL:
$ open https://github.com/advisories/GHSA-pj2r-f9mw-vrcq

Title observed:
"PraisonAI Vulnerable to Sensitive Environment Variable Exposure via Untrusted MCP Subprocess Execution · CVE-2026-40159 · GitHub Advisory Database · GitHub"
She wrote back with the captured evidence and held her edits until Boss confirmed. We then verified independently:
$ curl -sLI https://github.com/advisories/GHSA-pj2r-f9mw-vrcq \
    | head -1
HTTP/2 200

$ curl -sL https://github.com/advisories/GHSA-pj2r-f9mw-vrcq \
    | grep -oE '<title>[^<]+'
<title>PraisonAI Vulnerable to Sensitive Environment Variable Exposure via Untrusted MCP Subprocess Execution · CVE-2026-40159 · GitHub Advisory Database · GitHub
The advisory is real. Live. Indexed. The audit agent had pattern-matched on a format heuristic ("all-caps IDs look fake") without running the one-line HTTP check that would have falsified its own claim in a tenth of a second.
Our audit agent took an absence shortcut: it saw an unfamiliar-looking ID format and concluded the target didn't exist. That's cheaper than verifying. It's also wrong.
In security research, absence-claims are how you accidentally destroy real evidence. If we had told Cava to delete the citation, we would have retracted a live, indexed advisory from two pattern files and weakened the Atlas on nothing more than a format heuristic.
One new rule, now part of our public common-mistakes log:
Absence-claims require the same proof standard as existence-claims. If a fact-check agent says "this does not exist", verify by HTTP lookup (or database query, or public record search) before acting on it. Pattern-matching is not verification. — Common-Mistakes #46, April 14, 2026
Operationally that means one line at the top of every audit-agent prompt:
# Audit-agent rule, added post-incident:
Before flagging anything as "does not exist" or "hallucinated", run a live HTTP/database check. Record the check. Attach the result to the flag. No absence-claim without a verification attempt.
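The rule above can be sketched as a gate in front of the flag itself. This is a hedged illustration, not our actual pipeline: the names `AbsenceFlag`, `verify_absence`, and `head_status` are hypothetical, and the only assumption is that a live HTTP 200 falsifies "does not exist" while a network failure proves nothing either way.

```python
# Hypothetical sketch: an absence-flag cannot be emitted without a
# recorded, live verification attempt attached to it.
from dataclasses import dataclass
from typing import Callable
from urllib.request import Request, urlopen


def head_status(url: str) -> int:
    """Live check: one HEAD request, return the HTTP status code."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
        return resp.status


@dataclass
class AbsenceFlag:
    target: str      # the thing claimed not to exist (URL, DB key, ...)
    claim: str       # the auditor's claim, possibly corrected below
    check_ran: bool  # a live lookup was actually attempted
    evidence: str    # the result of that lookup, attached to the flag


def verify_absence(url: str, claim: str,
                   fetch: Callable[[str], int] = head_status) -> AbsenceFlag:
    """Gate an absence-claim behind a verification attempt."""
    try:
        status = fetch(url)
    except Exception as exc:
        # A failed lookup is inconclusive; it is recorded, not treated as proof.
        return AbsenceFlag(url, claim, True, f"lookup failed: {exc!r}")
    if status == 200:
        # A live 200 falsifies "does not exist" outright.
        return AbsenceFlag(url, "exists (absence-claim falsified)", True, "HTTP 200")
    return AbsenceFlag(url, claim, True, f"HTTP {status}")
```

The point of the dataclass is the paper trail: the flag carries its own verification evidence, so a downstream agent (or human) can see whether the check ever ran.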
Three behaviors Cava already had that saved us:

1. Verify before deleting: she checks the highest-risk claim independently before removing her own work on upstream feedback.
2. Capture evidence: she recorded the live page title, not just her disagreement.
3. Hold until confirmed: she froze her edits until Boss verified the finding independently.
That's the behavior pattern we want across the whole agent team. It scales better than trying to make every audit infallible. Infallible audits aren't a thing. Honest agents are.
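That challenge-over-compliance pattern can be sketched as a small handler for an upstream retraction request. A minimal sketch under stated assumptions: `handle_retraction_request`, `verify`, and `escalate` are illustrative names, with the check and the escalation channel injected as callables rather than taken from any real agent framework.

```python
# Hypothetical sketch of a research agent's response to "delete this,
# it's hallucinated": verify independently, capture evidence, hold edits.
def handle_retraction_request(target_url, verify, escalate):
    """Decide whether to comply with an upstream retraction request.

    verify(url)   -> captured evidence (e.g. a live page title) or None
    escalate(ev)  -> send the captured evidence back upstream
    """
    evidence = verify(target_url)   # independent check, not blind trust
    if evidence is not None:        # the citation resolves: push back
        escalate(evidence)          # attach captured proof to the reply
        return "hold-edits"         # freeze until explicit confirmation
    return "retract"                # absence confirmed: comply
```

The design choice worth copying is that the agent never deletes on the flag alone; the cheapest path through the function still requires one verification attempt.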
common-mistakes.md entry #46 documents the incident by name, date, and fix — searchable for future team members.

We kept the CVE citation in both pattern files. They are now stronger for having been challenged, not weaker.
Sunglasses' whole position is honest vs benchmark theater. We apply that rule to competitors, and we just applied it to ourselves. A multi-agent audit is a useful tool, not a truth oracle. The research agents who actually produced the patterns were better judges of what was real than the separate agents we built to audit them — because they had done the verification work the audit agents skipped.
If your team uses AI agents in security research pipelines, assume every auditor hallucinates sometimes. Build for challenge, not just for catch.