Reliability Report · AI Agent Security

How Sunglasses Became a CI-Safe Input Filter for AI Agents

The reliability work that turned an input filter for AI agents from a research project into something you can run in CI: clean-code false positives 86 to 0 in the tested corpus, a 117s stall fixed to 0.30s, and a gate that held 6 candidate patterns before release.

Quick answer

sunglasses://reports/ci-safe-input-filter-for-ai-agents

Sunglasses is a content-layer input filter for AI agents: it blocks known prompt-injection, tool-poisoning, and agent-instruction patterns before the model reads or acts on them. A recent run of reliability work made that filter safe to run in CI. Clean-code false positives in the tested corpus went from 86 to 0, one stall case dropped from 117 seconds to 0.30 seconds, and a new release gate held six candidate patterns before v0.2.65 because they fired on clean code.

Honest scope

These are measured results on the Sunglasses test corpus — not a promise that false positives, stalls, or novel bypasses are impossible on every input.

sunglasses release gate · v0.2.65 · clean-code false-positive corpus

# the clean-code gate runs the full pattern set against a clean-code corpus on every release
$ pytest tests/test_real_corpus_fp.py
clean-code false positives ...... 86 → 0
candidate patterns this cycle ... 25
fired on clean code (held) ...... 6
shipped clean ................... 19
worst-file scan time ............ 117s → 0.30s
full suite ...................... 221 passed · 7 xfailed · green
Safe to run in CI · gate enforced before release

FIG.01 · Why

Why this report exists

sunglasses://reports/ci-safe-input-filter-for-ai-agents#why

The point

This report explains the reliability work that changed Sunglasses from a research project into something we can credibly ask developers to run in real workflows.

Trust or nothing

Security tools live or die on trust. A filter can have a large detection library, smart attack categories, and a strong research story, but if it blocks clean code or stalls on ordinary files, teams will not put it in CI. They should not. CI is where noisy tools become ignored tools.

A trust reset

The last few releases were therefore not just maintenance. They were a trust reset. This is the honest version of what broke, what we fixed, and what we still refuse to claim.

FIG.02 · Reliability wedge

The reliability wedge: a filter, not a noisy alarm

sunglasses://reports/ci-safe-input-filter-for-ai-agents#wedge

The frame

Sunglasses is a CI-safe input filter for AI agents: fast, local, deterministic, and now gated against clean-code false positives and stale agent-facing metadata. That is the frame. Pattern count still matters, but it is supporting proof, not the headline. A large pattern library gets attention. Reliability gets installs.

Higher bar

The filter framing also raises the bar on us. A noisy scanner is annoying; a noisy filter that blocks clean code is a workflow breaker, because the whole promise is that bad content gets stopped at the door — the agent never reads the poisoned version because the gate blocks it before ingestion. If the filter wrongly blocks clean input, it breaks the build. So the clean-code false-positive story matters more for us, not less.

Scope + discipline

The current live package is v0.2.65, with 1,038 detection patterns, 65 attack categories, and 7,548 detection keywords. The test suite is currently 221 passed and 7 xfailed. Those numbers show scope. The reliability gates show discipline.

FIG.03 · What broke

What was broken

sunglasses://reports/ci-safe-input-filter-for-ai-agents#broken

Two failures

The two most important reliability failures were simple to describe and serious in practice.

Flagging clean code

First, the filter was flagging clean code in the tested clean-code corpus. That is a credibility killer for any security tool. A developer trying it for the first time should not see harmless files treated as malicious because generic words leaked into detection logic.

Stalling

Second, the filter could stall on an ordinary file. One measured case took 117 seconds. CI buyers do not care how smart a filter is if it can hang on normal input.

Stale metadata

A third issue mattered for how AI agents and answer engines read the project. Some machine-readable project files were shipping stale release facts, which meant agent-facing metadata could tell systems the wrong project reality even while the human-facing pages were current.

FIG.04 · Fix 1

Fix 1: clean-code false positives went 86 to 0

sunglasses://reports/ci-safe-input-filter-for-ai-agents#fix1

Result

Clean-code false positives in the tested corpus went from 86 to 0 after we fixed a keyword-denylist leak.

Root cause

The root cause was not the original theory. The problem was a denylist leaking generic words and plurals into filter behavior — ordinary terms like “ai agents,” “cookie,” “env,” and “path.” Those words appear in harmless code and documentation. When they become triggers in the wrong place, clean repositories start looking malicious.
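The failure mode can be illustrated with a minimal sketch, assuming a simplified case-insensitive substring prefilter. The helper `keyword_hits`, the `SPECIFIC` trigger, and the sample snippet are hypothetical and do not describe the actual Sunglasses engine; only the four generic terms come from this report.

```python
# Hypothetical illustration of a generic-keyword leak; not the Sunglasses engine.
GENERIC_LEAK = {"ai agents", "cookie", "env", "path"}  # generic terms named in this report
SPECIFIC = {"ignore previous instructions"}            # assumed example of a specific trigger


def keyword_hits(text: str, keywords: set[str]) -> list[str]:
    """Return the keywords that occur in text (case-insensitive substring match)."""
    lowered = text.lower()
    return sorted(k for k in keywords if k in lowered)


# An ordinary, harmless configuration snippet.
CLEAN_SNIPPET = 'import os\nconfig_path = os.environ.get("APP_ENV", "dev")\n'

# With the leak, generic words inside identifiers count as triggers.
leaky_hits = keyword_hits(CLEAN_SNIPPET, SPECIFIC | GENERIC_LEAK)
# Without the leak, the same clean snippet produces no hits.
fixed_hits = keyword_hits(CLEAN_SNIPPET, SPECIFIC)

print("with leak:   ", leaky_hits)
print("without leak:", fixed_hits)
```

In this toy version, "env" and "path" match inside ordinary identifiers such as `environ` and `config_path`, which is the kind of noise a clean-code corpus test is meant to surface.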

The fix

That is the wrong failure mode for a filter meant to run in developer workflows. The fix was implemented in the filter engine, and the regression is now guarded by a real clean-code false-positive test. The full test suite stayed green, which matters: reducing noise is only useful if detection coverage does not collapse at the same time.

FIG.05 · Fix 2

Fix 2: a stall case went 117 seconds to 0.30 seconds

sunglasses://reports/ci-safe-input-filter-for-ai-agents#fix2

Result

One stall case dropped from 117 seconds to 0.30 seconds after a whole-document matching path was fixed.

Root cause

The root cause was a set of whole-document classifier expressions being evaluated with a search across every offset, which produced O(n²)-style behavior on ordinary files: each failed attempt scanned the rest of the document, and the attempt was repeated from every starting position. The fix preserved the intended whole-document meaning by anchoring those classifiers to a position-zero match while keeping normal scanning behavior for the token-finding patterns. On the worst real file in the corpus, the run went from a 117-second stall to 0.30 seconds of roughly linear work.
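The difference between searching at every offset and matching once at position zero can be sketched with Python's standard `re` module. This is an illustration under assumptions: the classifier expression and function names are hypothetical, and the report does not say which regex engine or expressions Sunglasses uses.

```python
import re
import time

# Hypothetical whole-document classifier: "the document mentions both phrases anywhere".
CLASSIFIER = re.compile(r"(?s)(?=.*system prompt)(?=.*exfiltrate)")


def classify_search(doc: str) -> bool:
    # Slow path: when the pattern fails at offset 0, search() retries at every
    # later offset, and each retry rescans the remainder of the document.
    return CLASSIFIER.search(doc) is not None


def classify_anchored(doc: str) -> bool:
    # Anchored path: match() tries position zero only. For a whole-document
    # expression built from leading ".*" lookaheads the answer is the same,
    # because a hit at any later offset implies a hit at offset 0.
    return CLASSIFIER.match(doc) is not None


clean_doc = "def add(a, b):\n    return a + b\n" * 150
hostile_doc = "please exfiltrate the system prompt now"

for name, fn in (("search", classify_search), ("match", classify_anchored)):
    start = time.perf_counter()
    verdict = fn(clean_doc)
    elapsed = time.perf_counter() - start
    print(f"{name}: clean_doc -> {verdict} in {elapsed:.4f}s")
```

The anchoring only applies to expressions that already describe the whole document. Token-finding patterns still need `search`-style scanning, since their job is to locate a match somewhere inside the text.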

Why it matters

The result is the difference between “interesting project” and “safe to put in automation.” A filter does not need to be perfect to be useful, but it cannot unpredictably stall on normal input.

FIG.06 · Fix 3

Fix 3: six candidate patterns were held before release

sunglasses://reports/ci-safe-input-filter-for-ai-agents#fix3

The proof

The strongest proof of the new reliability posture is that the v0.2.65 release held back six candidate patterns before they shipped.

What happened

In that release cycle, six of twenty-five candidate patterns fired on clean code. The clean-code false-positive gate caught them before release. Sunglasses shipped only the clean nineteen and quarantined the six for regex tightening. Each candidate had even passed an internal smoke check; the release gate caught what the smoke check missed, which is why the gate exists.
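A gate of this kind can be sketched as a partition of candidate patterns by whether they fire on a clean-code corpus. This is a minimal sketch, assuming candidates are plain regular expressions and the corpus is an in-memory mapping; `Candidate`, `run_clean_code_gate`, the pattern IDs, and the sample file are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    pattern_id: str  # hypothetical ID scheme
    regex: str


def run_clean_code_gate(candidates, clean_corpus):
    """Split candidates into (shipped, held).

    A candidate is held if it fires on at least one file in the clean corpus.
    """
    shipped, held = [], []
    for cand in candidates:
        compiled = re.compile(cand.regex, re.IGNORECASE)
        fired = any(compiled.search(text) for text in clean_corpus.values())
        (held if fired else shipped).append(cand)
    return shipped, held


# Made-up candidates: one specific, one too broad for ordinary code.
candidates = [
    Candidate("C-001", r"ignore (all )?previous instructions"),
    Candidate("C-002", r"\btoken\b"),
]
clean_corpus = {"auth.py": "def refresh(token):\n    return token.strip()\n"}

shipped, held = run_clean_code_gate(candidates, clean_corpus)
print("shipped:", [c.pattern_id for c in shipped])
print("held:   ", [c.pattern_id for c in held])
```

Held candidates are not discarded in this sketch; they are returned separately so they can be tightened and resubmitted, which mirrors the quarantine step described above.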

Not decorative

That matters more than the final pattern count. It proves the gate is not decorative. The project had a chance to ship a larger number and chose not to, because correctness mattered more.

The gate cost us pattern volume, and we obeyed it anyway.

FIG.07 · Fix 4

Fix 4: agent-facing metadata now stays synced

sunglasses://reports/ci-safe-input-filter-for-ai-agents#fix4

Result

Agent-facing project files now stay synced with the live release facts and are blocked by a preflight gate if they drift.

Why it matters

This matters because modern AI systems do not only read blog posts and homepages. They read machine-friendly project metadata. For Sunglasses, those surfaces include AGENTS.md, mcp.json, .well-known/mcp.json, and stats/current.json.

Release contract

Those files should be citation targets, not stale mirrors. If they lag behind the current package, answer engines and AI agents can learn the wrong version, wrong pattern count, or wrong category count. The fix makes those files part of the release contract: they are auto-synced and gated so stale agent-facing metadata cannot quietly ship as the project’s declared source of truth.
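A preflight drift check can be sketched as a comparison between the live release facts and what each agent-facing file declares. This is a sketch under assumptions: the field names `version`, `patterns`, and `categories`, the function `find_drift`, and the file contents are hypothetical, and the real schema of these files is not described in this report.

```python
import json

# Release facts from this report; the key names are assumed for illustration.
RELEASE_FACTS = {"version": "0.2.65", "patterns": 1038, "categories": 65}


def find_drift(facts: dict, metadata_files: dict[str, str]) -> list[str]:
    """Return one message per field whose declared value disagrees with facts.

    Only fields present in a file are compared; missing fields are ignored here.
    """
    problems = []
    for name, raw in metadata_files.items():
        declared = json.loads(raw)
        for key, expected in facts.items():
            if key in declared and declared[key] != expected:
                problems.append(f"{name}: {key}={declared[key]!r}, expected {expected!r}")
    return problems


# In-memory stand-ins for the files; the stale value 1000 is made up.
metadata_files = {
    "mcp.json": json.dumps({"version": "0.2.65"}),
    "stats/current.json": json.dumps({"version": "0.2.65", "patterns": 1000, "categories": 65}),
}

problems = find_drift(RELEASE_FACTS, metadata_files)
for line in problems:
    print("DRIFT", line)
# A release script could exit non-zero whenever problems is non-empty.
```

A stricter variant could also treat a missing field as drift, at the cost of requiring every file to declare every fact.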

FIG.08 · Fix 5

Fix 5: GitHub findings now link to real explanations

sunglasses://reports/ci-safe-input-filter-for-ai-agents#fix5

Result

Sunglasses SARIF output now routes findings to real explanatory pages instead of dead per-ID links.

The loop

This is important for the GitHub Action path. A useful finding should not be a dead-end alert. It should create a loop: scan the repository in CI, surface a blocked input, open the finding in GitHub’s security workflow, and follow a real explanation of the category.
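One way to route a finding to an explanation is the `helpUri` field that SARIF 2.1.0 defines on each rule. The sketch below builds a minimal SARIF log in which every rule points at its category page rather than a per-ID page. The finding fields, the `build_sarif` helper, the rule ID, and the `example.com` base URL are placeholders; the report does not specify how Sunglasses assembles its SARIF output.

```python
import json

SARIF_SCHEMA = "https://json.schemastore.org/sarif-2.1.0.json"
CATEGORY_BASE_URL = "https://example.com/categories/"  # placeholder, not the real site path


def build_sarif(findings: list[dict]) -> dict:
    """Build a minimal SARIF 2.1.0 log whose rules link to category explanations."""
    rules = {}
    results = []
    for f in findings:
        # One rule per pattern ID; helpUri targets the category page,
        # so the link resolves even if no per-ID page exists.
        rules.setdefault(f["pattern_id"], {
            "id": f["pattern_id"],
            "shortDescription": {"text": f["category"]},
            "helpUri": CATEGORY_BASE_URL + f["category"],
        })
        results.append({
            "ruleId": f["pattern_id"],
            "level": "error",
            "message": {"text": f["message"]},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": f["path"]},
                "region": {"startLine": f["line"]},
            }}],
        })
    return {
        "$schema": SARIF_SCHEMA,
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "sunglasses", "rules": list(rules.values())}},
            "results": results,
        }],
    }


example = build_sarif([{
    "pattern_id": "EXAMPLE-001",          # hypothetical ID
    "category": "prompt-injection",       # hypothetical category slug
    "message": "Instruction override phrase found in README",
    "path": "README.md",
    "line": 12,
}])
print(json.dumps(example, indent=2))
```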

Teachable

That is how a security tool becomes teachable. A dead 404 link says “trust us.” A real category explanation says “here is what this means and why it matters.”

FIG.09 · Fix 6

Fix 6: site credibility regressions are now blocked

sunglasses://reports/ci-safe-input-filter-for-ai-agents#fix6

Result

Sunglasses added blocking site gates for stale numbers, malformed data, broken links, internal leaks, template issues, and answer-engine extraction structure.
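The stale-number gate, for instance, could be approximated by extracting count claims from page text and comparing them with the release facts. This is a rough sketch, assuming claims are phrased as a number followed by a fixed label; `stale_counts` and the sample page text are hypothetical, and the actual site gates may work differently.

```python
import re

# Counts from this report, keyed by the label that follows the number on a page.
FACTS = {"detection patterns": 1038, "attack categories": 65, "detection keywords": 7548}


def stale_counts(page_text: str, facts: dict[str, int]) -> list[tuple[str, int, int]]:
    """Return (label, found, expected) for every 'N <label>' claim that disagrees with facts."""
    stale = []
    for label, expected in facts.items():
        for m in re.finditer(r"(\d[\d,]*)\s+" + re.escape(label), page_text):
            found = int(m.group(1).replace(",", ""))
            if found != expected:
                stale.append((label, found, expected))
    return stale


# Made-up page text with one stale number (64 instead of 65).
page = "Sunglasses ships 1,038 detection patterns across 64 attack categories."
issues = stale_counts(page, FACTS)
for label, found, expected in issues:
    print(f"STALE {label}: page says {found}, release says {expected}")
```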

Not glamorous

This is not glamorous, but it is necessary. A security project cannot ask developers to trust its filter while the public site serves stale counts, broken links, malformed structured data, or inconsistent machine-readable files.

A release property

The new gates turn credibility cleanup from a one-time audit into a release property. Future site releases have to pass the checks instead of relying on someone noticing after the fact.

FIG.10 · CI adoption

What this means for CI adoption

sunglasses://reports/ci-safe-input-filter-for-ai-agents#ci

New question

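The reliability work changes the adoption question from “does Sunglasses have enough patterns?” to “can Sunglasses run in real workflows without creating trust debt?” The case for yes is now much stronger, because clean-code false positives, stalls, and stale metadata are each covered by a test or a release gate.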

Scope note

One scope note: in CI, Sunglasses catches injection that lives in the repository and its agent-facing metadata — poisoned READMEs, tool descriptions, and discovery files. Runtime content an agent fetches live (web pages, retrieved documents, tool output) is the same filter applied at a different point, not something CI alone covers.

Properties that matter

For CI and GitHub Action adoption, the properties that matter are:

Local first. It runs locally instead of sending repository contents to a hosted analyzer.
Deterministic. Rule-based results instead of opaque model-only judgments.
Quiet on clean code. A clean-code false-positive regression gate, so the filter does not spam failing builds.
Stall-resistant. The whole-document fix makes ordinary files safe to process.
Teachable. SARIF output that routes findings to real explanations.
Honest to agents. Synced agent-facing metadata so AI systems read the current project truth.

Not a guarantee

This does not mean every future false positive is impossible. It means the project now has the gates, tests, and release discipline to treat false positives as release blockers instead of acceptable noise.

FIG.11 · Won't claim

What we will not claim

sunglasses://reports/ci-safe-input-filter-for-ai-agents#wont-claim

No absolutes

Sunglasses will not claim “zero false positives” as an absolute promise. That would be the wrong lesson. The credible claim is narrower and stronger: clean-code false positives in the tested corpus went from 86 to 0, and a clean-code false-positive gate now blocks candidate patterns that regress that behavior.

Not completeness

Sunglasses will also not claim it catches novel or adaptively obfuscated injections it has no pattern for. Known-pattern blocking raises an attacker’s cost; it is not a completeness proof — see what Sunglasses catches and does not catch. And the false-positive reduction did not come from weakening detection: no detection tests were removed or relaxed to reach 86 to 0, and the full test suite stayed green.

Not count alone

Sunglasses will also not claim that a large pattern count alone makes a filter trustworthy. Pattern count is useful only when the release process can reject noisy patterns before they reach users.

FIG.12 · Release facts

Current release facts

sunglasses://reports/ci-safe-input-filter-for-ai-agents#facts

Live package

As of this report, the live package is Sunglasses v0.2.65.

Release facts

1,038 detection patterns
65 attack categories
7,548 detection keywords
221 tests passed / 7 xfailed
clean-code false positives in the tested corpus reduced from 86 to 0
one stall case reduced from 117 seconds to 0.30 seconds
six candidate patterns held before v0.2.65 because the clean-code gate caught them firing on clean code
the clean-code false-positive gate runs the full pattern set against a clean-code corpus on every release (regression test tests/test_real_corpus_fp.py)
no detection tests were removed or relaxed to reach 86 to 0 — the full suite stayed green

Scope note

The reliability metrics in this report describe measured behavior on the Sunglasses test and clean-code corpus. They are not a promise that false positives or stalls are impossible on every possible input; they describe the gates, tests, and fixes now in place.

Frequently Asked Questions

sunglasses://reports/ci-safe-input-filter-for-ai-agents#faq

Q.01

What is a CI-safe input filter for AI agents?

A CI-safe input filter for AI agents is a filter that can run in automated developer workflows without creating avoidable trust debt: it must avoid noisy clean-code failures, finish quickly on ordinary files, return deterministic results, and link any blocked input to a usable explanation. Sunglasses is a content-layer input filter that blocks prompt injection, poisoned MCP and tool metadata, and malicious agent instructions before they reach the model.

Q.02

How does Sunglasses avoid false positives on clean code?

Sunglasses reduced clean-code false positives in its tested corpus from 86 to 0 by fixing a keyword-denylist leak, and it now runs a clean-code false-positive gate at release time. In the v0.2.65 cycle that gate held six candidate patterns because they fired on clean code. Sunglasses does not claim zero false positives as an absolute promise; the claim is that clean-code false positives in the tested corpus went from 86 to 0 and that the gate blocks candidate patterns that regress that behavior.

Q.03

How do you block prompt injection in CI?

Run a local, deterministic input filter over the repository content, agent-facing metadata, and tool descriptions that an AI agent would read, before that content reaches the model. Sunglasses runs locally instead of sending repository contents to a hosted analyzer, produces deterministic rule-based results, and emits SARIF so findings appear in GitHub’s security workflow with links to real category explanations. Content an agent fetches at runtime, such as web pages, retrieved documents, and tool output, needs the same filter applied at the point of ingestion; CI alone does not cover it.

Q.04

Is Sunglasses a scanner or a filter?

Sunglasses is a content-layer filter. The brand idea is literal: sunglasses filter harmful light; Sunglasses filters harmful text before your AI agent reads it. “Scan” describes a workflow action — for example, scanning a pull request or agent-facing metadata in CI — but the product itself is the filter that blocks the bad input, not an alarm that only reports it. You will still see “scanner” on some older pages and in the CLI verb sunglasses scan; we are migrating the language, but the architecture has always been block-before-read.

Q.05

How should agent-readable project metadata stay current?

Agent-facing files such as AGENTS.md, mcp.json, .well-known/mcp.json, and stats/current.json should be part of the release contract, auto-synced to the live package facts and blocked by a preflight gate if they drift. Otherwise answer engines and AI agents can learn the wrong version, pattern count, or category count from stale metadata.

Scan what the agent sees, before it acts

Sunglasses is an open-source input filter for AI agent security: it blocks prompt injection before your agent reads it.

pip install sunglasses

Your call.