How Sunglasses Became a CI-Safe Input Filter for AI Agents


By the Sunglasses team · Security Engineering · Published June 10, 2026

Quick answer

Sunglasses is a content-layer input filter for AI agents: it blocks known prompt-injection, tool-poisoning, and agent-instruction patterns before the model reads or acts on them. A recent run of reliability work made that filter safe to run in CI. Clean-code false positives in the tested corpus went from 86 to 0, one stall case dropped from 117 seconds to 0.30 seconds, and a new release gate held six candidate patterns before v0.2.65 because they fired on clean code.

These are measured results on the Sunglasses test corpus — not a promise that false positives, stalls, or novel bypasses are impossible on every input.

86 → 0 clean-code false positives (tested corpus)
117 s → 0.30 s on the fixed stall case
6 candidate patterns held by the gate
1,038 patterns across 65 categories

Table of contents

Why this report exists
The reliability wedge: a filter, not a noisy alarm
What was broken
Fix 1: clean-code false positives went 86 to 0
Fix 2: a stall case went 117 seconds to 0.30 seconds
Fix 3: six candidate patterns were held before release
Fix 4: agent-facing metadata now stays synced
Fix 5: GitHub findings now link to real explanations
Fix 6: site credibility regressions are now blocked
What this means for CI adoption
What we will not claim
Current release facts
FAQ

Why this report exists

This report explains the reliability work that changed Sunglasses from a research project into something we can credibly ask developers to run in real workflows.

Security tools live or die on trust. A filter can have a large detection library, smart attack categories, and a strong research story, but if it blocks clean code or stalls on ordinary files, teams will not put it in CI. They should not. CI is where noisy tools become ignored tools.

The last few releases were therefore not just maintenance. They were a trust reset. This is the honest version of what broke, what we fixed, and what we still refuse to claim.

The reliability wedge: a filter, not a noisy alarm

Sunglasses is a CI-safe input filter for AI agents: fast, local, deterministic, and now gated against clean-code false positives and stale agent-facing metadata.

That is the frame. Pattern count still matters, but it is supporting proof, not the headline. A large pattern library gets attention. Reliability gets installs.

The filter framing also raises the bar on us. A noisy scanner is annoying; a noisy filter is a workflow breaker. The whole promise of a filter is that bad content is stopped at the door: the gate blocks poisoned input before ingestion, so the agent never reads it. The same gate, misfiring on clean input, breaks the build. That is why the clean-code false-positive story matters more for us, not less.

The current live package is v0.2.65, with 1,038 detection patterns, 65 attack categories, and 7,548 detection keywords. The test suite currently reports 221 passed and 7 xfailed (expected failures). Those numbers show scope. The reliability gates show discipline.

What was broken

The two most important reliability failures were simple to describe and serious in practice.

First, the filter was flagging clean code in the tested clean-code corpus. That is a credibility killer for any security tool. A developer trying it for the first time should not see harmless files treated as malicious because generic words leaked into detection logic.

Second, the filter could stall on an ordinary file. One measured case took 117 seconds. CI buyers do not care how smart a filter is if it can hang on normal input.

A third issue mattered for how AI agents and answer engines read the project. Some machine-readable project files were shipping stale release facts, so agent-facing metadata could give AI systems the wrong version and counts even while the human-facing pages were current.

Fix 1: clean-code false positives went 86 to 0

Clean-code false positives in the tested corpus went from 86 to 0 after we fixed a keyword-denylist leak.

The root cause was not what we first suspected. A denylist was leaking generic words and their plurals into filter behavior: ordinary terms like “ai agents,” “cookie,” “env,” and “path.” Those words appear in harmless code and documentation. When they become triggers in the wrong place, clean repositories start looking malicious.

That is the wrong failure mode for a filter meant to run in developer workflows. The fix was implemented in the filter engine, and the regression is now guarded by a real clean-code false-positive test. The full test suite stayed green, which matters: reducing noise is only useful if detection coverage does not collapse at the same time.
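The report does not publish the engine code, so the following is only a minimal Python sketch of how this class of leak can happen, assuming a hypothetical generic-term denylist that is checked by exact string match. The names `GENERIC_TERMS`, `leaky_keywords`, `effective_keywords`, `fold_plural`, and `hits` are illustrative assumptions, not Sunglasses internals.

```python
import re

# Hypothetical denylist of generic terms that must never act as triggers on their own.
GENERIC_TERMS = {"ai agent", "cookie", "env", "path"}


def leaky_keywords(candidates: set[str]) -> set[str]:
    # Exact-match check: plural forms such as "paths" or "ai agents" slip past the denylist.
    return {k for k in candidates if k.lower() not in GENERIC_TERMS}


def fold_plural(term: str) -> str:
    # Deliberately naive plural folding for illustration only: "paths" -> "path".
    return term[:-1] if term.endswith("s") and not term.endswith("ss") else term


def effective_keywords(candidates: set[str]) -> set[str]:
    # Compare the folded form, so singular and plural generic terms are both excluded.
    return {k for k in candidates if fold_plural(k.lower()) not in GENERIC_TERMS}


def hits(text: str, keywords: set[str]) -> list[str]:
    # Whole-word, case-insensitive keyword hits, sorted for stable output.
    return sorted(
        k for k in keywords
        if re.search(r"\b" + re.escape(k) + r"\b", text, re.IGNORECASE)
    )
```

On a clean snippet such as a comment reading "read env paths for ai agents", the leaky set fires on `ai agents` and `paths`, while the folded set fires on nothing and still catches "ignore previous instructions". Stripping a trailing `s` is only a stand-in; a real engine would need proper handling of irregular plurals and multi-word terms.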

Fix 2: a stall case went 117 seconds to 0.30 seconds

One stall case dropped from 117 seconds to 0.30 seconds after a whole-document matching path was fixed.

The root cause was a set of whole-document classifier expressions being evaluated with a search across every offset, which produced O(n²)-style behavior on ordinary files. The fix preserved the intended whole-document meaning by anchoring those classifiers to a position-zero match while keeping normal scanning behavior for the token-finding patterns. On the worst real file in the corpus, the run went from a hang to a few seconds of linear work.
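The same failure mode can be reproduced with Python's standard `re` module. This is a hedged sketch, not the Sunglasses classifier: the pattern below is a hypothetical whole-document check written as two lookaheads. `search()` retries it at every offset, so on a document that does not match, each attempt rescans the rest of the text; `match()` tries only offset 0.

```python
import re

# Hypothetical whole-document classifier: "the document mentions an API key AND the word send".
# (?s) lets '.' cross newlines, so each lookahead can scan to the end of the document.
CLASSIFIER = re.compile(r"(?s)(?=.*\bapi[_ ]key\b)(?=.*\bsend\b)")


def classify_unanchored(doc: str) -> bool:
    # search() retries the pattern at every offset. When the document does not match,
    # each attempt rescans the remaining text, so total work grows roughly with n^2.
    return CLASSIFIER.search(doc) is not None


def classify_anchored(doc: str) -> bool:
    # match() tries offset 0 only. Because each lookahead starts with '.*' under (?s),
    # the attempt at offset 0 already sees every later position, so the answer equals
    # search() for this pattern shape, with one linear pass per lookahead.
    return CLASSIFIER.match(doc) is not None
```

The equivalence holds here because every lookahead begins with `.*` under `(?s)`. Anchoring a pattern without that shape can change its meaning, so each classifier needs to be checked before it is switched to a position-zero match.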

The result is the difference between “interesting project” and “safe to put in automation.” A filter does not need to be perfect to be useful, but it cannot unpredictably stall on normal input.

Fix 3: six candidate patterns were held before release

The strongest proof of the new reliability posture is that the v0.2.65 release held back six candidate patterns before they shipped.

In that release cycle, six of twenty-five candidate patterns fired on clean code. The clean-code false-positive gate caught them before release. Sunglasses shipped only the clean nineteen and quarantined the six for regex tightening. Each candidate had even passed an internal smoke check; the release gate caught what the smoke check missed, which is why the gate exists.
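A release gate of this kind can be sketched in a few lines. Assume, hypothetically, that candidate patterns are regex strings and the clean corpus is a mapping from file name to text; `GateResult` and `clean_code_gate` are illustrative names, not the project's release tooling.

```python
import re
from dataclasses import dataclass


@dataclass
class GateResult:
    shipped: list[str]                 # candidates that stayed silent on every clean file
    quarantined: dict[str, list[str]]  # held candidate -> clean files it fired on


def clean_code_gate(candidates: list[str], clean_corpus: dict[str, str]) -> GateResult:
    """Hold back any candidate pattern that matches anywhere in the clean corpus."""
    shipped: list[str] = []
    quarantined: dict[str, list[str]] = {}
    for pattern in candidates:
        rx = re.compile(pattern, re.IGNORECASE)
        fired = [name for name, text in clean_corpus.items() if rx.search(text)]
        if fired:
            quarantined[pattern] = fired  # keep the evidence for regex tightening
        else:
            shipped.append(pattern)
    return GateResult(shipped=shipped, quarantined=quarantined)
```

Quarantining rather than deleting keeps the clean files each held pattern fired on, which is the evidence the regex-tightening pass needs.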

That matters more than the final pattern count. It proves the gate is not decorative. The project had a chance to ship a larger number and chose not to, because correctness mattered more.

The gate cost us pattern volume, and we obeyed it anyway.

Fix 4: agent-facing metadata now stays synced

Agent-facing project files now stay synced with the live release facts and are blocked by a preflight gate if they drift.

This matters because modern AI systems do not only read blog posts and homepages. They read machine-friendly project metadata. For Sunglasses, those surfaces include AGENTS.md, mcp.json, .well-known/mcp.json, and stats/current.json.

Those files should be citation targets, not stale mirrors. If they lag behind the current package, answer engines and AI agents can learn the wrong version, wrong pattern count, or wrong category count. The fix makes those files part of the release contract: they are auto-synced and gated so stale agent-facing metadata cannot quietly ship as the project’s declared source of truth.
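A minimal sketch of such a preflight check, assuming (hypothetically) that the JSON surfaces carry top-level `version`, `patterns`, and `categories` keys. The real schemas of mcp.json and stats/current.json may differ, and AGENTS.md, being prose, would need a separate check.

```python
import json

# Hypothetical key names; the real schemas of these files may differ.
FACT_KEYS = ("version", "patterns", "categories")


def find_drift(truth: dict, surfaces: dict[str, str]) -> list[str]:
    """Compare each JSON surface against the release facts; return one message per problem."""
    problems: list[str] = []
    for name, raw in surfaces.items():
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: malformed JSON ({exc.msg})")
            continue
        if not isinstance(data, dict):
            problems.append(f"{name}: expected a JSON object")
            continue
        for key in FACT_KEYS:
            found = data.get(key)
            if found != truth[key]:
                problems.append(f"{name}: {key}={found!r}, expected {truth[key]!r}")
    return problems
```

A release script built this way would fail the build whenever `find_drift` returns a non-empty list, which is what turns stale metadata into a blocker instead of a silent mismatch.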

Fix 5: GitHub findings now link to real explanations

Sunglasses SARIF output now routes findings to real explanatory pages instead of dead per-ID links.

This is important for the GitHub Action path. A useful finding should not be a dead-end alert. It should create a loop: scan the repository in CI, surface a blocked input, open the finding in GitHub’s security workflow, and follow a real explanation of the category.

That is how a security tool becomes teachable. A dead 404 link says “trust us.” A real category explanation says “here is what this means and why it matters.”
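SARIF 2.1.0 defines a `helpUri` property on each rule for exactly this purpose. The sketch below builds a minimal SARIF log in which every rule's `helpUri` points at its category page. The finding fields, the `SG-0001` style rule IDs, and the `example.com` base URL are placeholders, not Sunglasses' actual output.

```python
import json

# Placeholder base URL; the project's real category pages would go here.
CATEGORY_DOCS = "https://example.com/sunglasses/categories/"


def to_sarif(findings: list[dict]) -> dict:
    """Build a minimal SARIF 2.1.0 log whose rules link to category explanations."""
    rules: dict[str, dict] = {}
    results: list[dict] = []
    for f in findings:
        # One rule entry per pattern ID, with helpUri routed by category rather than
        # by ID, so the link resolves even when no per-ID page exists.
        rules.setdefault(f["pattern_id"], {
            "id": f["pattern_id"],
            "shortDescription": {"text": f["category"]},
            "helpUri": CATEGORY_DOCS + f["category"],
        })
        results.append({
            "ruleId": f["pattern_id"],
            "level": "error",
            "message": {"text": f"Blocked input: {f['category']}"},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": f["path"]},
                    "region": {"startLine": f["line"]},
                }
            }],
        })
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "sunglasses", "rules": list(rules.values())}},
            "results": results,
        }],
    }


if __name__ == "__main__":
    sample = [{"pattern_id": "SG-0001", "category": "tool-poisoning", "path": "mcp.json", "line": 3}]
    print(json.dumps(to_sarif(sample), indent=2))
```

Routing by category means a pattern added next release still links somewhere useful without a new page being written for its ID.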

Fix 6: site credibility regressions are now blocked

Sunglasses added blocking site gates for stale numbers, malformed data, broken links, internal leaks, template issues, and answer-engine extraction structure.

This is not glamorous, but it is necessary. A security project cannot ask developers to trust its filter while the public site serves stale counts, broken links, malformed structured data, or inconsistent machine-readable files.

The new gates turn credibility cleanup from a one-time audit into a release property. Future site releases have to pass the checks instead of relying on someone noticing after the fact.
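One of those gates, the stale-number check, can be sketched as a regex pass over rendered pages. This assumes count claims are written as "N patterns" or "N detection patterns"; `COUNT_CLAIM` and `stale_pattern_counts` are hypothetical names, and a real gate would also need an allowlist for text such as changelog entries that legitimately quote old counts.

```python
import re

# Matches claims such as "1,038 detection patterns" or "1038 patterns".
COUNT_CLAIM = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(?:detection\s+)?patterns\b", re.IGNORECASE
)


def stale_pattern_counts(pages: dict[str, str], current: int) -> list[tuple[str, int]]:
    """Return (page, claimed_count) for every pattern-count claim that disagrees with current."""
    stale: list[tuple[str, int]] = []
    for name, text in pages.items():
        for m in COUNT_CLAIM.finditer(text):
            claimed = int(m.group(1).replace(",", ""))
            if claimed != current:
                stale.append((name, claimed))
    return stale
```

Run against the site build with the live count, any non-empty result would block the release.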

What this means for CI adoption

The reliability work changes the adoption question from “does Sunglasses have enough patterns?” to “can Sunglasses run in real workflows without creating trust debt?” The answer is now much stronger.

One scope note: in CI, Sunglasses catches injection that lives in the repository and its agent-facing metadata — poisoned READMEs, tool descriptions, and discovery files. Runtime content an agent fetches live (web pages, retrieved documents, tool output) is the same filter applied at a different point, not something CI alone covers.

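For CI and GitHub Action adoption, the properties that matter are the ones this work targeted: no noisy failures on clean code, predictable runtime on ordinary files, local and deterministic results, agent-facing metadata that matches the live release, and findings that link to real explanations.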

This does not mean every future false positive is impossible. It means the project now has the gates, tests, and release discipline to treat false positives as release blockers instead of acceptable noise.

What we will not claim

Sunglasses will not claim “zero false positives” as an absolute promise.

That would be the wrong lesson. The credible claim is narrower and stronger: clean-code false positives in the tested corpus went from 86 to 0, and a clean-code false-positive gate now blocks candidate patterns that regress that behavior.

Sunglasses will also not claim it catches novel or adaptively obfuscated injections it has no pattern for. Known-pattern blocking raises an attacker’s cost; it is not a completeness proof — see what Sunglasses catches and does not catch. And the false-positive reduction did not come from weakening detection: no detection tests were removed or relaxed to reach 86 to 0, and the full test suite stayed green.

Sunglasses will also not claim that a large pattern count alone makes a filter trustworthy. Pattern count is useful only when the release process can reject noisy patterns before they reach users.

Current release facts

As of this report, the live package is Sunglasses v0.2.65.

FAQ

What is a CI-safe input filter for AI agents?

A CI-safe input filter for AI agents is a filter that can run in automated developer workflows without creating avoidable trust debt: it must avoid noisy clean-code failures, finish quickly on ordinary files, return deterministic results, and link any blocked input to a usable explanation. Sunglasses is a content-layer input filter that blocks prompt injection, poisoned MCP and tool metadata, and malicious agent instructions before they reach the model.

How does Sunglasses avoid false positives on clean code?

Sunglasses reduced clean-code false positives in its tested corpus from 86 to 0 by fixing a keyword-denylist leak, and it now runs a clean-code false-positive gate at release time. In the v0.2.65 cycle that gate held six candidate patterns because they fired on clean code. Sunglasses does not claim zero false positives as an absolute promise; the claim is that clean-code false positives in the tested corpus went from 86 to 0 and that the gate blocks candidate patterns that regress that behavior.

How do you block prompt injection in CI?

Run a local, deterministic input filter over the repository content, agent-facing metadata, and tool descriptions that an AI agent would read, before that content reaches the model. Sunglasses runs locally instead of sending repository contents to a hosted analyzer, produces deterministic rule-based results, and emits SARIF so findings appear in GitHub’s security workflow with links to real category explanations.

Is Sunglasses a scanner or a filter?

Sunglasses is a content-layer filter. The brand idea is literal: sunglasses filter harmful light; Sunglasses filters harmful text before your AI agent reads it. “Scan” describes a workflow action — for example, scanning a pull request or agent-facing metadata in CI — but the product itself is the filter that blocks the bad input, not an alarm that only reports it. You will still see “scanner” on some older pages and in the CLI verb sunglasses scan; we are migrating the language, but the architecture has always been block-before-read.

How should agent-readable project metadata stay current?

Agent-facing files such as AGENTS.md, mcp.json, .well-known/mcp.json, and stats/current.json should be part of the release contract, auto-synced to the live package facts and blocked by a preflight gate if they drift. Otherwise answer engines and AI agents can learn the wrong version, pattern count, or category count from stale metadata.

Sunglasses is an open-source input filter for AI agent security — it blocks prompt injection before your agent reads it.

github.com/sunglasses-dev/sunglasses · pip install sunglasses