What is context flooding in AI agent security?

Context flooding is an attack where oversized, noisy, or strategically ordered input pressures an AI agent's context window so policies, guardrails, retrieval evidence, or prior instructions are buried, truncated, or demoted before the agent acts.

How is context flooding different from ordinary prompt injection?

Prompt injection often tries to insert a malicious instruction directly. Context flooding changes the agent's working memory: it uses token-budget pressure, filler, retrieval reorder, or long-history drift to change which instructions and evidence are available at the action-time decision.

Why do long-context AI agents need context flooding defenses?

Long-context agents read tickets, repositories, browser pages, MCP tool output, logs, retrieval chunks, and conversation history. More context helps until an attacker can use context pressure to hide the safety material the agent needed before using a tool.

How does Sunglasses catch context flooding attacks?

Sunglasses looks for combinations of context-window pressure, padding, starvation, eviction, retrieval reorder, guardrail burial, policy bypass wording, and action claims that indicate noisy context is being used to change what the agent trusts before a tool call or workflow step. Sunglasses v0.2.53 ships four detection patterns: GLS-CF-249, GLS-CF-250, GLS-CF-251, GLS-CF-252.

Is context flooding only a long-context-model problem?

No. Larger context windows make the attack surface wider, but any agent that trims, summarizes, ranks, retrieves, or prioritizes context can be affected. The question is whether the action was based on the right surviving evidence.

Context flooding attacks: when long context makes AI agents forget safety

FIG.01 · Analysis

What context flooding is

sunglasses://blog/context-flooding-runtime-trust#what-it-is

Context

Context flooding is memory pressure used as control. An attacker pads the prompt, repository, ticket, web page, retrieval corpus, conversation history, or tool output until the agent's active working set changes. The policy may still exist somewhere. The guardrail may still be written down. The approval chain may have been mentioned earlier. But at the moment of action, the agent may be operating from a different slice of context.

The point

The context_flooding pattern family isolates this attack class with specific anchors: instruction budget starvation, priority padding, retrieval chunk eviction reorder, token budget guardrail eviction, and context budget tail-drop policy bypass. The repeated shape is simple — add enough noise or priority-shifting material that the safety-relevant material is truncated, demoted, or no longer connected to the next action.

Context flooding is not just "a long prompt." It is context pressure that changes what an agent trusts when it decides to act.

Detail

This is why the page is separate from generic indirect prompt injection. Prompt injection inserts malicious instructions through untrusted content. Context flooding may not need a loud malicious instruction. It can work by making the right instruction hard to see, late to retrieve, low priority, or absent from the model-visible slice that controls the next tool call.

FIG.02 · Market signal

Why long-context agents are vulnerable

sunglasses://blog/context-flooding-runtime-trust#why-it-works

Market signal

Agents do not just read context; they route authority through it. A coding agent reads issue threads, repo files, package metadata, CI logs, MCP tool output, browser pages, and previous conversation. A customer-support agent reads account history, policy excerpts, escalations, and tool receipts. A security agent reads scan output, retrieval chunks, and incident notes. All of that context helps the agent do useful work.

The shift

The problem begins when context becomes a queue with priorities, summaries, truncation, retrieval ranks, and recency effects. If the attacker can fill the queue with low-value but high-volume material, they can starve the agent of the exact policy that mattered. If they can reorder retrieval chunks, they can make their source look like the top-ranked source of truth. If they can bury the approval condition in irrelevant filler, the agent may continue from the surviving text instead of the governing text.

Evidence

This is a runtime-trust problem. Smaller prompts, summarizers, context windows, retrieval filters, and system-message pinning all help. But the action-time question remains: does the context slice supporting this tool call still contain the policy, source, scope, and approval evidence required for this action?

FIG.03 · Field evidence

Three concrete attack examples

sunglasses://blog/context-flooding-runtime-trust#examples

Field evidence

Each example uses context pressure to change the agent's action basis.

1. Token-budget starvation before a repo edit

A coding agent is asked to modify a repository. The issue body includes a huge appendix of harmless-looking logs, repeated JSON blobs, and historical notes. The relevant safety instruction was near the top: do not change deployment files without approval. Under context pressure, the agent summarizes or truncates earlier material and proceeds from the visible task details.

The failure is not that the model cannot handle long text. The failure is that the edit action was taken after the approval condition fell out of the active decision. A runtime-trust check should ask whether the action still has its governing policy attached.

2. Priority padding that buries guardrails

An agent receives a long support thread full of filler, duplicate comments, and low-priority notes. Buried in the thread is a refund limit and escalation rule. The attacker adds enough priority-shaped language near the end that the agent treats the latest noisy material as the practical source of truth and skips the escalation boundary.

The priority-padding patterns in the context_flooding family target this shape directly: low-priority chatter or filler displaces safety checks. A benign summarizer may compress the thread, but the action still needs a separate check for whether the retained summary preserved the policy boundary.

3. Retrieval chunk eviction and reorder

A RAG-backed agent retrieves policy snippets before deciding whether to call a tool. The attacker floods the knowledge base or conversation with related but noisy chunks, then causes a malicious or incomplete chunk to rank first. The agent sees "the relevant source" and proceeds, even though the real policy chunk was evicted, demoted, or pushed below the model-visible cutoff.

The fix is not just better search. Before the agent acts, it should verify that the retrieved evidence includes the required source, version, scope, and conflict checks for the action being approved. See also: retrieval poisoning and agent visibility for the overlapping attack surface.

FIG.04 · Coverage

How Sunglasses catches it

sunglasses://blog/context-flooding-runtime-trust#how-sunglasses-catches-it

The wedge

Sunglasses catches context flooding by looking for the overlap between context pressure and action authority. High-signal ingredients include context-window flooding, token-budget padding, conversation-buffer stuffing, history-window truncation, guardrail eviction, policy burial, retrieval rerank instructions, chunk priority manipulation, and wording that tells the agent to continue after a safety check has been displaced.

What we look for

The detector is combinational on purpose. "This input is long" is not enough. "Summarize this thread" is not enough. The signal gets stronger when long-context pressure appears near policy, safety, retrieval, source, approval, or tool-use language. The dangerous moment is when noise stops being background material and starts determining what the agent is allowed to do.

The question

Sunglasses is not a context-window vendor, RAG stack, MCP gateway, or summarization system. Those layers still matter. Sunglasses adds the action-time trust check: did this agent just lose, bury, reorder, or replace the context it needed before using authority? See the manual for integration examples and the full pattern library for the complete detection surface.

Install: pip install sunglasses — open-source, MIT licensed, no telemetry, runs fully local.

FIG.05 · Field evidence

Context flooding versus adjacent attacks

sunglasses://blog/context-flooding-runtime-trust#comparison

Attack surface	Common first defense	Runtime-trust gap
Indirect prompt injection	Filter untrusted instructions from web pages, documents, and tool output.	The agent still needs to decide whether untrusted context shaped a specific action plan.
Retrieval poisoning	Improve source ranking, provenance, and corpus hygiene.	The agent still needs to verify whether the retrieved slice contains the policy and source required for this action.
Context flooding	Trim input, summarize, pin system instructions, and monitor token budgets.	The agent still needs to check whether context pressure changed the action-time authority basis.

FIG.06 · First controls

A runtime-trust checklist for context pressure

sunglasses://blog/context-flooding-runtime-trust#checklist

First sentence

Before an agent acts from a pressured context window, verify the context as evidence, not just as text.

Checklist

Policy retention: Is the relevant safety policy still present in the action-time context?
Instruction priority: Did filler, recency, or summary compression demote governing instructions?
Retrieval integrity: Were the necessary source chunks retrieved, or did noisy chunks reorder the evidence?
Scope binding: Does the retained context match the file, endpoint, account, ticket, MCP server, or workflow being acted on?
Conflict handling: Did the context window preserve contradictory policy material instead of silently dropping it?
Tool boundary: Is the next tool call allowed by the surviving context, or merely not forbidden because the real rule disappeared?
Freshness: Was older context summarized away even though it contained the latest approval condition?

The controls

The AI Agent Security 101 guide covers broader runtime-trust concepts. For context-flooding-specific detections, the four patterns in Sunglasses v0.2.53 (GLS-CF-249 through GLS-CF-252) target instruction budget starvation, priority padding, retrieval chunk eviction reorder, and token-budget guardrail eviction. Check CVP for third-party verification coverage.

FIG.07 · Analysis

Context flooding attacks: when long context makes AI agents forget safety

What context flooding is

Why long-context agents are vulnerable

Three concrete attack examples

1. Token-budget starvation before a repo edit

2. Priority padding that buries guardrails

3. Retrieval chunk eviction and reorder

How Sunglasses catches it

Context flooding versus adjacent attacks

A runtime-trust checklist for context pressure

Related reading

Frequently Asked Questions

What is context flooding in AI agent security?

How is context flooding different from ordinary prompt injection?

Why do long-context AI agents need context flooding defenses?

How does Sunglasses catch context flooding attacks?

Is context flooding only a long-context-model problem?

Scan what the agent sees, before it acts