Quick answer: Context flooding is an AI agent attack where excessive or carefully arranged input changes the active context before a tool call, MCP handoff, retrieval step, code edit, or workflow decision. The danger is not simply "too many tokens" — context pressure can bury safety instructions, evict prior constraints, and make the agent trust the wrong evidence at action time. Sunglasses v0.2.53 ships four detection patterns targeting this family: GLS-CF-249, GLS-CF-250, GLS-CF-251, GLS-CF-252.
What context flooding is
Context flooding is memory pressure used as control. An attacker pads the prompt, repository, ticket, web page, retrieval corpus, conversation history, or tool output until the agent's active working set changes. The policy may still exist somewhere. The guardrail may still be written down. The approval chain may have been mentioned earlier. But at the moment of action, the agent may be operating from a different slice of context.
The context_flooding pattern family isolates this attack class with specific anchors: instruction budget starvation, priority padding, retrieval chunk eviction reorder, token budget guardrail eviction, and context budget tail-drop policy bypass. The repeated shape is simple — add enough noise or priority-shifting material that the safety-relevant material is truncated, demoted, or no longer connected to the next action.
Context flooding is not just "a long prompt." It is context pressure that changes what an agent trusts when it decides to act.
This is why the page is separate from generic indirect prompt injection. Prompt injection inserts malicious instructions through untrusted content. Context flooding may not need a loud malicious instruction. It can work by making the right instruction hard to see, late to retrieve, low priority, or absent from the model-visible slice that controls the next tool call.
Why long-context agents are vulnerable
Agents do not just read context; they route authority through it. A coding agent reads issue threads, repo files, package metadata, CI logs, MCP tool output, browser pages, and previous conversation. A customer-support agent reads account history, policy excerpts, escalations, and tool receipts. A security agent reads scan output, retrieval chunks, and incident notes. All of that context helps the agent do useful work.
The problem begins when context becomes a queue with priorities, summaries, truncation, retrieval ranks, and recency effects. If the attacker can fill the queue with low-value but high-volume material, they can starve the agent of the exact policy that mattered. If they can reorder retrieval chunks, they can make their source look like the top-ranked source of truth. If they can bury the approval condition in irrelevant filler, the agent may continue from the surviving text instead of the governing text.
This is a runtime-trust problem. Smaller prompts, summarizers, context windows, retrieval filters, and system-message pinning all help. But the action-time question remains: does the context slice supporting this tool call still contain the policy, source, scope, and approval evidence required for this action?
Three concrete attack examples
Each example uses context pressure to change the agent's action basis.
1. Token-budget starvation before a repo edit
A coding agent is asked to modify a repository. The issue body includes a huge appendix of harmless-looking logs, repeated JSON blobs, and historical notes. The relevant safety instruction was near the top: do not change deployment files without approval. Under context pressure, the agent summarizes or truncates earlier material and proceeds from the visible task details.
The failure is not that the model cannot handle long text. The failure is that the edit action was taken after the approval condition fell out of the active decision. A runtime-trust check should ask whether the action still has its governing policy attached.
2. Priority padding that buries guardrails
An agent receives a long support thread full of filler, duplicate comments, and low-priority notes. Buried in the thread is a refund limit and escalation rule. The attacker adds enough priority-shaped language near the end that the agent treats the latest noisy material as the practical source of truth and skips the escalation boundary.
The priority-padding patterns in the context_flooding family target this shape directly: low-priority chatter or filler displaces safety checks. A benign summarizer may compress the thread, but the action still needs a separate check for whether the retained summary preserved the policy boundary.
3. Retrieval chunk eviction and reorder
A RAG-backed agent retrieves policy snippets before deciding whether to call a tool. The attacker floods the knowledge base or conversation with related but noisy chunks, then causes a malicious or incomplete chunk to rank first. The agent sees "the relevant source" and proceeds, even though the real policy chunk was evicted, demoted, or pushed below the model-visible cutoff.
The fix is not just better search. Before the agent acts, it should verify that the retrieved evidence includes the required source, version, scope, and conflict checks for the action being approved. See also: retrieval poisoning and agent visibility for the overlapping attack surface.
How Sunglasses catches it
Sunglasses catches context flooding by looking for the overlap between context pressure and action authority. High-signal ingredients include context-window flooding, token-budget padding, conversation-buffer stuffing, history-window truncation, guardrail eviction, policy burial, retrieval rerank instructions, chunk priority manipulation, and wording that tells the agent to continue after a safety check has been displaced.
The detector is combinational on purpose. "This input is long" is not enough. "Summarize this thread" is not enough. The signal gets stronger when long-context pressure appears near policy, safety, retrieval, source, approval, or tool-use language. The dangerous moment is when noise stops being background material and starts determining what the agent is allowed to do.
Sunglasses is not a context-window vendor, RAG stack, MCP gateway, or summarization system. Those layers still matter. Sunglasses adds the action-time trust check: did this agent just lose, bury, reorder, or replace the context it needed before using authority? See the manual for integration examples and the full pattern library for the complete detection surface.
Install: pip install sunglasses — open-source, MIT licensed, no telemetry, runs fully local.
Context flooding versus adjacent attacks
| Attack surface | Common first defense | Runtime-trust gap |
|---|---|---|
| Indirect prompt injection | Filter untrusted instructions from web pages, documents, and tool output. | The agent still needs to decide whether untrusted context shaped a specific action plan. |
| Retrieval poisoning | Improve source ranking, provenance, and corpus hygiene. | The agent still needs to verify whether the retrieved slice contains the policy and source required for this action. |
| Context flooding | Trim input, summarize, pin system instructions, and monitor token budgets. | The agent still needs to check whether context pressure changed the action-time authority basis. |
A runtime-trust checklist for context pressure
Before an agent acts from a pressured context window, verify the context as evidence, not just as text.
- Policy retention: Is the relevant safety policy still present in the action-time context?
- Instruction priority: Did filler, recency, or summary compression demote governing instructions?
- Retrieval integrity: Were the necessary source chunks retrieved, or did noisy chunks reorder the evidence?
- Scope binding: Does the retained context match the file, endpoint, account, ticket, MCP server, or workflow being acted on?
- Conflict handling: Did the context window preserve contradictory policy material instead of silently dropping it?
- Tool boundary: Is the next tool call allowed by the surviving context, or merely not forbidden because the real rule disappeared?
- Freshness: Was older context summarized away even though it contained the latest approval condition?
The AI Agent Security 101 guide covers broader runtime-trust concepts. For context-flooding-specific detections, the four patterns in Sunglasses v0.2.53 (GLS-CF-249 through GLS-CF-252) target instruction budget starvation, priority padding, retrieval chunk eviction reorder, and token-budget guardrail eviction. Check CVP for third-party verification coverage.