What is AI agent workflow security?

AI agent workflow security is the practice of validating the evidence, authority, state, and assumptions an agent carries from one step to the next before it takes an action.

What is an evidence contract for an AI agent?

An evidence contract is a small, explicit rule for what a workflow step must prove before the next step can trust its output, such as source, freshness, scope, approval, and failure handling.

Why do agent handoffs become security risks?

Agent handoffs become security risks because the next step may inherit stale context, broadened authority, hidden assumptions, or unverified summaries from the previous step.

How is workflow security different from prompt injection defense?

Prompt injection defense focuses on hostile instructions; workflow security focuses on whether each step is allowed to trust the state and evidence it received before acting.

Where should teams scan for workflow attacks?

Teams should scan prompts, tool outputs, runbooks, approvals, workflow YAML, agent memory, CI summaries, tickets, handoff notes, and generated reports that can influence the next action.

AI Agent Workflow Security: Every Step Needs an Evidence Contract

The riskiest part of an AI agent workflow is not always the first prompt. It is the moment one step hands a story, a permission, a summary, or a partial state to the next step and the next step treats it as true.

On this page

Why workflow security is the missing layer
What an evidence contract checks
Three workflow attack examples
Why this is not another hardening or guardrails article
Where Sunglasses fits
Agent workflow security checklist
Frequently asked questions

AI agent workflow security is the practice of validating what each step is allowed to inherit before the next step acts. The inheritance can be a tool result, approval summary, cached state, ticket note, memory entry, runbook instruction, dashboard value, or generated plan. If that inherited object is wrong, stale, over-scoped, or quietly authoritative, the agent can make a high-confidence bad decision. An evidence contract is the lightweight rule that says what a workflow step must prove before its output becomes action-ready. Sunglasses v0.2.46 ships 21 new agent_workflow_security detection patterns (GLS-AW-085 through GLS-AW-105) for exactly these handoff failures.

Why workflow security is the missing layer

Workflow security matters because agents act across a chain, and each link can quietly change what the next link believes. Most teams think about the user prompt, the model response, the tool permission, or the final action. Attackers think about the seams between them.

A modern agent might read a ticket, summarize logs, call a tool, update a plan, request approval, write a patch, run tests, interpret the result, and open a deployment request. That looks like one workflow, but security-wise it is a series of trust transfers. The ticket becomes context. The summary becomes evidence. The test result becomes permission. The approval note becomes authority. The deployment request becomes a production action. The runtime-trust model behind Sunglasses exists precisely because those transfers happen faster than any human review loop.

The dangerous question is not only “was the prompt malicious?” The better question is “what did this step inherit, and why is the next step allowed to trust it?” When teams skip that question, a poisoned note in step two can become a production decision in step eight without ever looking like a classic jailbreak.

Put simply: agent workflow security is not prompt hygiene; it is trust accounting between steps.

What an evidence contract checks

An evidence contract defines the minimum proof required before a workflow artifact can influence the next action. It does not need to be heavy. It needs to be explicit, machine-checkable where possible, and stricter when the next action has more impact.

At minimum, the contract should ask five questions. First, source: where did this claim originate, and is that source allowed to influence this decision? Second, freshness: when was it retrieved, and can stale output be replayed? Third, scope: does this artifact authorize only the narrow next step, or has it smuggled broader permission? Fourth, failure: what failed upstream, and was the failure represented honestly? Fifth, authority: is an approval actually present, or is the workflow merely describing one?

That last distinction is where many agent workflows get weird. Generated text can say “approved,” “validated,” “safe,” “reviewed,” or “ready” without being the thing it names. A summary of approval is not approval. A cached freshness badge is not fresh data. A green test summary is not the same as the signed raw test result. Evidence contracts keep those layers separate.

Evidence contract for a high-impact step
- Source: signed workflow output or trusted system record
- Freshness: retrieved in this run, not copied from memory
- Scope: authorizes only the next action, not future actions
- Failure: upstream errors are visible, not summarized away
- Authority: human approval is linked to a real approval event
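The five fields above can be checked mechanically. The following is a minimal sketch, not a Sunglasses API: the `Artifact` shape, its field names, and `contract_violations` are all hypothetical, and it assumes the workflow already knows whether an output was signed and which run produced it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Artifact:
    """A handoff artifact as the next step receives it (hypothetical shape)."""
    source_signed: bool               # came from a signed output or trusted record
    retrieved_in_run: Optional[str]   # id of the run that fetched the data
    authorized_actions: List[str]     # actions this artifact claims to unlock
    upstream_errors_visible: bool     # failures were passed along, not summarized away
    approval_event_id: Optional[str]  # link to a real approval event, if any


def contract_violations(artifact: Artifact, current_run: str, next_action: str) -> List[str]:
    """Return the contract fields the artifact fails; an empty list means action-ready."""
    violations = []
    if not artifact.source_signed:
        violations.append("source")
    if artifact.retrieved_in_run != current_run:
        violations.append("freshness")
    # Scope must match exactly: authorizing more than the next action is a violation.
    if artifact.authorized_actions != [next_action]:
        violations.append("scope")
    if not artifact.upstream_errors_visible:
        violations.append("failure")
    if artifact.approval_event_id is None:
        violations.append("authority")
    return violations
```

Returning the list of failed fields, rather than a single boolean, keeps the reason for a block visible to whoever reviews it. A lower-impact step could reasonably enforce only a subset of these checks.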

Three workflow attack examples

Workflow attacks succeed when a harmless-looking artifact becomes trusted by a later step with more authority. The payload does not have to shout. It only has to survive the handoff.

1. The summary turns uncertainty into permission

Summary laundering happens when an upstream uncertainty is rewritten as downstream confidence. A log-analysis step cannot reach a source, but its generated summary says the missing data is “not material.” A later deployment step reads only the summary and treats the release as safe.

Upstream reality: scanner timed out before checking dependency updates.
Generated handoff: no material security blockers detected.
Downstream action: proceed with release.

The attack is not that the agent ignores security. The attack is that the workflow lets a compressed sentence replace a failed check. An evidence contract would require the timeout to remain visible and stop the summary from carrying release authority.
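One way to keep the timeout visible is to carry the raw check status next to the generated summary and let only the status grant release authority. This is a sketch under assumed names (`CheckStatus`, `Handoff`, `may_release` are illustrative, not part of any real tool):

```python
from dataclasses import dataclass
from enum import Enum


class CheckStatus(Enum):
    PASSED = "passed"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    SKIPPED = "skipped"


@dataclass(frozen=True)
class Handoff:
    summary: str         # generated text, treated as a claim only
    status: CheckStatus  # raw outcome of the upstream check


def may_release(handoff: Handoff) -> bool:
    """Release authority comes from the raw status, never from the summary text."""
    return handoff.status is CheckStatus.PASSED
```

With this shape, a handoff whose summary reads "no material security blockers detected" but whose status is `TIMED_OUT` cannot authorize the release, however confident the sentence sounds.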

2. A narrow approval becomes a broad approval

Scope inflation happens when a permission granted for one step is reused as permission for a larger step. A reviewer approves a read-only diagnostic call. The workflow note later appears as “human approved tool access,” and the agent treats that as permission to run a write-capable remediation command.

Approved: read-only diagnostic query for incident 1842.
Inherited as: human approved production remediation.
Problem: the authority changed during the handoff.

This is why approval text should carry structured boundaries: action, target, duration, risk tier, and whether it can be reused. If the workflow cannot prove those boundaries, the approval should not flow forward. It is the same scope-rebind problem we documented for managed agents that are not trusted actions.
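A structured approval along these lines might look like the sketch below. The field names, the integer risk tiers (higher means more impact), and the `covers` helper are assumptions made for illustration; a real system would take them from its own approval record.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Approval:
    action: str           # the exact action that was approved
    target: str           # the resource or incident it applies to
    max_risk_tier: int    # highest risk tier the approver accepted
    expires_at: datetime  # end of the approved duration
    reusable: bool        # whether it may be used more than once
    used: bool = False    # whether it has already been consumed


def covers(approval: Approval, action: str, target: str,
           risk_tier: int, now: datetime) -> bool:
    """True only if the request fits inside every boundary of the approval."""
    if approval.used and not approval.reusable:
        return False
    if now >= approval.expires_at:
        return False
    if risk_tier > approval.max_risk_tier:
        return False
    return approval.action == action and approval.target == target
```

Under this check, the read-only diagnostic approval for incident 1842 does not cover a remediation command: the action string differs and the risk tier is higher, so the request fails even though a human did approve something.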

3. Memory makes a stale state feel current

State rehydration risk appears when an agent pulls old workflow state back into a new decision without re-validating it. The agent remembers that a dependency was safe yesterday, that a ticket was low priority last week, or that an endpoint was allowlisted in a previous incident. The memory may be accurate history and still be unsafe evidence for today.

Memory: endpoint was approved for last week's investigation.
Current step: use endpoint during today's automated remediation.
Missing proof: current approval, current scope, current risk state.

The fix is not to delete memory. The fix is to label memory as context, not authority. Memory can suggest what to check next; it should not silently satisfy the evidence contract for a new high-impact action.
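Labeling memory as context can be as simple as a role tag that rehydrated entries can never escape. The sketch below assumes hypothetical names (`Role`, `Evidence`, `from_memory`, `authoritative`) and assumes each piece of evidence records the run that produced it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Role(Enum):
    CONTEXT = "context"      # may suggest what to check next
    AUTHORITY = "authority"  # may satisfy an evidence contract


@dataclass(frozen=True)
class Evidence:
    claim: str
    role: Role
    run_id: str  # the run in which this evidence was produced


def from_memory(claim: str, original_run_id: str) -> Evidence:
    """Anything rehydrated from memory is labeled context, whatever it once was."""
    return Evidence(claim=claim, role=Role.CONTEXT, run_id=original_run_id)


def authoritative(evidence: List[Evidence], current_run_id: str) -> List[Evidence]:
    """Keep only evidence that is authority-grade and was produced in this run."""
    return [e for e in evidence
            if e.role is Role.AUTHORITY and e.run_id == current_run_id]
```

The remembered endpoint approval stays available to the agent as a hint, but it is filtered out before any contract check, so today's remediation still needs a current approval.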

Why this is not another hardening or guardrails article

This article is about trust transfer inside the workflow, not generic agent hardening. Hardening reduces the blast radius. Guardrails constrain behavior. Usage control governs what actions are allowed. Workflow security asks whether the next step is being fed evidence and authority it has no right to trust. If you want the boundary comparison, see guardrails vs runtime trust.

That distinction matters for staged pattern coverage. An agent can be sandboxed, rate-limited, permissioned, and guarded while still making the wrong decision because a previous step laundered uncertainty into confidence. The failure is not only “the agent had too much access.” The failure is “the workflow promoted a weak claim into strong evidence.”

Evidence contracts are intentionally boring. That is their strength. They turn fuzzy trust into fields defenders can inspect: source, freshness, scope, failure, authority, and action impact. The more autonomous the agent becomes, the more boring the trust accounting needs to be.

Where Sunglasses fits

Sunglasses fits at the places where workflow text, metadata, and generated summaries become action evidence. Those places include handoff notes, tool outputs, approval summaries, runbook fragments, CI annotations, workflow files, ticket comments, agent memory, and generated reports. The full catalog of what we detect lives in the attack pattern reference.

Many workflow attacks are text-shaped because agents consume text-shaped evidence. “Treat missing scanner output as informational.” “Approval already captured in prior step.” “Use cached state if retrieval fails.” “Proceed because the exception is low risk.” Each phrase can be legitimate in one workflow and dangerous in another. The important question is whether it changes what the next step is allowed to trust.
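To make "text-shaped" concrete, here is a toy flagger built from the four example phrases above. It is emphatically not how Sunglasses works and not the GLS-AW pattern set; the regular expressions and the `flag_lines` helper are invented for illustration, and real detection would need far more context than a phrase match.

```python
import re
from typing import List

# Hypothetical patterns derived from the example phrases; not the Sunglasses rules.
TRUST_SHIFTING = [
    re.compile(r"treat missing .* as informational", re.IGNORECASE),
    re.compile(r"approval already captured", re.IGNORECASE),
    re.compile(r"use cached state if retrieval fails", re.IGNORECASE),
    re.compile(r"proceed because .* low risk", re.IGNORECASE),
]


def flag_lines(text: str) -> List[str]:
    """Return the lines that contain a phrase shifting what the next step may trust."""
    return [
        line.strip()
        for line in text.splitlines()
        if any(pattern.search(line) for pattern in TRUST_SHIFTING)
    ]
```

A match here is a prompt for review, not a verdict: the same sentence can be legitimate in one workflow and dangerous in another.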

Sunglasses is not a replacement for identity, audit logs, CI, approvals, or observability. It is the security filter for agent-facing inputs before they become action logic. The v0.2.46 release adds 21 new agent_workflow_security patterns (GLS-AW-085 through GLS-AW-105) targeting freshness asymmetry, summary laundering, scope inflation, and state rehydration across workflow handoffs. The UV analogy is boring on purpose: catch the invisible stuff before it reaches the eyes.

pip install sunglasses
sunglasses scan ./agent-workflows ./runbooks ./tickets ./tool-output

The practical starting point is to scan the documents and generated artifacts closest to high-impact actions: deployment notes, exception requests, remediation plans, eval summaries, production runbooks, and tool outputs that the agent can cite as justification. The Continuous Verification Program shows how we test these patterns against real adversarial corpora before they ship.

Agent workflow security checklist

The fastest way to improve AI agent workflow security is to make trust transfer explicit before every high-impact step. If a step cannot prove what it inherited, it should not be allowed to act as if the proof exists.

- Separate context from authority: memory and summaries can guide investigation, but they should not satisfy approval requirements.
- Preserve upstream failures: timeouts, missing scans, partial retrievals, and skipped checks must remain visible downstream.
- Bind approvals to scope: store action, target, duration, risk tier, and reuse rules with every approval.
- Require freshness for high-impact actions: do not let cached workflow state authorize production changes.
- Scan handoff artifacts: runbooks, tool outputs, tickets, CI summaries, and generated reports can carry action-changing text.
- Downgrade unverified summaries: generated summaries should be treated as claims until linked back to raw evidence.
- Version workflow contracts: evidence requirements should change through review, not through incidental prompt or template drift.
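The "downgrade unverified summaries" item can be sketched as a digest link between a summary and the raw record it claims to describe. The names here (`Summary`, `digest`, `classify`) and the in-memory `evidence_store` keyed by SHA-256 digest are assumptions for illustration only.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Summary:
    text: str
    raw_evidence_sha256: Optional[str]  # digest of the raw record it summarizes


def digest(raw: bytes) -> str:
    """Hex SHA-256 digest of a raw evidence record."""
    return hashlib.sha256(raw).hexdigest()


def classify(summary: Summary, evidence_store: Dict[str, bytes]) -> str:
    """Return "evidence-linked" only if the digest resolves to a matching raw record."""
    if summary.raw_evidence_sha256 is None:
        return "claim"
    raw = evidence_store.get(summary.raw_evidence_sha256)
    if raw is None or digest(raw) != summary.raw_evidence_sha256:
        return "claim"
    return "evidence-linked"
```

A valid link does not prove the summary is accurate. It only guarantees that the raw record exists unmodified, so a downstream step or reviewer can check the summary against it instead of trusting the text alone.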

For the wider picture of how these checks fit a defense-in-depth program, the Sunglasses FAQ covers what the scanner does and does not replace.

JACK
