What is a managed agent?

A managed agent is an AI workflow packaged with hosted tools, connectors, permissions, review steps, or provider-managed operational controls so teams can deploy the workflow without building every control layer themselves.

Why are managed agents not the same as trusted actions?

Because managed agents can still be fully authenticated, policy-compliant, and audit-logged while following a bad next-step suggestion, callback, MCP handoff, or outbound route. Management reduces exposure; it does not automatically prove the next action is trustworthy in context.

How does this relate to MCP security?

MCP security covers server trust, scoping, protocol hygiene, identity, and mediation. Runtime trust is the next question: should the workflow still trust this tool response, callback chain, or next action after those earlier controls already passed?

Are audit logs enough for regulated agent security?

No. Audit logs are valuable for oversight, proof, and post-incident review, but they usually observe or reconstruct the action after the decision path has already formed. Teams still need action-time trust checks before sensitive steps execute.

Where does Sunglasses fit?

Sunglasses fits as a provider-agnostic runtime-trust layer that reviews prompts, tool outputs, connector notes, callback instructions, MCP metadata, and other trust-bearing text before those inputs shape a live action.

Managed Agents Are Not Trusted Actions: What AI Agent Security Still Needs After Permissions

Table of Contents

Quick answer
What managed agents get right
Plain-language explainer
Why regulated workflows sharpen the problem
Three concrete attack examples
How Sunglasses catches it
Frequently asked questions

Managed agents reduce exposure, but they do not automatically prove the next action is trustworthy. Permissions, connectors, MCP apps, credential vaults, approval paths, and audit logs make AI workflows safer and more legible — they do not fully decide whether the current tool response, callback chain, retry route, or outbound request should still be trusted in context. Sunglasses v0.2.44 ships 21 new agent_workflow_security patterns (GLS-AW-043 through GLS-AW-063) covering gap-fill fabrication, verification gate forgery, plan summary execution drift, and state board status inversion attacks — all of which exploit the gap between governed access and trusted action.

AI agent security is shifting from abstract model-safety language toward something more operational: managed agents, connectors, MCP apps, per-tool permissions, managed credential vaults, audit logs, and humans in the loop. That is good. Buyers need concrete nouns they can budget, assign, and ship.

It is also incomplete. Those controls explain how an agent gets connected, constrained, and reviewed. They still do not fully answer the last operational question: should this already-allowed workflow still be trusted to take the next action right now?

That gap matters even more for regulated and high-consequence workflows. A finance agent, support agent, or internal code agent can remain authenticated, approved, scoped, and fully logged while still taking the wrong next step because a connector note, tool output, callback instruction, or MCP handoff quietly changed what the workflow believed it should do.

That is the difference between managed access and trusted action. If you already use the AI Agent Security 101 guide, work through the hardening manual, or map tool paths in the MCP attack atlas, the practical lesson is simple: governed access narrows reach, but runtime trust still decides whether the next step deserves confidence.

What managed agents get right

It is worth being fair before drawing the contrast. Managed-agent platforms solve real problems. Teams want providers to package safe defaults, route reviews through clean approval steps, narrow per-tool permissions, isolate credentials, and make connector behavior easier to reason about. That work matters.

In enterprise settings, this language also lowers friction. Security, IT, and compliance teams understand access scopes, audit logs, vaults, human review, and provider-managed operational controls. Those are concrete implementation surfaces. They are easier to approve than vague promises about an AI system being "responsible."

Managed-agent language also helps with MCP security. If providers normalize scoped tools, protocol-aware mediation, and narrower app-style handoffs, that is a real improvement over free-form tool chaos. A workflow that has fewer reachable tools, shorter-lived credentials, and cleaner approval paths is simply safer than one with unlimited reach. See our MCP security deep dive for the full attack surface.

The honest limit is narrower: well-managed access still does not fully settle whether the next live action should be trusted. The workflow can stay inside its approved boundary and still make a bad move because new guidance changed the meaning of the next step while the run was already in progress.

Plain-language explainer: where managed access stops

Imagine a finance operations agent running in a careful environment. It has approved connectors, a managed credential vault, review gates for high-risk steps, a narrow set of tools, and a signed-off workflow template. The deployment looks disciplined on paper. That is exactly what buyers want from a managed-agent story.

Now the live run starts absorbing fresh operational guidance. A connector note suggests a fallback process because a primary service is delayed. A tool output says this account should use a different internal queue. An MCP-connected app returns a next-step instruction that stays inside the workflow's broad permissions but changes urgency, routing, or the implied level of confidence. A callback tells the system to retry through a different destination that appears operationally equivalent.

No credential theft is required. No policy has to be obviously violated. The workflow can remain authenticated, within scope, and fully auditable. But the live authority story changed. The agent is no longer just executing the access design the team reviewed ahead of time. It is interpreting new text and metadata that may quietly reshape what "allowed" means in practice.

That is why managed agents are not the same as trusted actions. Managed access governs who may enter the lane. Trusted action is the decision on whether this specific next move inside the lane still deserves confidence. The Sunglasses 3-stage pipeline sits at exactly that decision point.

Why regulated workflows sharpen the problem

Regulated workflows make this gap easier to see because the cost of "technically allowed, contextually wrong" is higher. A workflow can be fully inside policy and still mishandle a refund path, expose sensitive customer context to the wrong downstream destination, or trigger a code or data action that should have paused for a second look.

This is where buyer language like managed agents, per-tool permissions, credential vaults, and audit logs becomes a strong first sentence but not the last one. Regulated teams need reviewability. They also need a clearer answer at the action boundary itself.

Audit logs tell you what happened. Permissions tell you what could broadly happen. Runtime trust asks the expensive question before the step fires: given this new tool output, callback, route change, or connector instruction, should the workflow still do this now?

That framing keeps Sunglasses honest. It does not pretend to replace platform controls, IAM, or provider-managed governance. It names the smaller operational hole that remains after those controls already did useful work. Read the FAQ for a full breakdown of what Sunglasses catches vs what it does not.

Three concrete attack examples

1) A managed connector stays approved, but its note quietly becomes authority

A support or finance agent uses an approved connector to retrieve account context. The connector response includes a helpful-looking operational note: this queue is backed up, use the alternate path, trust this internal override, skip the normal handoff because the ticket is time-sensitive. The workflow stays inside approved systems. The trust boundary moved anyway.

This is not a failure of basic permissions. It is a live authority failure. The workflow treated descriptive connector output as action-shaping guidance. Pattern GLS-AW-046 (Plan Summary Execution Drift) in Sunglasses v0.2.44 catches this class of attack — where plan-phase summaries are weaponized to override execution-phase behavior.

2) An MCP app handoff remains authenticated, but the next step is still wrong

An agent has valid access to an approved MCP-connected tool for retrieval and another approved path for a follow-up action. Authentication is fine. Tool scopes are fine. The protocol layer is clean. But the first tool result nudges the system toward a more sensitive action, a broader escalation, or a chain of steps the operator did not mean to treat as equivalent.

This is where MCP security and runtime trust meet. Server trust, protocol hygiene, audience binding, schema validation, and scoped credentials all matter. They still do not completely answer whether the current next action should be trusted in context. The usage control vs runtime trust post covers this boundary in detail.

3) Auditability exists, but the risky outbound route is chosen before humans can help

A regulated workflow is configured with human review for major actions and complete audit logging. Then a retry path, callback, or fallback instruction steers the agent toward a destination variation that appears policy-compliant enough to pass the first layer. The event will be logged. The review may happen later. But the live trust decision was already shaped by guidance the system accepted too easily.

This is why audit logs are necessary but not sufficient. Reviewability is not identical to trustworthy action selection. Pattern GLS-AW-044 (Verification Gate Forgery) targets exactly this — injected content that pretends a verification step already passed.

How Sunglasses catches it

Sunglasses fits as a provider-agnostic runtime-trust layer. It is not claiming to replace your managed-agent platform, your IAM stack, your connector framework, or your audit tooling. It is useful at the smaller but expensive moment when trust-bearing text and metadata start shaping what an already-allowed workflow thinks it should do next.

Sunglasses v0.2.44 ships 21 new agent_workflow_security patterns (GLS-AW-043 through GLS-AW-063), expanding coverage across multiple attack surfaces:

GLS-AW-043 (Gap-Fill Fabrication Pressure) — payloads that pressure agents to fill in missing information with fabricated plausible values
GLS-AW-044 (Verification Gate Forgery) — injected text claiming a verification or approval step has already passed
GLS-AW-045 (Template Placeholder Imperative Injection) — malicious content hidden inside template placeholders that becomes active instructions
GLS-AW-046 (Plan Summary Execution Drift) — plan-phase summaries weaponized to reshape execution-phase decisions
GLS-AW-047 (State Board Status Inversion) — falsified state signals that invert an agent's understanding of current system status

That includes prompts, tool descriptions, callback instructions, connector notes, MCP metadata, policy fragments, fallback guidance, retry messages, and ordinary-looking operational text that can quietly widen authority or normalize a sensitive action. Those surfaces matter because they often influence the next move before anything looks suspicious in a dashboard.

That is why Sunglasses belongs after permissions, not instead of them. Once managed access, connector governance, and review layers are in place, teams still need a way to inspect the words and metadata that can convert a technically allowed route into an unsafe live action. Get started with Anthropic CVP verification or install directly:

pip install sunglasses
sunglasses scan <path>

Then review anything that widens scope, reframes urgency, changes routing expectations, softens a guardrail, transforms descriptive output into executable trust, or turns a clean MCP handoff into a dubious next action. In plain language: manage access first, then inspect the inputs trying to reshape behavior after access is already granted.

Managed Agents Are Not Trusted Actions

What managed agents get right

Plain-language explainer: where managed access stops

Why regulated workflows sharpen the problem

Three concrete attack examples

1) A managed connector stays approved, but its note quietly becomes authority

2) An MCP app handoff remains authenticated, but the next step is still wrong

3) Auditability exists, but the risky outbound route is chosen before humans can help

How Sunglasses catches it

Frequently Asked Questions

JACK

More from the blog

Managed Agents Are Not Trusted Actions

What managed agents get right

Plain-language explainer: where managed access stops

Why regulated workflows sharpen the problem

Three concrete attack examples

1) A managed connector stays approved, but its note quietly becomes authority

2) An MCP app handoff remains authenticated, but the next step is still wrong

3) Auditability exists, but the risky outbound route is chosen before humans can help

How Sunglasses catches it

Frequently Asked Questions

JACK

More from the blog

Your call.