Why are MCP handoffs risky even when the server is approved?

Because an approved server only tells you the route is allowed. The next tool response, description, callback, or follow-up action can still change what the workflow believes it should do next.

How do package endpoints become an AI-agent security problem?

Dependency actions are normal coding workflow steps. If the workflow trusts the wrong endpoint, the wrong metadata, or the wrong next package instruction, package operations become action-time trust problems rather than just software-supply-chain hygiene.

Is package scanning or policy alone enough?

No. Package scanning and policy reduce exposure, but they do not fully answer whether the already-allowed coding workflow should still trust the next dependency action after new runtime context, tool output, or callback information arrives.

Where does Sunglasses fit?

Sunglasses helps inspect trust-bearing text and metadata around handoffs, callbacks, package pulls, and tool outputs so teams can catch hidden authority shifts before the next action becomes a live decision.

How to Stop AI Coding Agents From Following Untrusted MCP Handoffs, Callbacks, or Package Endpoints

Q: How do I stop AI coding agents from following untrusted MCP handoffs, callbacks, or package endpoints?

Start with a centralized MCP gateway, strict allowlists, discovery off by default, callback verification, narrow package sources, checksum pinning, and approval for risky writes. Then add runtime trust so the system can decide whether the already-allowed workflow should still execute after new context appears.

sunglasses://blog/stop-coding-agents-untrusted-mcp-handoffs-callbacks-package-endpoints

Sandboxing, approvals, and package policy reduce exposure. They still do not settle the runtime decision: whether an already-contained coding workflow should trust the next handoff, callback, dependency action, or outbound path right now.

FIG.01 · Analysis

Quick answer

Context

To stop AI coding agents from following untrusted MCP handoffs, callbacks, or package endpoints after sandboxing, start with a layered checklist: use a centralized MCP gateway, keep strict allowlists, turn dynamic discovery off by default, validate tool metadata, verify callbacks, default-deny outbound egress, constrain package sources to trusted mirrors, pin what gets installed, and require human approval for risky writes and first-time actions.

The point

Then add one more layer many hardening answers still skip: runtime trust. Even after the route is allowed, the sandbox is in place, and the package policy is compliant, the next tool response, callback, registry hint, or handoff note can still change what the workflow believes it should do next.

Sandboxing contains execution. Runtime trust decides whether the already-allowed coding workflow should still act now.

Detail

This topic sits next to AI agent security fundamentals, the practical operator manual, and the full Sunglasses pattern catalog.

FIG.02 · First controls

The checklist that should come first

Before talking about products, start with the control stack operators actually need:

Checklist

Centralized MCP gateway: broker tool access through one reviewable control point instead of letting agents discover and trust arbitrary servers on the fly.
Strict server allowlisting: keep the approved MCP surface narrow and explicit.
Dynamic discovery off by default: do not let new tools or servers quietly join a workflow during runtime unless they pass a separate approval path.
Tool metadata validation: treat descriptions, schemas, and capability hints as trust-bearing inputs, not harmless decoration.
Callback verification: require source checks, token scoping, and clear branch ownership for any callback that can influence the next action.
Default-deny egress: keep execution environments narrow so a coding workflow cannot freely reach new domains, mirrors, or callbacks just because it was asked to.
Private package mirrors and registry controls: prefer internal mirrors over direct registry fetches.
Checksum and lockfile pinning: make dependency actions explicit and reviewable instead of ambient.
Immutable or isolated MCP servers where possible: reduce the chances that handoff assumptions drift mid-workflow.
Human approval for risky writes: hold actions like git push, npm publish, destructive database changes, or credential use behind an approval step.
Anomalous-chain monitoring: watch for suspicious sequences across tool calls, callbacks, dependency pulls, and outbound requests.
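
Two of the controls above, default-deny egress and checksum pinning, reduce to small checks that can run before any fetch or install. The following is a minimal sketch in Python and not a Sunglasses feature. The host names, the artifact name, and the pin table are hypothetical placeholders, and a real deployment would also enforce egress at the network layer instead of relying on application code alone.

```python
import hashlib
import hmac
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the coding workflow may reach.
ALLOWED_HOSTS = {"mirror.internal.example", "mcp-gateway.internal.example"}

# Hypothetical pin table: artifact file name -> expected SHA-256 hex digest.
PINNED_SHA256 = {
    "example-lib-1.2.3.tar.gz": hashlib.sha256(b"example artifact bytes").hexdigest(),
}


def egress_allowed(url: str) -> bool:
    """Default-deny: allow only HTTPS requests to an exact allowlisted host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URLs are denied
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


def artifact_matches_pin(name: str, data: bytes) -> bool:
    """Return True only if the artifact is pinned and its digest matches."""
    expected = PINNED_SHA256.get(name)
    if expected is None:
        return False  # unpinned artifacts are denied by default
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected)
```

Matching on the exact host name, not on a substring or suffix, is deliberate: a lookalike such as mirror.internal.example.evil.test should fail the check.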

The controls

That list solves real problems. It reduces blast radius. It makes AI coding workflows less brittle and less easy to steer. But it still does not fully answer the last action-time question. A package path can stay technically approved while the version context shifts. A callback can be signed while pointing the workflow toward a new branch the team never intended to trust automatically. A tool description can stay inside policy while teaching the model the wrong next step.

FIG.03 · First controls

Where hardening ends and trusted action starts

Imagine a coding agent that can read a repo, call an approved MCP tool, fetch package metadata from a trusted mirror, propose a patch, run tests, and open a release workflow. Every piece sounds reasonable. The tools are approved. The registry mirror is approved. The callback path is approved. The write action is behind review. On paper, the workflow is locked down.

The controls

Now the agent receives a tool response that suggests a fallback MCP path, a callback payload that changes which repo step comes next, or dependency metadata that quietly steers it toward a different install decision. The route is still nominally allowed. The server is still on the list. The registry is still technically trusted. But the workflow just inherited a new authority source about what it should do next.

What to do

That is why this security problem is smaller and sharper than generic governance. Governance answers who may use which classes of tools. Package policy answers where dependencies may come from. MCP hardening answers which servers are allowed to participate. Runtime trust answers whether the already-allowed workflow should still trust this specific next action after new context arrives.

Control layer	What it helps with	What still remains open
MCP allowlists and gateways	Stops uncontrolled tool sprawl and narrows the approved route.	The next handoff can still reshape action intent inside the approved route.
Callback verification	Confirms the callback came from the expected source.	A valid callback can still point the workflow toward an unsafe or unintended next step.
Package mirrors and pinning	Reduces random dependency drift and narrows package sources.	The workflow can still mishandle package context or trust the wrong next dependency action.
Approval gates for writes	Slows high-impact actions and creates review points.	The operator still needs to understand whether the action inherited unsafe authority from prior steps.

Bottom line

The cleanest way to say it is this: hardening lowers exposure; runtime trust decides whether the already-allowed workflow should still execute the next action now. The same trust-break shows up in secure MCP server hardening and AI IDE security — different surfaces, one missing decision.
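
One way to picture that missing decision is a small gate that runs after the allowlist check and asks where the proposed action came from. This is an illustrative sketch only, not how Sunglasses is implemented. The `Action` and `Decision` types, the `origin` labels, and the rule that only an operator may extend the plan are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Action:
    tool: str    # hypothetical tool name, e.g. "run_tests"
    target: str  # hypothetical target, e.g. "repo:main"


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str


def decide_next_action(
    proposed: Action,
    planned: FrozenSet[Action],
    allowed_tools: FrozenSet[str],
    origin: str,
) -> Decision:
    """Decide whether an already-allowed workflow should take this next step."""
    # Layer 1: hardening. The tool must be on the allowlist at all.
    if proposed.tool not in allowed_tools:
        return Decision(False, "tool is not on the allowlist")
    # Layer 2: runtime trust. Planned actions proceed.
    if proposed in planned:
        return Decision(True, "action matches the reviewed plan")
    # New actions are accepted only when the operator introduced them.
    if origin == "operator":
        return Decision(True, "operator extended the plan")
    return Decision(False, f"action introduced by {origin}; hold for review")
```

In this sketch, an action on an allowlisted tool that was suggested by a tool response or callback, and that is not in the reviewed plan, is held instead of executed.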

FIG.04 · Field evidence

Three concrete attack examples

Case 01

1) Approved MCP server, unsafe handoff meaning

Field evidence

A coding agent calls an approved MCP server to inspect a repo. The server returns a follow-up instruction suggesting the workflow use a secondary tool path for the next step. The server is still on the allowlist. The connection is still expected. But the handoff changes which capability the workflow trusts next. The route stayed compliant. The meaning of the next action changed underneath it. This is the shape behind GLS-MCP-002 (MCP capability drift), which detects dynamic tool-list changes that can indicate capability drift or rug-pull behavior, and GLS-MCP-006 (tool-metadata prompt injection), which flags malicious tool metadata trying to become higher-priority control text for the agent.
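
A simple way to notice tool-list drift is to fingerprint the tool definitions that were reviewed and compare later listings against that fingerprint. The sketch below illustrates the general idea and is not a description of how GLS-MCP-002 works internally. It assumes each tool is a JSON-serializable dict with a unique `name` key, and the sample tool data used with it is hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint_tools(tools: List[Dict]) -> str:
    """Hash a tool list in a canonical form so ordering does not matter."""
    canonical = json.dumps(
        sorted(tools, key=lambda tool: tool["name"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def tools_drifted(approved_fingerprint: str, current_tools: List[Dict]) -> bool:
    """Return True if the current tool list differs from the approved one."""
    return fingerprint_tools(current_tools) != approved_fingerprint
```

Because descriptions are part of the fingerprint, a reworded description counts as drift just as an added tool does, which matches the point that metadata is a trust-bearing input.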

Case 02

2) Signed callback, wrong workflow branch

The pattern

An agent opens a pull-request workflow, receives a valid callback, and continues automatically. The signature checks out and the callback came from the expected service. But the payload now points the workflow toward a different repo branch, release path, or follow-up tool action than the team intended to trust without review. Verification passed. Trust inheritance still shifted. A returned payload that overrides what the workflow already believed is the pattern behind GLS-TOP-237 (tool-output trusted-output override) — the tool response is treated as more authoritative than the workflow's own plan.
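
The gap between "verified" and "safe to act on" can be shown in a few lines. The sketch below assumes an HMAC-SHA256 signature over the raw callback body and a JSON payload with a `branch` field. Both are assumptions for illustration, since real callback formats vary by service, and the function names are hypothetical.

```python
import hashlib
import hmac
import json


def signature_valid(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check an HMAC-SHA256 signature using a constant-time comparison."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("ascii"), signature_hex.encode("utf-8"))


def callback_is_actionable(
    secret: bytes, body: bytes, signature_hex: str, expected_branch: str
) -> bool:
    """Act automatically only if the callback is authentic AND stays on plan."""
    if not signature_valid(secret, body, signature_hex):
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # undecodable or non-JSON bodies are never actionable
    if not isinstance(payload, dict):
        return False
    # A valid signature proves the source, not that the branch is the intended one.
    return payload.get("branch") == expected_branch
```

A correctly signed callback that names a different branch fails the second check, so it would be routed to review and not followed automatically.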

Case 03

3) Trusted mirror, unsafe dependency decision

What happens

An agent uses an internal package mirror and stays inside policy. It still sees metadata, dependency hints, or version-selection context that steer it toward an unsafe install or an unintended package action. The mirror is legitimate. The route is allowed. The remaining problem is whether the workflow should still treat the next package decision as trusted after absorbing new context. Agent-facing instructions hidden in a package manifest are exactly what GLS-BMP-001 (npm package.json manifest agent-policy poisoning) targets — a valid manifest whose text tries to become the workflow's policy.
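
To make the manifest case concrete, here is a rough heuristic that walks every string in a package.json and flags text that addresses an agent directly. It is a sketch under stated assumptions, not the GLS-BMP-001 detection logic: the phrase list is hypothetical and far from complete, and a real scanner would need broader patterns and handling for false positives.

```python
import json
import re
from typing import Iterator, List, Tuple

# Hypothetical phrases that read as instructions to an agent, not package docs.
AGENT_DIRECTIVE = re.compile(
    r"\b(ignore (all |any )?(previous|prior) instructions"
    r"|ai (agent|assistant)s? (must|should)"
    r"|do not ask (the )?user)\b",
    re.IGNORECASE,
)


def iter_strings(node, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, value) for every string value nested in a JSON document."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")


def suspicious_manifest_fields(manifest_text: str) -> List[str]:
    """Return the paths of manifest fields whose text looks agent-directed."""
    manifest = json.loads(manifest_text)
    return [path for path, text in iter_strings(manifest) if AGENT_DIRECTIVE.search(text)]
```

The manifest can be perfectly valid JSON and come from a trusted mirror, which is why the check looks at what the text says and not at where it came from.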

FIG.05 · Coverage

How Sunglasses catches it

The wedge

Sunglasses fits after those first-layer controls are already in place. It treats trust-bearing text and metadata around MCP handoffs, tool descriptions, callback payloads, dependency notes, package instructions, and workflow messages as part of the live authority model. That matters because many coding-agent failures do not look like loud exploits. They look operational. A tool recommendation sounds helpful. A callback seems routine. A package note looks normal. A handoff reads like a benign optimization.

What we look for

If those surfaces are never treated as trust-bearing, teams can have strong MCP policy and still allow the wrong action because the workflow quietly changed what it was trusting. One related check, GLS-MCP-004 (tool trust mismatch), looks for a gap between a tool's claimed safety and the action verbs in its description: a tool that says it only reads but actually asks to write, push, or send. Sunglasses helps review those trust-bearing surfaces before they become production decisions. The practical question is not only "is this route allowed?" It is "should this already-allowed coding workflow still trust this exact next handoff, callback, or dependency action now?"
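
The trust-mismatch idea can be approximated with a very small heuristic: if a tool claims to be read-only, look for write-style verbs in its description. The sketch below is only an illustration of that idea and does not reproduce the GLS-MCP-004 check. The verb list and the `claims_read_only` flag are assumptions, and a naive verb match will also flag harmless phrasing such as "never writes".

```python
import re
from typing import List

# Hypothetical set of verbs that suggest a side effect beyond reading.
WRITE_VERBS = re.compile(
    r"\b(write|push|send|delete|publish|upload|overwrite)\w*\b",
    re.IGNORECASE,
)


def trust_mismatch(claims_read_only: bool, description: str) -> List[str]:
    """Return the write-style verbs found in a tool that claims to be read-only."""
    if not claims_read_only:
        return []  # no read-only claim, so there is nothing to contradict
    found = {match.group(1).lower() for match in WRITE_VERBS.finditer(description)}
    return sorted(found)
```

A non-empty result is a prompt for review and not proof of malice; the value lies in surfacing the contradiction before the agent acts on the description.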

The question

That is also why this topic belongs naturally next to AI agent security fundamentals, the MCP attack atlas, secure MCP server hardening, and generated MCP server security. The setup and access layers matter. Sunglasses names the smaller runtime decision that remains after those layers have already passed, the same model the CVP trust framework evaluates.

Specimen

pip install sunglasses
sunglasses scan <path>

Then look closely at the places where coding workflows inherit authority: tool metadata, callback payloads, repo instructions, package notes, dependency selection context, and any text that changes what the agent believes it should do next.

FIG.06 · First controls

Operator checklist: safer coding-agent workflows

Checklist

Route MCP through one gateway: keep tool access reviewable.
Default to narrow allowlists: approved servers should be explicit, not ambient.
Turn discovery off by default: do not let new servers quietly appear at runtime.
Validate tool metadata: descriptions and schemas can shape action authority.
Verify callbacks end to end: signed is necessary, not sufficient.
Keep execution narrow: no-egress or reduced-egress environments limit surprise action paths.
Use trusted package mirrors: avoid casual direct registry pulls from production workflows.
Pin dependency intent: checksums, lockfiles, and explicit versions make package actions reviewable.
Require review for risky writes: especially publish, push, deploy, delete, and credential-bearing actions.
Watch for anomalous chains: suspicious sequences across tools, callbacks, and package pulls deserve investigation.
Add runtime trust: ask whether the already-allowed workflow should still take the next action now.
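
The "require review for risky writes" item can start as a plain prefix check on the command the agent wants to run. This is a minimal sketch with a hypothetical prefix list. It is easy to evade (for example, `git -C repo push` does not match), so it belongs in front of a sandbox and an approval workflow, not in place of them.

```python
import shlex
from typing import List, Tuple

# Hypothetical command prefixes that should pause for human approval.
RISKY_PREFIXES: List[Tuple[str, ...]] = [
    ("git", "push"),
    ("npm", "publish"),
    ("rm", "-rf"),
]


def needs_approval(command: str) -> bool:
    """Return True if the command should be held for human review."""
    try:
        tokens = tuple(shlex.split(command))
    except ValueError:
        return True  # unparseable commands are held, not guessed at
    return any(tokens[: len(prefix)] == prefix for prefix in RISKY_PREFIXES)
```

Holding unparseable input for review keeps the default on the safe side: anything the gate cannot classify goes to a human.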

The compact version: MCP hardening narrows the route; runtime trust decides whether the already-allowed coding workflow should still act. The full operator playbook lives in the Sunglasses manual, and common questions are answered in the FAQ.
