Sandboxing, approvals, and package policy reduce exposure. They do not settle the runtime decision: whether an already-contained coding workflow should trust the next handoff, callback, dependency action, or outbound path right now.
To stop AI coding agents from following untrusted MCP handoffs, callbacks, or package endpoints after sandboxing, start with a layered checklist: a centralized MCP gateway, strict server allowlists, dynamic discovery off by default, tool-metadata validation, callback verification, default-deny egress, trusted package mirrors, checksum or lockfile pinning, and human approval for risky writes. Then add the layer most hardening answers skip — runtime trust — to decide whether the already-sandboxed workflow should still execute the next handoff, callback, or package action now. Sunglasses ships detection patterns for the trust-bearing surfaces these attacks ride, including GLS-MCP-002 (MCP capability drift / rug-pull), GLS-MCP-006 (tool-metadata prompt injection), GLS-MCP-004 (tool trust mismatch), GLS-BMP-001 (npm package.json manifest agent-policy poisoning), and GLS-TOP-237 (tool-output trusted-output override) — part of a library of 931 detection patterns across 60 categories.
Quick answer
To stop AI coding agents from following untrusted MCP handoffs, callbacks, or package endpoints after sandboxing, start with a layered checklist: use a centralized MCP gateway, keep strict allowlists, turn dynamic discovery off by default, validate tool metadata, verify callbacks, default-deny outbound egress, constrain package sources to trusted mirrors, pin what gets installed, and require human approval for risky writes and first-time actions.
Then add one more layer many hardening answers still skip: runtime trust. Even after the route is allowed, the sandbox is in place, and the package policy is compliant, the next tool response, callback, registry hint, or handoff note can still change what the workflow believes it should do next.
Sandboxing contains execution. Runtime trust decides whether the already-allowed coding workflow should still act now.
This topic sits next to AI agent security fundamentals, the practical operator manual, and the full Sunglasses pattern catalog.
The checklist that should come first
Before talking about products, start with the control stack operators actually need:
- Centralized MCP gateway: broker tool access through one reviewable control point instead of letting agents discover and trust arbitrary servers on the fly.
- Strict server allowlisting: keep the approved MCP surface narrow and explicit.
- Dynamic discovery off by default: do not let new tools or servers quietly join a workflow during runtime unless they pass a separate approval path.
- Tool metadata validation: treat descriptions, schemas, and capability hints as trust-bearing inputs, not harmless decoration.
- Callback verification: require source checks, token scoping, and clear branch ownership for any callback that can influence the next action.
- Default-deny egress: keep execution environments narrow so a coding workflow cannot freely reach new domains, mirrors, or callbacks just because it was asked to.
- Private package mirrors and registry controls: prefer internal mirrors over direct registry fetches.
- Checksum and lockfile pinning: make dependency actions explicit and reviewable instead of ambient.
- Immutable or isolated MCP servers where possible: reduce the chances that handoff assumptions drift mid-workflow.
- Human approval for risky writes: hold actions like `git push`, `npm publish`, destructive database changes, or credential use behind an approval step.
- Anomalous-chain monitoring: watch for suspicious sequences across tool calls, callbacks, dependency pulls, and outbound requests.
That list solves real problems. It reduces blast radius and makes AI coding workflows less brittle and harder to steer. But it still does not fully answer the last action-time question. A package path can stay technically approved while the version context shifts. A callback can be signed while pointing the workflow toward a new branch the team never intended to trust automatically. A tool description can stay inside policy while teaching the model the wrong next step.
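The first few controls in that checklist (allowlisting, discovery off by default, default-deny egress) reduce to one rule: anything not explicitly listed is refused. A minimal sketch, assuming hypothetical server names and hostnames that are placeholders rather than anything a real gateway ships with:

```python
from urllib.parse import urlsplit

# Hypothetical policy values, for illustration only.
ALLOWED_MCP_SERVERS = {"repo-inspector", "test-runner"}
ALLOWED_EGRESS_HOSTS = {"mirror.internal.example", "mcp-gateway.internal.example"}
DYNAMIC_DISCOVERY_ENABLED = False


def server_allowed(server_name: str, discovered_at_runtime: bool) -> bool:
    """Default-deny: the server must be on the allowlist, and anything that
    appeared through runtime discovery is refused while discovery is off."""
    if discovered_at_runtime and not DYNAMIC_DISCOVERY_ENABLED:
        return False
    return server_name in ALLOWED_MCP_SERVERS


def egress_allowed(url: str) -> bool:
    """Default-deny egress: only HTTPS to an explicitly listed host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_EGRESS_HOSTS
```

Note what this does not do: it answers "is this route allowed?" and nothing more. A request that passes both checks can still carry content that changes what the agent does next.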
Where hardening ends and trusted action starts
Imagine a coding agent that can read a repo, call an approved MCP tool, fetch package metadata from a trusted mirror, propose a patch, run tests, and open a release workflow. Every piece sounds reasonable. The tools are approved. The registry mirror is approved. The callback path is approved. The write action is behind review. On paper, the workflow is locked down.
Now the agent receives a tool response that suggests a fallback MCP path, a callback payload that changes which repo step comes next, or dependency metadata that quietly steers it toward a different install decision. The route is still nominally allowed. The server is still on the list. The registry is still technically trusted. But the workflow just inherited a new authority source about what it should do next.
That is why this security problem is smaller and sharper than generic governance. Governance answers who may use which classes of tools. Package policy answers where dependencies may come from. MCP hardening answers which servers are allowed to participate. Runtime trust answers whether the already-allowed workflow should still trust this specific next action after new context arrives.
| Control layer | What it helps with | What still remains open |
|---|---|---|
| MCP allowlists and gateways | Stops uncontrolled tool sprawl and narrows the approved route. | The next handoff can still reshape action intent inside the approved route. |
| Callback verification | Confirms the callback came from the expected source. | A valid callback can still point the workflow toward an unsafe or unintended next step. |
| Package mirrors and pinning | Reduces random dependency drift and narrows package sources. | The workflow can still mis-handle package context or trust the wrong next dependency action. |
| Approval gates for writes | Slows high-impact actions and creates review points. | The operator still needs to understand whether the action inherited unsafe authority from prior steps. |
The cleanest way to say it is this: hardening lowers exposure; runtime trust decides whether the already-allowed workflow should still execute the next action now. The same trust-break shows up in secure MCP server hardening and AI IDE security — different surfaces, one missing decision.
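One way to picture the runtime trust decision is as a comparison between the action the workflow had planned and the action it is now being asked to take, together with where that request came from. The sketch below is an illustration under stated assumptions, not a Sunglasses API: the `Action` fields, the source labels, and the three-way outcome are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical labels for content that arrives mid-workflow.
CONTENT_SOURCES = {"tool_output", "callback", "package_metadata", "handoff_note"}


class Decision(Enum):
    ALLOW = "allow"
    HOLD_FOR_REVIEW = "hold_for_review"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    tool: str    # e.g. "git"
    verb: str    # e.g. "push"
    target: str  # e.g. "origin/feature-x"


def runtime_trust(planned: Action, proposed: Action,
                  route_allowed: bool, proposed_by: str) -> Decision:
    """Decide whether an already-allowed workflow should take the next action."""
    if not route_allowed:
        return Decision.DENY             # hardening layer already said no
    if proposed == planned:
        return Decision.ALLOW            # nothing changed the plan
    if proposed_by in CONTENT_SOURCES:
        return Decision.HOLD_FOR_REVIEW  # content tried to rewrite the plan
    return Decision.ALLOW                # an operator changed the plan on purpose
```

The design choice worth noticing is the middle branch: the route is allowed and the source may even be verified, yet the action is held because the plan changed as a result of content rather than an operator decision.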
Three concrete attack examples
1) Approved MCP server, unsafe handoff meaning
A coding agent calls an approved MCP server to inspect a repo. The server returns a follow-up instruction suggesting the workflow use a secondary tool path for the next step. The server is still on the allowlist. The connection is still expected. But the handoff changes which capability the workflow trusts next. The route stayed compliant. The meaning of the next action changed underneath it. This is the shape behind GLS-MCP-002 (MCP capability drift), which detects dynamic tool-list changes that can indicate capability drift or rug-pull behavior, and GLS-MCP-006 (tool-metadata prompt injection), which flags malicious tool metadata trying to become higher-priority control text for the agent.
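Capability drift of this kind can be made visible by fingerprinting the tool list an approved server advertised at review time and comparing it with what the server advertises now. This is a generic sketch of that idea, not a description of how GLS-MCP-002 is implemented; the tool dictionaries and their `name` and `description` keys are assumptions about what a gateway might record.

```python
import hashlib
import json


def tool_fingerprint(tools: list[dict]) -> str:
    """Stable SHA-256 over a canonical JSON rendering of the tool entries."""
    canonical = json.dumps(
        sorted(tools, key=lambda t: t["name"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted(baseline: list[dict], current: list[dict]) -> list[str]:
    """Names of tools that were added, removed, or changed since the baseline."""
    old = {t["name"]: tool_fingerprint([t]) for t in baseline}
    new = {t["name"]: tool_fingerprint([t]) for t in current}
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))
```

For example, if the reviewed baseline held one tool and the server later edits its description and adds a second tool, both names come back and the workflow can pause before trusting either.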
2) Signed callback, wrong workflow branch
An agent opens a pull-request workflow, receives a valid callback, and continues automatically. The signature checks out and the callback came from the expected service. But the payload now points the workflow toward a different repo branch, release path, or follow-up tool action than the team intended to trust without review. Verification passed. Trust inheritance still shifted. A returned payload that overrides what the workflow already believed is the pattern behind GLS-TOP-237 (tool-output trusted-output override) — the tool response is treated as more authoritative than the workflow's own plan.
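The gap between "verification passed" and "the workflow should continue" can be shown in a few lines. The sketch below assumes an HMAC-SHA256 signed JSON callback with hypothetical `repo` and `branch` fields; real services differ in header names and payload shape.

```python
import hashlib
import hmac
import json


def signature_valid(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of an HMAC-SHA256 hex signature over the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def callback_decision(secret: bytes, body: bytes, signature_hex: str,
                      expected_repo: str, expected_branch: str) -> str:
    """Signed is necessary, not sufficient: also compare the payload's
    target with what the workflow intended to act on."""
    if not signature_valid(secret, body, signature_hex):
        return "reject"
    payload = json.loads(body)
    if payload.get("repo") != expected_repo or payload.get("branch") != expected_branch:
        return "hold_for_review"  # authentic callback, unintended target
    return "continue"
```

A correctly signed callback that names a different branch lands in `hold_for_review`, which is the case a signature check alone cannot catch.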
3) Trusted mirror, unsafe dependency decision
An agent uses an internal package mirror and stays inside policy. It still sees metadata, dependency hints, or version-selection context that steer it toward an unsafe install or an unintended package action. The mirror is legitimate. The route is allowed. The remaining problem is whether the workflow should still treat the next package decision as trusted after absorbing new context. Agent-facing instructions hidden in a package manifest are exactly what GLS-BMP-001 (npm package.json manifest agent-policy poisoning) targets — a valid manifest whose text tries to become the workflow's policy.
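Pinning turns "the mirror suggested this" into "the lockfile already named this exact artifact". A minimal sketch, assuming a hypothetical in-memory pin table keyed by package name and version; a real lockfile format would need its own parser.

```python
import hashlib


def install_allowed(name: str, version: str, artifact: bytes,
                    pins: dict[tuple[str, str], str]) -> bool:
    """Allow an install only if this exact (name, version) is pinned and the
    downloaded bytes hash to the pinned SHA-256 digest."""
    pinned = pins.get((name, version))
    if pinned is None:
        return False  # metadata suggested something the lockfile never named
    return hashlib.sha256(artifact).hexdigest() == pinned
```

With this check in place, a version hint in mirror metadata cannot widen what gets installed: an unpinned version fails regardless of how legitimate the mirror is. It does not inspect instructions embedded in the manifest text itself, which is a separate surface.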
How Sunglasses catches it
Sunglasses fits after those first-layer controls are already in place. It treats trust-bearing text and metadata around MCP handoffs, tool descriptions, callback payloads, dependency notes, package instructions, and workflow messages as part of the live authority model. That matters because many coding-agent failures do not look like loud exploits. They look operational. A tool recommendation sounds helpful. A callback seems routine. A package note looks normal. A handoff reads like a benign optimization.
If those surfaces are never treated as trust-bearing, teams can have strong MCP policy and still allow the wrong action because the workflow quietly changed what it was trusting. One related check, GLS-MCP-004 (tool trust mismatch), looks for a gap between a tool's claimed safety and the action verbs in its description: a tool that says it only reads but actually asks to write, push, or send. Sunglasses helps review those trust-bearing surfaces before they become production decisions. The practical question is not only "is this route allowed?" It is "should this already-allowed coding workflow still trust this exact next handoff, callback, or dependency action now?"
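To make the idea of a trust mismatch concrete, here is a deliberately naive sketch that flags a description claiming to be read-only while also containing write-style verbs. It is not the GLS-MCP-004 rule; the phrase and verb lists are assumptions chosen for illustration, and a plain regex like this would also trip on negations such as "does not write".

```python
import re

# Hypothetical phrase lists; a real detector would need far more care.
READ_ONLY_CLAIM = re.compile(
    r"\b(read[- ]only|only reads?|does not modify|no side effects)\b",
    re.IGNORECASE,
)
WRITE_VERB = re.compile(
    r"\b(write|push|send|delete|publish|upload|deploy)\w*\b",
    re.IGNORECASE,
)


def trust_mismatch(description: str) -> list[str]:
    """Return write-style words found in a description that also claims to be
    read-only; an empty list means no mismatch was detected."""
    if not READ_ONLY_CLAIM.search(description):
        return []
    return sorted({m.group(0).lower() for m in WRITE_VERB.finditer(description)})
```

The point of the sketch is the shape of the check: the description is treated as an input to be inspected, not as documentation to be believed.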
That is also why this topic belongs naturally next to AI agent security fundamentals, the MCP attack atlas, secure MCP server hardening, and generated MCP server security. The setup and access layers matter. Sunglasses keeps naming the smaller runtime decision that remains after those layers have already passed, which is the same model the CVP trust framework evaluates.
```shell
pip install sunglasses
sunglasses scan <path>
```
Then look closely at the places where coding workflows inherit authority: tool metadata, callback payloads, repo instructions, package notes, dependency selection context, and any text that changes what the agent believes it should do next.
Operator checklist: safer coding-agent workflows
- Route MCP through one gateway: keep tool access reviewable.
- Default to narrow allowlists: approved servers should be explicit, not ambient.
- Turn discovery off by default: do not let new servers quietly appear at runtime.
- Validate tool metadata: descriptions and schemas can shape action authority.
- Verify callbacks end to end: signed is necessary, not sufficient.
- Keep execution narrow: no-egress or reduced-egress environments limit surprise action paths.
- Use trusted package mirrors: avoid casual direct registry pulls from production workflows.
- Pin dependency intent: checksums, lockfiles, and explicit versions make package actions reviewable.
- Require review for risky writes: especially publish, push, deploy, delete, and credential-bearing actions.
- Watch for anomalous chains: suspicious sequences across tools, callbacks, and package pulls deserve investigation.
- Add runtime trust: ask whether the already-allowed workflow should still take the next action now.
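The "anomalous chains" item can be approximated by scanning an ordered event log for a risky write that follows untrusted input with no approval in between. The event names and the window size below are hypothetical; this is a sketch of the pattern, not a monitoring product.

```python
# Hypothetical event labels for illustration.
UNTRUSTED_INPUT = {"tool_output", "callback", "package_metadata"}
RISKY_WRITE = {"git_push", "npm_publish", "deploy", "delete", "use_credential"}


def suspicious_chains(events: list[str], window: int = 3) -> list[tuple[int, int]]:
    """Return (input_index, write_index) pairs where a risky write follows an
    untrusted input within `window` steps and no human approval intervenes."""
    findings = []
    for i, event in enumerate(events):
        if event not in UNTRUSTED_INPUT:
            continue
        for j in range(i + 1, min(i + 1 + window, len(events))):
            if events[j] == "human_approval":
                break  # a review point resets the chain
            if events[j] in RISKY_WRITE:
                findings.append((i, j))
                break
    return findings
```

A log such as `["read_repo", "callback", "run_tests", "git_push"]` yields one finding, while inserting `"human_approval"` before the push yields none.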
The compact version: MCP hardening narrows the route; runtime trust decides whether the already-allowed coding workflow should still act. The full operator playbook lives in the Sunglasses manual, and common questions are answered in the FAQ.