MCP security has matured past the vague "prompt injection but for tools" phase. The operational question now is concrete: how do you secure MCP servers for AI agents in production — and what does the standard hardening checklist still leave unfinished?

You secure MCP servers for AI agents by hardening six layers in order — identity, transport, exposure, execution isolation, input/output validation, and approval for high-risk actions — and then adding the step most checklists skip: a runtime-trust review of whether the already-authorized workflow should still act now. Use scoped short-lived credentials, prefer mTLS or private placement over public exposure, block SSRF and internal-metadata reachability, sandbox tool execution, keep tool schemas strict, and gate dangerous write/send actions behind human approval. Sunglasses ships detection patterns for the trust-drift surfaces this leaves open — for example GLS-MCP-002 (MCP capability drift), GLS-MTI-001 (MCP database-tool SQL wrapper injection), and GLS-TOP-237 (tool output trusted-output override). The site-wide pattern library covers 919 total patterns across 59 categories. Runtime trust decides whether an authenticated, technically-allowed MCP workflow should still take the next tool call, callback, or outbound destination.

Quick answer: how do you secure MCP servers for AI agents?

Secure MCP servers by hardening six layers in order: identity, transport, exposure, execution, input/output validation, and approval for high-risk actions. Then add one more question most checklists skip: whether the already-authorized workflow should still be trusted to act now.

  • Use scoped, short-lived credentials instead of broad long-lived tokens.
  • Prefer mTLS, private network placement, or secure tunnels over public exposure.
  • Block SSRF and internal metadata reachability.
  • Run tools with least privilege and execution isolation.
  • Keep tool schemas strict so extra authority-bearing fields are rejected.
  • Require human approval for dangerous write, send, or exfiltration-capable actions.
  • Review callbacks, retry logic, discovery flows, and outbound destinations as runtime trust boundaries, not boring plumbing.
If you only do the first six, you harden access. If you also do the seventh, you harden decisions.

MCP server security sits next to AI agent security fundamentals, the practical operator manual, and the workflow-specific review in the MCP Attack Atlas.

Plain-language explainer: what goes wrong in real MCP deployments

Think about an agent that can read a ticket, fetch account context, and open a follow-up task through MCP servers. On paper, the setup can look clean. Every server is authenticated. Every tool has a purpose. The infra team has a diagram. Everyone relaxes.

Then a "helpful" callback suggests a new endpoint for the latest routing guidance. Or a tool description quietly tells the agent to retry through a more privileged path if the first call fails. Or a status response adds fields the original schema never expected, but the workflow still accepts them because they look harmless. None of this needs to look like movie-hacker behavior. It can look like normal operations. That is exactly the trap.

MCP server security fails when teams confuse authenticated with safe. A connection can be real, a tool can be valid, and a token can be scoped, while the workflow is still being nudged into a worse decision. That is why MCP hardening has to include runtime trust. Security is not only who connected. It is also what the system is now being persuaded to do. The FAQ covers how to frame this for a team.

Layer | What it does well | What it does not finish
Identity, transport, sandboxing | Controls who connects, over what channel, and where a tool can run. | Does not decide whether an already-authorized action should still happen in context.
Schema and approval gates | Reject malformed input and slow down dangerous actions. | Can still miss authority that arrives as "helpful" but well-formed metadata.
Runtime trust | Evaluates whether the next tool call, callback, or destination still deserves trust now. | Does not replace identity, tunnels, sandboxing, or schema discipline.

The MCP hardening checklist teams should actually use

1) Identity and access: keep credentials narrow and temporary

Start with the boring, necessary part. Use short-lived tokens, scoped service accounts, and clear audience binding where possible. If an MCP client can reuse a token across unrelated resources, or if a server cannot tell which audience the token was meant for, you are setting up confused-deputy problems before runtime even begins.

Good MCP security assumes that credentials leak, get replayed, or get reused in the wrong context. Your job is to make those failures small and short-lived.
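As a rough illustration of "narrow and temporary", the sketch below checks the claims of a token whose signature has already been verified elsewhere. The function name `check_token_claims`, the 15-minute lifetime cap, and the `mcp://tickets` audience string are assumptions made for this example, not part of any MCP specification or library.

```python
import time

def check_token_claims(claims, expected_audience, required_scope,
                       now=None, max_lifetime_s=900):
    """Return a list of reasons to reject the token; an empty list means accept.

    Assumes `claims` is a dict of already signature-verified claims.
    """
    now = time.time() if now is None else now
    problems = []

    # Audience binding: the token must have been issued for this server.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        problems.append("audience mismatch")

    # Short lifetime: reject expired tokens and tokens issued for too long.
    exp = claims.get("exp")
    iat = claims.get("iat")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("expired or missing exp")
    elif not isinstance(iat, (int, float)) or exp - iat > max_lifetime_s:
        problems.append("lifetime too long")

    # Narrow scope: the specific scope for this call must be present.
    scopes = set(str(claims.get("scope", "")).split())
    if required_scope not in scopes:
        problems.append("missing scope")

    return problems
```

A token minted for a different MCP server fails the audience check even when it is otherwise valid, which is the confused-deputy case described above.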

2) Transport and exposure: keep MCP off the open sidewalk

If a remote MCP server is reachable from everywhere, you are doing attackers a favor. Prefer private VPC placement, secure tunnels, Unix sockets for local components, or tightly controlled ingress. When remote access is necessary, use strong transport protections and make public exposure the exception, not the default.

This is also where SSRF defenses belong. If the workflow can be tricked into fetching metadata services, internal ranges, or control-plane endpoints, you do not merely have a networking problem. You have an authority-routing problem.
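One way to picture the fetch restriction is a destination check that runs before any outbound request. The sketch below is a minimal version using only the Python standard library. The names `is_internal_address` and `fetch_allowed` are hypothetical, and the `resolve` callable is injected by the caller so the example does not assume any particular DNS setup.

```python
import ipaddress
from urllib.parse import urlsplit

def is_internal_address(ip_text):
    """True for addresses a fetch tool should not reach; unparseable input fails closed."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return True
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)

def fetch_allowed(url, resolve):
    """Allow only https URLs whose hostname resolves exclusively to public addresses.

    `resolve` is a caller-supplied function: hostname -> list of IP strings.
    """
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        addresses = resolve(parts.hostname)
    except OSError:
        return False
    # No answer, or any internal answer, blocks the fetch.
    return bool(addresses) and not any(is_internal_address(a) for a in addresses)
```

A check like this is only a first filter. It helps most when the connection is then made to the exact address that was checked, since a second DNS lookup could return something different.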

3) Execution isolation: assume tools deserve containment

Not every MCP tool needs the same trust. Some only read text. Others can launch jobs, open browser sessions, mutate data, or fetch remote content. Treat those differences seriously. Least-privilege service accounts, isolated runtimes, and bounded execution contexts reduce the blast radius when a tool misbehaves or gets steered badly.

This is why sandboxing keeps showing up in serious MCP discussions. It does real work. But sandboxing is still not the whole answer. Containment tells you where the tool runs safely. Runtime trust tells you whether the workflow should still be using that tool, in that sequence, toward that destination, right now.

4) Input and output validation: make schemas strict enough to reject authority drift

MCP systems love structured data, which is good until it becomes fake certainty. Security breaks when extra fields, optional metadata, or sloppy parsing lets a response carry more meaning than the tool contract intended. Strict schema validation matters because it draws a line between descriptive data and action-changing instructions.

If an output can quietly introduce a new endpoint, broader scope hint, fallback instruction, or execution preference, then your schema is not just a formatting layer. It is part of the trust model — exactly the drift the MCP tool poisoning detection work targets.
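To make "strict" concrete, here is a small sketch of an output validator that rejects any field the tool contract did not declare. The schema, the field names, and `validate_tool_output` are illustrative assumptions. A real deployment would more likely rely on a schema library, but the closed-world rule is the same.

```python
# Hypothetical contract for a status-only tool: exactly these fields, nothing else.
ALLOWED_STATUS_FIELDS = {"ticket_id": str, "status": str}

def validate_tool_output(payload, schema=ALLOWED_STATUS_FIELDS):
    """Return the payload unchanged if it matches the schema exactly, else raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("tool output must be an object")

    # Reject undeclared fields instead of silently passing them downstream.
    extra = sorted(set(payload) - set(schema))
    if extra:
        raise ValueError(f"unexpected fields: {extra}")

    for name, expected_type in schema.items():
        if name not in payload:
            raise ValueError(f"missing field: {name}")
        if not isinstance(payload[name], expected_type):
            raise ValueError(f"wrong type for field: {name}")
    return payload
```

With this rule, a response that adds something like `next_endpoint` or `fallback_scope` fails validation rather than reaching a component that might act on it.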

5) Approval gates: some actions should slow down on purpose

Teams usually know which actions are dangerous: sending data out, changing permissions, pushing code, mutating records, escalating tickets, or invoking irreversible workflows. Those actions should not ride on the same default trust as harmless reads. Add explicit approval gates where the downside is asymmetric.

That does not mean freezing every workflow behind a human click. It means being honest that some actions deserve a second look because "the server said so" is not a sufficient safety case.
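A gate of this kind can be quite small. The sketch below assumes a hypothetical set of high-risk action names and an approval store keyed by action ID, so that an approval for one action does not carry over to another. All names here are invented for illustration.

```python
# Hypothetical list of actions whose downside is asymmetric.
HIGH_RISK_ACTIONS = {"send_external", "change_permissions", "push_code", "delete_record"}

def gate(action, approvals):
    """Decide whether an action may run now.

    `action` is a dict with "id" and "name"; `approvals` is a set of action IDs
    that a human has explicitly approved.
    """
    if action["name"] not in HIGH_RISK_ACTIONS:
        return "allow"            # harmless reads keep moving
    if action["id"] in approvals:
        return "allow"            # approved for this specific action only
    return "needs_approval"       # slow down on purpose
```

Binding the approval to a single action ID is a deliberate choice in this sketch: it keeps a past approval from becoming standing permission.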

6) Supply chain and deployment hygiene: secure MCP servers like real software

MCP hardening is also software hardening. Signed artifacts, dependency scanning, patch hygiene, and disciplined deployment are not optional extras. If the server package, container, or plugin boundary is weak, the rest of your trust model may be built on sand.

This is not glamorous, but it reflects how real incidents happen.
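As one small piece of deployment hygiene, an MCP server artifact can be compared against a pinned digest before it is installed or started. Note that this is digest pinning, a simpler stand-in for full signature verification rather than a replacement for it. The function name `artifact_matches_pin` is hypothetical.

```python
import hashlib
import hmac

def artifact_matches_pin(data: bytes, pinned_sha256_hex: str) -> bool:
    """Return True only if the artifact bytes hash to the pinned SHA-256 digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())
```

The pinned value would normally come from a reviewed lockfile or release record, not from the same channel that delivered the artifact.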

7) Runtime trust: the last question after all the other controls

Here is the part most checklists stop short of stating clearly. After identity, transport, sandboxing, validation, and approval are in place, the workflow can still do something unsafe. It can follow the wrong callback, trust the wrong destination, accept an authority-bearing note as policy, retry into a more dangerous path, or treat a low-scrutiny field as an execution hint.

Runtime trust is the decision layer that asks whether the already-allowed action still makes sense in context. That is the question Sunglasses is built to help teams see more clearly. The CVP trust model shows how this layered approach applies across real evaluation runs.
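The runtime-trust question can be approximated as a drift check: compare the next step against what the workflow declared when it started. The sketch below is not how Sunglasses works internally. The `baseline` shape, the `source` field, and `review_next_step` are assumptions made purely to illustrate the idea.

```python
from urllib.parse import urlsplit

def review_next_step(step, baseline):
    """Return reasons the step drifts from the declared workflow; empty means none found.

    `baseline` holds the tools and outbound hosts the workflow declared up front.
    `step` describes the next action and where the suggestion came from.
    """
    reasons = []
    if step["tool"] not in baseline["tools"]:
        reasons.append("tool not in approved workflow")

    url = step.get("destination")
    if url is not None and urlsplit(url).hostname not in baseline["hosts"]:
        reasons.append("new outbound destination")

    # A step proposed by a tool response deserves more scrutiny than a planned one.
    if step.get("source") == "tool_output":
        reasons.append("suggested by tool output")
    return reasons
```

For example, with a baseline that lists only `tickets.example.com`, a callback pointing at `https://routing.example.net/latest` that arrived inside a tool response would return both "new outbound destination" and "suggested by tool output", even though every connection involved is authenticated.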

Three concrete MCP attack examples

1) Callback drift that quietly changes what happens next

An MCP server returns a routine-looking callback URL or follow-up instruction after a normal tool call. The agent treats it as operational metadata and follows it automatically. But that callback changes queue choice, destination, or execution order. Nothing broke authentication. The breach happened because the workflow trusted a new decision path without reviewing what authority it carried — the capability-drift surface that GLS-MCP-002 (MCP capability drift) is written to catch.

2) SSRF-adjacent fetches into places the workflow should never trust

A tool that fetches remote context gets coerced into reaching an internal service, metadata endpoint, or private administrative path. The request may still look like an ordinary retrieval step. But the server has now become a bridge into places the agent was never meant to learn from or act through. Tool-input injection like GLS-MTI-001 (MCP database-tool SQL wrapper injection) shows the same shape from the input side — which is why MCP security keeps circling back to private placement, fetch restrictions, and destination controls.

3) Tool-output fields that start behaving like policy

A tool originally returns status only. Later, a new field suggests a preferred credential, alternate endpoint, or exception path. Because the workflow accepts the field and downstream components honor it, the output is no longer just information. It is guidance with authority — the trusted-output override surface that GLS-TOP-237 (tool output trusted-output override) targets. This is where schema discipline and runtime trust meet: the structure has to reject the wrong meaning, and the workflow has to notice when "helpful context" is really trying to steer action. The Generated MCP Server Security post covers this attack class in detail.

How Sunglasses catches it

Sunglasses is not pretending to replace identity, tunnels, sandboxing, or MCP gateways. Those controls are real and necessary. Sunglasses fits after you already know who can connect and what protocol boundaries exist.

Its role is to inspect the text and configuration surfaces that quietly reshape trust: prompts, tool descriptions, YAML, policies, setup notes, retry instructions, callback guidance, and generated code. Those surfaces matter because they often decide what the workflow believes it is allowed to do next.

That makes Sunglasses especially useful in MCP environments, where a lot of operational authority is carried in human-readable text that people underestimate. The fastest starting point stays simple:

pip install sunglasses
sunglasses scan <path>

From there, review anything that widens scope, introduces new destinations, softens guardrails, or turns descriptive content into executable trust. In other words: harden the stack, then inspect the language that can still bend the stack at runtime. Start with AI Agent Security 101 and the full operator manual for the broader context.

Operator checklist

  • Keep credentials narrow and temporary: scoped, short-lived tokens with clear audience binding beat broad long-lived ones.
  • Keep MCP off public exposure: prefer private placement, mTLS, or secure tunnels, and treat public ingress as the exception.
  • Block SSRF and internal reachability: a fetch tool steered into metadata or control-plane endpoints is an authority-routing problem, not just a networking one.
  • Sandbox tools by capability: a tool that can mutate data or open sessions does not deserve the same trust as a read-only one.
  • Make schemas strict: reject extra authority-bearing fields so a response cannot smuggle in a new endpoint, scope hint, or fallback instruction.
  • Gate dangerous actions: sends, permission changes, and irreversible writes deserve explicit approval, not default trust.
  • Review callbacks and outbound destinations as runtime trust boundaries: authenticated and technically-allowed is not the same as safe to act on now.