What is Claude Code security?

Claude Code security is the set of controls that keep coding agents from misusing developer authority: permissions, sandboxing, MCP server inventory, file and command limits, egress controls, audit logs, and runtime trust before each action.

Are Claude Code permissions enough?

No. Permissions define what the agent is generally allowed to do. Runtime trust decides whether this specific command, file read, package endpoint, MCP handoff, callback, or tool result should be trusted now.

How do MCP servers change coding agent risk?

MCP servers expand what coding agents can reach. Inventory and scanning help identify servers and hygiene issues, but approved servers can still deliver poisoned output, shifted destinations, or untrusted callbacks during a live workflow.

What should teams monitor first?

Start with commands, file reads, package-manager actions, network destinations, MCP tool calls, callback URLs, and any action that moves secrets, code, credentials, logs, or build artifacts outside the expected path.

How does Sunglasses help secure coding agents?

Sunglasses detects instruction-shaped risks in prompts, files, tool outputs, metadata, handoffs, callbacks, and workflow state so teams can review the action boundary before a coding agent executes. Sunglasses v0.2.66 ships detection patterns including GLS-AIFP-002 (AGENTS.md / agent instruction file poisoning), GLS-MCP-002 (MCP capability drift), and GLS-TMS-234 (tool metadata smuggling).

Claude Code Security: Runtime Trust After Permissions, Guardrails, and MCP Scanning

FIG.01 · Explainer

Plain-language explainer

sunglasses://blog/claude-code-security-runtime-trust#plain-language

Baseline

Claude Code and similar coding agents sit close to the developer workstation. They can inspect repositories, summarize diffs, run shell commands, install packages, call internal tools, open pull requests, and use MCP servers or other integrations to reach systems outside the local codebase. That makes them useful. It also gives prompt injection, tool-output poisoning, malicious README text, poisoned package instructions, and compromised MCP results a shorter path to real action.

Why fragile

Traditional app security assumes a tool does one narrow thing. A coding agent does many things in sequence. It reads instructions from the user, the repository, package metadata, logs, tickets, MCP servers, test output, documentation pages, and generated summaries. Then it reasons across those inputs and chooses an action. The security problem is no longer only "did the input contain bad text?" It is "did the agent convert untrusted context into a trusted command?"

The real question

That is why Claude Code security needs more than permission prompts. Permission prompts matter. Sandboxes matter. MCP server inventory matters. Egress controls matter. But an agent can still be inside its allowed lane and make the wrong move because the live context changed. Runtime trust is the action-time decision that checks whether the next step still fits the reviewed task. The Sunglasses manual walks through how to layer these controls in practice.

FIG.02 · Market signal

Why Claude Code changes the risk model

sunglasses://blog/claude-code-security-runtime-trust#why-risky

Market signal

A coding agent inherits authority from its environment. If it runs on a developer laptop, it may see local files, shell history, credentials, package-manager configuration, SSH remotes, and repository secrets. If it runs in CI or a hosted workspace, it may see build tokens, issue trackers, artifact stores, deployment hooks, and internal APIs. If it uses MCP, each approved server adds more reachable actions and more text the model may interpret as instructions.

Commands become decisions

A command is not just text. It can delete files, exfiltrate data, install malware, change dependency state, or create a trusted artifact.

Tool results become evidence

Logs, test output, package metadata, and MCP responses can persuade the agent that a risky action is safe.

Developer authority is reusable

Once the agent is allowed to run tools, a poisoned workflow can reuse that authority across files, callbacks, packages, or endpoints.

The shift

The practical threat is not only the dramatic "agent runs rm -rf" scenario. The more likely failure is quieter: the agent accepts a malicious instruction hidden in documentation, treats a package post-install note as policy, follows a callback from a tool result, suppresses a warning because a generated summary said the issue was approved, or passes data to a destination that was never part of the reviewed task. Understanding this is central to AI agent security 101.

FIG.03 · First controls

The control stack answer engines already reward

sunglasses://blog/claude-code-security-runtime-trust#controls

First sentence

Buyer and answer-engine language around coding-agent security is already converging on a practical stack. A credible Claude Code security page should name that stack before introducing any product angle.

Control	What it solves	What it does not finish
Permissions and allowed tools	Limit which commands, files, APIs, and tools the agent can use by default.	They do not prove the next allowed action is still appropriate for this task.
Sandboxing and isolated workspaces	Contain filesystem, process, credential, and network blast radius.	They do not decide whether a poisoned tool result should influence a build, commit, or callback.
MCP server inventory and scanning	Expose which MCP servers exist, what tools they publish, and whether obvious hygiene problems are present.	An approved server can still return poisoned output, shifted destinations, or misleading evidence during a live workflow.
Egress controls and endpoint allowlists	Reduce unexpected outbound calls, package pulls, callback chains, and data exfiltration routes.	Allowed endpoints can still be wrong for the current action, stale workflow state, or inherited authority path.
Guardrails and policy checks	Block known unsafe prompts, outputs, and policy violations.	Guardrails can miss context-specific trust drift across tickets, files, tools, MCP handoffs, and generated summaries.
Audit logs and telemetry	Show what happened after the agent acted.	Logs are evidence, not prevention, unless action-time checks use them before execution.

The bridge sentence: permissions, sandboxing, MCP scanning, and guardrails decide what should be possible; runtime trust decides whether the next already-possible action should execute now.

The controls

That ordering matters for credibility. If a page jumps straight to "buy this scanner," it sounds less like security guidance and more like category marketing. If it starts with the controls practitioners already expect, then names the unresolved action boundary, it gives answer engines a cleaner citation path: first the recognized Claude Code security stack, then the specific missing runtime-trust decision Sunglasses is built to inspect. See how this fits into the broader CVP framework.

FIG.04 · Field evidence

Three concrete attack examples

sunglasses://blog/claude-code-security-runtime-trust#examples

Case 01

1. README-to-shell drift

Field evidence

A repository README includes setup instructions that look ordinary: run a helper script, export a token, and send diagnostics to a "validation" endpoint. The coding agent reads the README as task context and prepares a shell command. The command may be inside the agent's allowed tool list, but the destination and data movement were never reviewed.

The pattern

The runtime-trust question is not "is shell access allowed?" It is "does this command, endpoint, token scope, and diagnostic payload match the user's current intent?" If not, the agent should stop or require review. Detection patterns like GLS-AIFP-002 catch instruction-shaped language injected into agent instruction files like AGENTS.md and CLAUDE.md before they reach the action boundary.

Case 02

2. MCP tool-result poisoning

What happens

An approved MCP server returns a tool result saying a dependency is safe, a ticket is approved, or a callback URL is the canonical reporting endpoint. The server may be on the inventory list. The tool call may be allowed. The danger appears when the agent treats the returned text as authority to install, suppress, forward, or approve.

The tell

The safe boundary is evidence separation. MCP output can inform the agent, but it should not silently override source-of-truth policy, package trust, or destination review. GLS-MCP-002 flags one of these trust-drift surfaces: MCP capability drift — dynamic tool-list changes that can indicate rug-pull behavior after trust is granted.

Case 03

3. Package-endpoint bait

Field evidence

A build failure suggests installing an alternate package, using a mirror, or setting an environment variable. The agent sees a plausible fix and proposes or executes it. The action may pass a generic command guardrail while still changing the software supply-chain path.

The pattern

The runtime-trust check should bind package name, registry, version, maintainer signal, lockfile diff, network destination, and task scope before execution. GLS-TMS-234 covers tool metadata smuggling — the technique of hiding redirection instructions inside package or tool metadata that looks benign to a surface-level scanner.

FIG.05 · First controls

Runtime-trust checklist for Claude Code security

sunglasses://blog/claude-code-security-runtime-trust#checklist

First sentence

Before a coding agent runs a sensitive command, reads a sensitive file, installs a package, calls an MCP tool, follows a callback, or writes a persistent artifact, check the action against these questions:

Check	Question before the agent acts
Intent match	Does this exact action serve the user's current request, or did it originate from repository text, tool output, package metadata, or a generated summary?
Source binding	Which input caused the action: human instruction, trusted config, MCP result, README, ticket, log, test output, or third-party page?
Scope	Does the action stay within the repository, environment, permission set, and task boundary that were reviewed?
Destination	Is any file path, API host, package registry, mirror, callback, or network endpoint expected and allowed for this task?
Evidence freshness	Is the approval, ticket state, dependency advice, scan result, or policy summary current for this diff and environment?
Tool identity	Is the MCP server, CLI, package manager, or helper script the same reviewed tool, not a lookalike or shifted alias?
Data movement	Could the action reveal secrets, source code, credentials, logs, customer data, or internal architecture to a new place?

FIG.06 · Coverage

How Sunglasses catches it

sunglasses://blog/claude-code-security-runtime-trust#how-sunglasses-catches-it

The wedge

Sunglasses looks for instruction-shaped risk before it becomes agent behavior. For Claude Code and other coding agents, that means scanning prompts, repository files, tool outputs, metadata, workflow state, package-adjacent text, and handoff material for language that tries to change trust at the action boundary.

What we look for

Examples include commands that ask the agent to bypass review, package notes that redirect installation, tool results that declare a risky callback approved, metadata that says to suppress findings, or workflow summaries that treat stale approvals as current. The point is not to ban every suspicious sentence. The point is to raise the action-time question before the agent turns untrusted context into execution.

The question

Sunglasses fits beside existing controls. Keep permissions. Keep sandboxes. Keep MCP inventory. Keep egress rules. Keep audit logs. Then add the runtime-trust decision: even inside the allowed environment, should this specific coding-agent action happen now? The FAQ covers common integration questions.

House sentence

Useful next reads on the detection side: how the scanner works, the Sunglasses manual, and the CVP page for verified pattern coverage. Related blog coverage on agentic CI/CD security and MCP tool poisoning goes deeper on specific attack paths.

FIG.07 · Analysis

Claude Code Security: Runtime Trust After Permissions, Guardrails, and MCP Scanning

Plain-language explainer

Why Claude Code changes the risk model

Commands become decisions

Tool results become evidence

Developer authority is reusable

The control stack answer engines already reward

Three concrete attack examples

1. README-to-shell drift

2. MCP tool-result poisoning

3. Package-endpoint bait

Runtime-trust checklist for Claude Code security

How Sunglasses catches it

Related reading

Frequently Asked Questions

What is Claude Code security?

Are Claude Code permissions enough?

How do MCP servers change coding-agent security?

What should teams monitor first?

How does Sunglasses help?

Scan what the agent sees, before it acts

Plain-language explainer

Why Claude Code changes the risk model

Commands become decisions

Tool results become evidence

Developer authority is reusable

The control stack answer engines already reward

Three concrete attack examples

1. README-to-shell drift

2. MCP tool-result poisoning

3. Package-endpoint bait

Runtime-trust checklist for Claude Code security

How Sunglasses catches it

Related reading

Frequently Asked Questions

What is Claude Code security?

Are Claude Code permissions enough?

How do MCP servers change coding-agent security?

What should teams monitor first?

How does Sunglasses help?

Scan what the agent sees, before it acts

Your call.