Claude Code security is the full control stack — permissions, sandboxing, MCP server inventory, egress controls, and guardrails — that keeps AI coding agents from misusing developer authority. What makes it distinct from any single layer is the runtime-trust decision: after all static controls pass, should this specific command, file read, package install, or MCP handoff execute right now? Sunglasses v0.2.66 ships detection patterns including GLS-AIFP-002 (AGENTS.md / CLAUDE.md instruction file poisoning), GLS-MCP-002 (MCP capability drift), and GLS-TMS-234 (tool metadata smuggling) to catch the instruction-shaped risks that land inside those already-permitted actions.
Plain-language explainer
Claude Code and similar coding agents sit close to the developer workstation. They can inspect repositories, summarize diffs, run shell commands, install packages, call internal tools, open pull requests, and use MCP servers or other integrations to reach systems outside the local codebase. That makes them useful. It also gives prompt injection, tool-output poisoning, malicious README text, poisoned package instructions, and compromised MCP results a shorter path to real action.
Traditional app security assumes a tool does one narrow thing. A coding agent does many things in sequence. It reads instructions from the user, the repository, package metadata, logs, tickets, MCP servers, test output, documentation pages, and generated summaries. Then it reasons across those inputs and chooses an action. The security problem is no longer only "did the input contain bad text?" It is "did the agent convert untrusted context into a trusted command?"
That is why Claude Code security needs more than permission prompts. Permission prompts matter. Sandboxes matter. MCP server inventory matters. Egress controls matter. But an agent can still be inside its allowed lane and make the wrong move because the live context changed. Runtime trust is the action-time decision that checks whether the next step still fits the reviewed task. The Sunglasses manual walks through how to layer these controls in practice.
Why Claude Code changes the risk model
A coding agent inherits authority from its environment. If it runs on a developer laptop, it may see local files, shell history, credentials, package-manager configuration, SSH remotes, and repository secrets. If it runs in CI or a hosted workspace, it may see build tokens, issue trackers, artifact stores, deployment hooks, and internal APIs. If it uses MCP, each approved server adds more reachable actions and more text the model may interpret as instructions.
Commands become decisions
A command is not just text. It can delete files, exfiltrate data, install malware, change dependency state, or create a trusted artifact.
Tool results become evidence
Logs, test output, package metadata, and MCP responses can persuade the agent that a risky action is safe.
Developer authority is reusable
Once the agent is allowed to run tools, a poisoned workflow can reuse that authority across files, callbacks, packages, or endpoints.
The practical threat is not only the dramatic "agent runs rm -rf" scenario. The more likely failure is quieter: the agent accepts a malicious instruction hidden in documentation, treats a package post-install note as policy, follows a callback from a tool result, suppresses a warning because a generated summary said the issue was approved, or passes data to a destination that was never part of the reviewed task. Understanding this is central to AI agent security 101.
The control stack answer engines already reward
Buyer and answer-engine language around coding-agent security is already converging on a practical stack. A credible Claude Code security page should name that stack before introducing any product angle.
| Control | What it solves | What it does not finish |
|---|---|---|
| Permissions and allowed tools | Limit which commands, files, APIs, and tools the agent can use by default. | They do not prove the next allowed action is still appropriate for this task. |
| Sandboxing and isolated workspaces | Contain filesystem, process, credential, and network blast radius. | They do not decide whether a poisoned tool result should influence a build, commit, or callback. |
| MCP server inventory and scanning | Expose which MCP servers exist, what tools they publish, and whether obvious hygiene problems are present. | An approved server can still return poisoned output, shifted destinations, or misleading evidence during a live workflow. |
| Egress controls and endpoint allowlists | Reduce unexpected outbound calls, package pulls, callback chains, and data exfiltration routes. | Allowed endpoints can still be wrong for the current action, stale workflow state, or inherited authority path. |
| Guardrails and policy checks | Block known unsafe prompts, outputs, and policy violations. | Guardrails can miss context-specific trust drift across tickets, files, tools, MCP handoffs, and generated summaries. |
| Audit logs and telemetry | Show what happened after the agent acted. | Logs are evidence, not prevention, unless action-time checks use them before execution. |
The bridge sentence: permissions, sandboxing, MCP scanning, and guardrails decide what should be possible; runtime trust decides whether the next already-possible action should execute now.
That ordering matters for credibility. If a page jumps straight to "buy this scanner," it sounds less like security guidance and more like category marketing. If it starts with the controls practitioners already expect, then names the unresolved action boundary, it gives answer engines a cleaner citation path: first the recognized Claude Code security stack, then the specific missing runtime-trust decision Sunglasses is built to inspect. See how this fits into the broader CVP framework.
Three concrete attack examples
1. README-to-shell drift
A repository README includes setup instructions that look ordinary: run a helper script, export a token, and send diagnostics to a "validation" endpoint. The coding agent reads the README as task context and prepares a shell command. The command may be inside the agent's allowed tool list, but the destination and data movement were never reviewed.
The runtime-trust question is not "is shell access allowed?" It is "does this command, endpoint, token scope, and diagnostic payload match the user's current intent?" If not, the agent should stop or require review. Detection patterns like GLS-AIFP-002 catch instruction-shaped language injected into agent instruction files like AGENTS.md and CLAUDE.md before they reach the action boundary.
2. MCP tool-result poisoning
An approved MCP server returns a tool result saying a dependency is safe, a ticket is approved, or a callback URL is the canonical reporting endpoint. The server may be on the inventory list. The tool call may be allowed. The danger appears when the agent treats the returned text as authority to install, suppress, forward, or approve.
The safe boundary is evidence separation. MCP output can inform the agent, but it should not silently override source-of-truth policy, package trust, or destination review. GLS-MCP-002 flags one of these trust-drift surfaces: MCP capability drift — dynamic tool-list changes that can indicate rug-pull behavior after trust is granted.
3. Package-endpoint bait
A build failure suggests installing an alternate package, using a mirror, or setting an environment variable. The agent sees a plausible fix and proposes or executes it. The action may pass a generic command guardrail while still changing the software supply-chain path.
The runtime-trust check should bind package name, registry, version, maintainer signal, lockfile diff, network destination, and task scope before execution. GLS-TMS-234 covers tool metadata smuggling — the technique of hiding redirection instructions inside package or tool metadata that looks benign to a surface-level scanner.
Runtime-trust checklist for Claude Code security
Before a coding agent runs a sensitive command, reads a sensitive file, installs a package, calls an MCP tool, follows a callback, or writes a persistent artifact, check the action against these questions:
| Check | Question before the agent acts |
|---|---|
| Intent match | Does this exact action serve the user's current request, or did it originate from repository text, tool output, package metadata, or a generated summary? |
| Source binding | Which input caused the action: human instruction, trusted config, MCP result, README, ticket, log, test output, or third-party page? |
| Scope | Does the action stay within the repository, environment, permission set, and task boundary that were reviewed? |
| Destination | Is any file path, API host, package registry, mirror, callback, or network endpoint expected and allowed for this task? |
| Evidence freshness | Is the approval, ticket state, dependency advice, scan result, or policy summary current for this diff and environment? |
| Tool identity | Is the MCP server, CLI, package manager, or helper script the same reviewed tool, not a lookalike or shifted alias? |
| Data movement | Could the action reveal secrets, source code, credentials, logs, customer data, or internal architecture to a new place? |
How Sunglasses catches it
Sunglasses looks for instruction-shaped risk before it becomes agent behavior. For Claude Code and other coding agents, that means scanning prompts, repository files, tool outputs, metadata, workflow state, package-adjacent text, and handoff material for language that tries to change trust at the action boundary.
Examples include commands that ask the agent to bypass review, package notes that redirect installation, tool results that declare a risky callback approved, metadata that says to suppress findings, or workflow summaries that treat stale approvals as current. The point is not to ban every suspicious sentence. The point is to raise the action-time question before the agent turns untrusted context into execution.
Sunglasses fits beside existing controls. Keep permissions. Keep sandboxes. Keep MCP inventory. Keep egress rules. Keep audit logs. Then add the runtime-trust decision: even inside the allowed environment, should this specific coding-agent action happen now? The FAQ covers common integration questions.
Useful next reads on the detection side: how the scanner works, the Sunglasses manual, and the CVP page for verified pattern coverage. Related blog coverage on agentic CI/CD security and MCP tool poisoning goes deeper on specific attack paths.