A2A means agent-to-agent communication: one AI system asking another AI system to do work. Communication is the easy part. Trust is the hard part. Just because one agent asks doesn't mean the other agent should do it.

Why A2A is not automatically a security feature

As AI systems start delegating work to each other — calling other agents, passing intermediate results, chaining tool outputs — the interoperability problem gets solved. Standards emerge. Connectors work. One agent can cleanly ask another agent to do something.

That sounds useful, and it is.

But the security question starts right after that: should the receiving agent actually do what it was asked to do?

That is what we mean by a trust boundary. The boundary is crossed when one agent hands work to another across a scope, policy, or permission gap. Communication is how the message arrives. Trust is the decision about whether the message should become an action.

The core claim, one line: Communication is not authorization. A handoff is not proof. The dangerous step is when a message becomes an action.
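That gap between "message received" and "action taken" can be made concrete. Here is a minimal sketch of a receiving agent that separates the two steps. Everything in it is hypothetical: the `AgentMessage` shape, the agent names, and the hardcoded `PERMITTED` table stand in for whatever real policy store a system would use.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMessage:
    sender: str
    action: str
    payload: dict = field(default_factory=dict)

# Hypothetical allowlist: which peer agents may request which actions.
# In a real system this would come from policy, not a hardcoded dict.
PERMITTED = {
    "billing-agent": {"read_invoice"},
    "support-agent": {"read_invoice", "create_ticket"},
}

def is_authorized(msg: AgentMessage) -> bool:
    """Receiving a message is not authorization; check policy explicitly."""
    return msg.action in PERMITTED.get(msg.sender, set())

def handle(msg: AgentMessage) -> str:
    """Only let a message become an action after an explicit decision."""
    if not is_authorized(msg):
        return f"refused: {msg.sender} may not {msg.action}"
    return f"executing: {msg.action}"
```

The point of the structure is that `handle` never runs an action just because a message arrived; `is_authorized` is the trust boundary, and the handoff itself carries no weight.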

Five analogies for normal people

1. Reception desk

A2A is like one employee calling another office line.

Sunglasses is the rule that says: don't unlock the back room just because someone called.

2. Hotel key

One guest asking the front desk for another guest's room key is a communication event.

The real question is whether the hotel should trust that request.

3. Delivery driver

A message can be passed from one person to another.

That does not mean the second person should hand over the package.

4. Bank transfer

A request can look legitimate.

The security system still needs to ask whether this transfer, destination, and approval path make sense.

5. Office badge

Agents talking is like people speaking in the hallway.

Sunglasses is the badge reader on the door.

What a trust boundary actually is

A trust boundary is the line between what an agent should treat as safe and what it should treat as external, unverified, or higher-risk.

In a real multi-agent system, that boundary runs through every external input channel: messages from peer agents, tool outputs, retrieval results, and model-routing decisions.

Every one of those is a place where the receiving agent can act. Every one is a place where "because another agent asked" is not a good enough reason to act.
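One way to make that boundary explicit in code is to tag every piece of content with its provenance, so no handler can forget where an input came from. The source categories and the `Tagged` wrapper below are illustrative, not a real API; in this sketch only direct user input sits inside the boundary.

```python
from dataclasses import dataclass
from enum import Enum

class Source(Enum):
    USER = "user"              # direct human input
    PEER_AGENT = "peer_agent"  # message from another agent
    TOOL_OUTPUT = "tool_output"
    RETRIEVAL = "retrieval"

@dataclass(frozen=True)
class Tagged:
    """Content that carries its provenance with it across handoffs."""
    content: str
    source: Source

# Assumption for this sketch: only direct user input is inside the boundary.
TRUSTED_SOURCES = {Source.USER}

def crosses_boundary(item: Tagged) -> bool:
    """True when the content came from outside the trust boundary."""
    return item.source not in TRUSTED_SOURCES
```

A handler that receives a `Tagged` value can then route anything that `crosses_boundary` into a verification path instead of acting on it directly.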

Why normal approval logic breaks across agents

Inside a single agent, approval logic usually looks like: did the user approve this? That works when the user is in the loop.

In an A2A handoff, the user is not in the loop for the second agent. The second agent sees a message from a peer agent. It does not natively know whether the original human user actually approved this downstream action — or whether a compromised upstream agent, a hostile tool output, or a poisoned retrieval result is now speaking in the user's voice.

The second agent is reading a message that says "this is approved." That is exactly the attack surface we have been shipping detection patterns for in the last three releases.
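The fix is to treat any self-asserted "approved" flag as noise and require approval to be verifiable. A minimal sketch, assuming an out-of-band shared secret between the approval authority and the receiving agent (a real deployment would more likely use asymmetric signatures and expiring tokens):

```python
import hashlib
import hmac

# Assumption: a secret shared out of band with the approval authority.
SHARED_KEY = b"example-key"

def sign_approval(action: str, key: bytes = SHARED_KEY) -> str:
    """What the approval authority attaches when a human really approved."""
    return hmac.new(key, action.encode(), hashlib.sha256).hexdigest()

def verify_approval(msg: dict, key: bytes = SHARED_KEY) -> bool:
    """Ignore the self-asserted 'approved' flag; check the signature."""
    claimed = msg.get("approval_sig", "")
    expected = sign_approval(msg.get("action", ""), key)
    return hmac.compare_digest(claimed, expected)
```

With this in place, an upstream agent saying "this is approved" proves nothing; only a message carrying a signature the receiver can verify gets through.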

Where Sunglasses sits

Sunglasses focuses on the decision point: should this agent be trusted to take this action in this context?
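That three-part question (this agent, this action, this context) maps onto the bank-transfer analogy above: identity alone is not enough, because the transfer, destination, and amount must also make sense. The rules below are made up for illustration and are not Sunglasses' actual policy model.

```python
# Hypothetical policy data: destinations the organization already knows.
KNOWN_DESTINATIONS = {"acct-001", "acct-002"}

def should_trust(agent: str, action: str, context: dict) -> bool:
    """Sketch of a per-agent, per-action, per-context decision (made-up rules)."""
    if action != "transfer_funds":
        return False  # this sketch only models one action
    return (
        agent == "payments-agent"                            # who is asking
        and context.get("destination") in KNOWN_DESTINATIONS  # where it goes
        and context.get("amount", 0) <= 100                   # does it make sense
    )
```

The shape matters more than the rules: the decision takes all three inputs, and removing any one of them collapses it back into "because another agent asked."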

Sunglasses now covers this surface through two category lanes, one established and one brand new in v0.2.17.

These sit alongside our existing tool_output_poisoning, retrieval_poisoning, and model_routing_confusion categories — all of them attack surfaces where an external source tries to become an authoritative instruction.

Positioning line: A2A solves interoperability. Sunglasses solves trust.

The closing idea

Interoperability is useful. Trusted action is what matters.

The next era of AI systems will not be defined by whether agents can talk to each other. They will. Standards will win. Connectors will ship. The question will be whether the actions on the other end of those connections are trustworthy enough to run.

That is the decision point. That is where Sunglasses exists.

The real question is not "can these agents connect?" It is "should this action be trusted?"