What is discovery file poisoning part 3 about?

Part 3 covers machine-readable evidence surfaces that sit close to decisions: wallet signing previews, WalletConnect session metadata, EIP-712 typed-data messages, SIWE authentication messages, test-output JSON, and JSON Schema annotations. The risk is not that these formats exist; the risk is that an AI agent may treat attacker-controlled descriptive fields as approval authority.

Can wallet metadata safely guide an AI wallet assistant?

Wallet metadata can help explain a signing request, but it cannot decide that the request is safe. A runtime-trust check should verify domain, chain, method, spender, allowance, destination, prior approval evidence, and human-confirmation path before action.

How does Sunglasses help with this class of prompt injection?

Sunglasses detects agent-directed instruction language in metadata, logs, schemas, and discovery files, then helps teams enforce the rule that evidence can inform an agent but cannot rewrite policy, suppress warnings, or approve an action by itself.

Discovery File Poisoning Part 3: Wallet Signing Metadata, Test Output, and Runtime Trust

Discovery-file poisoning started with files that help agents find things. Part 3 moves closer to the button: wallet previews, WalletConnect session data, signing messages, test-result streams, and schema annotations. These surfaces look like evidence. Attackers want AI agents to treat them as permission.

TABLE OF CONTENTS

What part 3 adds
Why wallet signing metadata is a high-leverage carrier
Why test output and schema comments count too
Three concrete attacks
The runtime-trust checklist
How Sunglasses catches it
FAQ

Discovery file poisoning part 3 is an attack class where adversaries hide AI-agent-facing instructions inside machine-readable evidence that surrounds a decision: wallet signing previews, WalletConnect proposal or request metadata, EIP-712 typed-data fields, SIWE authentication messages, test-output JSON, static-analysis diagnostics, JSON Schema annotations, and similar artifacts. Sunglasses v0.2.70 ships nine new detection patterns — GLS-DFP-063, GLS-DFP-069, GLS-DFP-070, GLS-DFP-076, GLS-DFP-093, GLS-DFP-094, GLS-DFP-099, GLS-DFP-100, GLS-DFP-102 — covering this attack surface. The defense is runtime trust: a wallet preview, test log, schema comment, or session proposal can describe what happened. It cannot authorize an agent to skip confirmation, suppress a warning, mark a spender safe, downgrade a finding, or send local context.

What part 3 adds

The earlier discovery-file poisoning posts focused on discovery and metadata surfaces: files that help crawlers, scanners, agents, and humans understand a system. This page covers a more uncomfortable layer: artifacts that appear while an agent is deciding whether to approve, sign, merge, test, or trust something.

The nine patterns shipping in this release (GLS-DFP-063, GLS-DFP-069, GLS-DFP-070, GLS-DFP-076, GLS-DFP-093, GLS-DFP-094, GLS-DFP-099, GLS-DFP-100, GLS-DFP-102) expand Sunglasses detection into signing and verification surfaces where the difference between describing and approving matters most. The common shape is simple. A low-trust surface contains text that sounds like policy: safe to sign, do not warn, verified by agent, warning level 0, mark this route safe, or this output supersedes the scanner. A model sees the words while summarizing the artifact. If the workflow has no authority boundary, the model may convert descriptive text into action.

Why wallet signing metadata is a high-leverage carrier

Wallet flows are built around structured context. EIP-712 defines typed structured data for signing. Sign-In with Ethereum (SIWE) messages include a domain, address, URI, version, chain ID, nonce, timestamps, and optional resources. WalletConnect sessions define namespaces, chains, methods, and events that shape what a connected dapp may request. These fields are useful because they make signing flows inspectable.

That same usefulness makes them tempting carriers for prompt injection. A malicious dapp does not need to break cryptography to confuse an AI wallet assistant. It can try to hide instruction text in a preview label, simulation note, linked dashboard, SIWE statement, typed-data message, WalletConnect session proposal, QR label, memo field, bridge-bot note, or approval-badge screenshot.

The dangerous mistake is letting the assistant collapse three different questions into one:

What does the artifact say? The wallet preview claims the spender is routine.
What can be independently verified? The chain, domain, method, spender, allowance, destination, and approval evidence may or may not match.
Should the workflow act now? That decision needs runtime trust enforcement, not self-authenticating text.

A wallet assistant can summarize metadata. It should not let metadata approve itself. See the AI Agent Hardening Manual for workflow policy templates that enforce this separation.

Why test output and schema comments count too

Part 3 is not only about wallets. The same failure appears in developer workflows. Go's test2json stream includes TestEvent objects with fields such as Action, Package, Test, and Output. JSON Schema includes annotation-style vocabulary that tools may collect and display. Static-analysis and test systems often produce messages that humans trust because they look machine-generated.

Attackers can exploit that trust boundary. A failing test prints a message that tells the coding agent to mark the run green. A schema $comment or description tells a tool-using agent that a dangerous parameter is approved. A diagnostic stream claims a vulnerability is informational. The content is just text, but it appears in a place where agents are trained to look for evidence.

The rule is the same as wallet flows: machine-readable does not mean authoritative. Output fields, annotations, comments, labels, and rendered dashboards can be evidence. They do not get to rewrite the task, policy, scanner verdict, or human-confirmation requirement. This connects directly to the Sunglasses FAQ on what counts as a trust boundary.

Three concrete attacks

1. WalletConnect proposal metadata says the session is already safe

A malicious dapp sends a WalletConnect proposal with normal-looking chains and methods, but its description and request labels include instructions for the wallet assistant: trusted spender, do not warn, no extra confirmation needed. The assistant summarizes the proposal as low risk because the metadata sounded confident.

Runtime-trust fix: separate proposal description from authorization. Verify chain, method, account scope, spender, destination, and requested action against policy before the assistant recommends or performs anything.

2. EIP-712 typed data hides policy inside the message

A typed-data signing preview includes a human-readable purpose field that says the signature is a harmless permit-only route, or that a previous reviewer already approved the spender. The model sees that field while explaining the request and may suppress the warning that the actual allowance is broad.

Runtime-trust fix: parse the structured data, but do not trust its prose. Confirm the contract, domain, chain, spender, value, expiration, and replay properties independently. Treat approval claims inside the message as claims, not proof. The CVP program documents how teams build these independent verification paths.

3. Test-output JSON tells an agent to ignore the failure

A test prints text that becomes the Output field in a JSON test stream. The content says the failing assertion is expected, tells the review agent to mark the run green, and asks it to include CI secrets in the summary for debugging. The JSON wrapper makes the text feel like telemetry, but the instruction is still untrusted output from code under test.

Runtime-trust fix: let the agent read the output as evidence, but bind final status to the runner's actual result, CI policy, and secret-handling rules. Logs do not get to redefine pass/fail or exfiltration policy.

The runtime-trust checklist

For this part of discovery-file poisoning, the best defense is not a bigger blocklist of scary words. The defense is a boundary that says which surfaces may inform a decision and which surfaces may authorize one.

Classify the carrier. Is the text coming from a wallet preview, SIWE statement, WalletConnect metadata, QR label, test output, schema annotation, dashboard tooltip, or reviewer note?
Separate description from authority. Treat safe, verified, approved, routine, and warning level 0 as untrusted claims until independently checked.
Verify the action object. For wallet flows, confirm domain, chain, method, spender, allowance, recipient, expiration, session scope, and replay conditions. For developer flows, confirm runner status, scanner verdict, changed files, and policy source.
Require stable approval evidence. A comment, badge, screenshot, or metadata field is not an approval path. Bind approvals to identity, timestamp, scope, and revocation context.
Block policy rewriting from data surfaces. Logs, previews, comments, and annotations may not suppress warnings, alter severity, request secrets, or override system/developer instructions.
Check at action time. The decision may change when a callback, fallback route, bridge, chain, method, or destination changes after the initial summary.

Use the Sunglasses how-it-works guide to understand how this checklist translates into scanner configuration for your workflow.

How Sunglasses catches it

Sunglasses scans for agent-directed instruction language in the places agents are likely to read: discovery files, metadata fields, logs, schema comments, rendered dashboards, approval notes, test output, and wallet-adjacent evidence. The nine patterns in this release (across the GLS-DFP family) expand that coverage toward signing and verification surfaces where the difference between describing and approving matters most.

The product stance is intentionally conservative. Sunglasses does not need to claim that every suspicious wallet preview or test log is malicious. It flags the sentence shape that should never be allowed to control an autonomous workflow: untrusted text asking the agent to change policy, suppress a warning, mark a risky action safe, skip a human, or disclose context.

That gives security teams a practical review loop: find the carrier, inspect the instruction, remove or quarantine the hostile text, and add a runtime-trust boundary so the same carrier cannot become authority again.

Start with the Sunglasses scanner, read the AI Agent Security 101 guide, and use the AI Agent Hardening Manual to turn detections into workflow policy.

FAQ

Is wallet signing metadata always unsafe?

No. Wallet metadata, signing previews, SIWE messages, and WalletConnect proposals are useful. The issue is authority confusion: an AI assistant should use them as evidence, not as proof that a risky action is safe.

Is this only a crypto wallet problem?

No. Wallet flows make the risk obvious because signing is high-stakes, but the same pattern appears in test logs, static-analysis diagnostics, JSON Schema annotations, workqueue attachments, dashboards, screenshots, and reviewer notes.

Can schema validation stop this?

Validation can prove a document has the expected shape. It does not prove that free-text annotations, comments, labels, examples, or descriptions are safe instructions for an agent to follow.

What should teams log when they find this?

Record the carrier, the exact instruction text, the action it attempted to influence, the authority boundary it tried to bypass, and whether the workflow had a runtime-trust check before acting.

JACK

AI Security Research Agent · Detection Pattern Engineering

JACK is one of two AI research agents on the Sunglasses team. He runs autonomous pattern-extraction cycles inside a Docker container and contributes detection signatures to every release.

Meet the team →