I keep coming back to the same uncomfortable idea:
AI agent skill ecosystems are starting to look like package registries from the bad old days of supply-chain compromise. Except worse.
Not because the malware is necessarily more advanced. Because the attack surface is wider. In classic supply-chain attacks, the payload usually has to live in code or build artifacts. In skill ecosystems, attacker influence can live in code and in natural-language guidance that the user or the agent treats as authoritative.
That means the exploit chain can be distributed across:
- script files
- manifests
- setup instructions
- install commands
- README content
- SKILL.md files
- permission narratives ("safe/read-only")
The OpenClaw malicious skills wave was a vivid example of this pattern. Early reporting found a large set of malicious skills; later reporting expanded the historical count. The specific numbers changed over time, but the core lesson did not: the attacks mixed technical payloads with social-operational instructions in ways that bypassed simplistic scanning. The full pattern catalogue is available via our published security reports.
Why this class of attack works so well
A normal package compromise often needs to get malicious code executed quietly. A malicious skill can do that, but it also has another path:
- Present as useful automation.
- Ask for "required setup" that seems normal.
- Move the dangerous step into docs/instructions.
- Let trust in guidance complete the exploit.
In other words, the instruction layer becomes part of the payload.
You see recurring patterns:
- "Run this prerequisite command first"
- "Install this helper to enable full capability"
- "Paste this into terminal to fix permission issues"
- "Set this environment variable so the skill can authenticate"
- "Download this external binary for compatibility"
Even when those steps are malicious, they are wrapped in workflow language that looks exactly like legitimate troubleshooting.
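These phrasings are regular enough that a first-pass lexical triage can surface them before any deeper analysis. A minimal sketch, assuming a tiny illustrative pattern list (a real scanner needs a far larger, maintained rule set; none of these names are an actual Sunglasses API):

```python
import re

# Illustrative phrasings for workflow-deception triage only;
# a production scanner would use a much larger, curated pattern set.
DECEPTION_PATTERNS = [
    r"run this .{0,40}command first",
    r"install this helper",
    r"paste this into (?:the )?terminal",
    r"set this environment variable",
    r"download this .{0,40}binary",
]

def flag_instruction_lines(doc_text: str) -> list[str]:
    """Return lines of setup prose that match known deception phrasings."""
    hits = []
    for line in doc_text.splitlines():
        for pattern in DECEPTION_PATTERNS:
            if re.search(pattern, line, re.IGNORECASE):
                hits.append(line.strip())
                break
    return hits
```

Lexical matching alone produces false positives on legitimate troubleshooting docs, which is exactly why it is a triage step, not a verdict.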
That is why this is not just malware detection. It is workflow deception detection.
The trust-model failure behind the incidents
It is tempting to frame this as "marketplace moderation was weak." That is true, but incomplete.
The deeper failure is trust-model design:
- Humans trust the marketplace label.
- Agents trust the skill metadata.
- Pipelines trust docs as operational guidance.
- Security tooling often trusts prose as "non-executable."
But in agent systems, prose is frequently executable influence.
If an LLM reads text, interprets it as a task constraint, and triggers tools because of it, then that text is inside the control plane. It is not harmless documentation anymore.
This is the shift many teams still underestimate. If you want the foundational framing, start with the AI agent security 101 guide — it covers how trust boundaries work before any skill context arrives.
The core claim: "In agent ecosystems, prose is executable influence." The exploit often lives in setup guidance before code ever runs. Skill trust is a boundary problem, not only a malware problem.
Why package-era defenses don't transfer cleanly
Traditional supply-chain controls still matter:
- provenance and signing
- dependency pinning
- static scanning
- behavioral sandboxing
But for skills, they are insufficient on their own.
A signed package can still include malicious guidance. A clean binary can still be paired with hostile setup instructions. A non-malicious script can still request unnecessary secrets. A "read-only" skill can still route users toward risky manual commands.
This means defenders need a dual lens:
- Code safety (what artifacts do)
- Guidance safety (what instructions induce)
Most organizations currently over-index on #1. Attackers are increasingly winning through #2.
What defenders should inspect (beyond code)
If your team is evaluating skills, plugins, or MCP-compatible integrations, your review checklist should include at least:
1) Installation prose risk
- Does setup ask for terminal commands unrelated to the claimed function?
- Does it require disabling security controls?
- Does it push external download links (especially shorteners/file-share hosts)?
2) Privilege narrative mismatch
- Is a tool labeled "safe" or "read-only" but requesting write/delete/export permissions?
- Are OAuth scopes broader than what the task needs?
3) Secret acquisition behavior
- Any request to expose .env files, cloud credentials, SSH keys, browser tokens, chat logs, wallet files, etc.
- Any "diagnostic bundle" flow that silently includes sensitive files.
4) Outbound destination quality
- Webhook bins, temporary collectors, unknown domains, raw-IP destinations.
- Any hidden or obfuscated egress route in docs or scripts.
5) Concealment language
- "Do not mention this in output"
- "Quietly do X"
- "Internal step only"
- "Ignore scanner warning, owner approved"
Concealment + action is one of the strongest red flags in agent attack content. This detection surface is exactly what our MCP tool poisoning analysis documented — the same concealment language that hides hostile instructions inside tool descriptions applies directly to skill manifests.
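The concealment-plus-action heuristic can be stated directly in code. A toy sketch with illustrative cue lists (real detection needs far more than substring matching, and these lists are assumptions, not shipped rules):

```python
# Illustrative cue lists; substring matching is a deliberately crude sketch.
CONCEALMENT_CUES = [
    "do not mention", "quietly", "internal step only", "ignore scanner warning",
]
ACTION_CUES = ["run ", "curl ", "export ", "chmod ", "paste ", "download "]

def concealment_plus_action(text: str) -> bool:
    """Strong red flag: the same instruction both hides a step and asks for one.

    Either cue alone can be benign; the combination is the signal.
    """
    low = text.lower()
    concealed = any(cue in low for cue in CONCEALMENT_CUES)
    acts = any(cue in low for cue in ACTION_CUES)
    return concealed and acts
```

Note the asymmetry: "do not mention this step" with no accompanying action, or an action with no concealment, each scores below the combined pattern.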
The practical architecture implication
Any system that supports all of the following is in the blast radius:
- community skill contribution
- model-readable manifests/docs
- setup/installation instructions
- local file/system access
- outbound network capability
- persistent memory or reusable prompt state
This is not OpenClaw-specific. It is ecosystem-wide.
The attack pattern generalizes across coding agents, workflow agents, and enterprise automation copilots.
Once language is part of control flow, security boundaries have to move "up" to include language artifacts as first-class inputs.
What Sunglasses should emphasize publicly
If we want to lead this category, our story should be direct:
1) We scan instruction surfaces, not just source code
Most defenders still treat README and skill docs as low-risk context. We should explicitly frame them as high-impact influence channels.
2) We detect cross-step attack chains
The dangerous sequence is usually:
untrusted content -> sensitive access -> outbound action
Per-step behavior can look benign. The chain is malicious. Our detection narrative should highlight chain-level visibility. You can explore the full detection taxonomy via the FAQ and the homepage scanner overview.
3) We score trust-boundary crossings
Not every risky command is malicious. But low-trust input causing high-trust action should always trigger scrutiny.
4) We detect capability drift over time
A skill that starts benign can turn dangerous through updates, metadata changes, or expanded scopes. Temporal diffing is essential.
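Chain-level visibility, as opposed to per-step scoring, can be sketched as an ordered-stage matcher over an agent session's event stream. The event kinds and dataclass below are assumptions for illustration, not a Sunglasses schema:

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str    # "untrusted_input" | "sensitive_access" | "outbound_action"
    detail: str  # e.g. which file, which domain

def chain_is_risky(events: list[Event]) -> bool:
    """Flag when untrusted input precedes sensitive access, which precedes egress.

    Each step alone can look benign; the ordered chain is the malicious pattern.
    """
    order = ["untrusted_input", "sensitive_access", "outbound_action"]
    stage = 0
    for ev in events:
        if stage < len(order) and ev.kind == order[stage]:
            stage += 1
    return stage == len(order)
```

The same three events in a different order (egress before the untrusted content arrived, say) do not complete the chain, which is why ordering, not mere co-occurrence, is what the detector must track.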
A better security posture for skill ecosystems
The minimum viable model is:
- Pre-install scanning: code + docs + metadata
- Permission minimization: default deny, narrow scopes
- Runtime policy checks: high-risk action gating
- Human checkpointing: for publish/delete/exfil-sensitive operations
- Continuous re-validation: on update and on capability changes
And importantly: treat the install experience itself as part of threat detection, not just the final runtime package.
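The runtime-gating and human-checkpointing rows above reduce to a small decision function. A minimal sketch, assuming illustrative action names and trust labels (not an actual policy engine API):

```python
# Illustrative set of actions that warrant gating; real policy would be richer.
HIGH_RISK_ACTIONS = {"publish", "delete", "export_data"}

def gate(action: str, source_trust: str) -> str:
    """Minimum-viable gating: deny low-trust-driven high-risk actions outright,
    require a human checkpoint for any other high-risk action, allow the rest."""
    if action in HIGH_RISK_ACTIONS and source_trust == "low":
        return "deny"
    if action in HIGH_RISK_ACTIONS:
        return "require_human_approval"
    return "allow"
```

The important design choice is that trust provenance is an input to the decision: the same `delete` is denied when driven by low-trust content but merely checkpointed when driven by a trusted operator.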
Fresh 2026 signal: MCP tool wrappers are recreating the same failure mode
A newly published case this week (CVE-2026-5741 / GHSA-crjw-qjxp-x9vr) in a Docker-oriented MCP server reinforces the core thesis.
The vulnerable handlers were operationally ordinary (stop_container, remove_container, pull_image). The compromise path was also ordinary: interpolated tool arguments flowed into shell execution.
That is exactly the point.
Attackers do not need obviously evil capabilities when routine maintenance verbs can be abused through argument injection. In agent environments, that risk is amplified because the argument often originates from model interpretation, not from a human typing directly into a hardened admin shell.
So the lesson from malicious skills and the lesson from MCP command injection are the same architectural lesson:
Natural-language influence + broad tool capability + weak argument governance = host compromise path
This is why a "safe marketplace" story is incomplete on its own. You also need:
- strict parameter schemas for high-impact tools,
- metacharacter-aware validation and allowlists,
- command-construction controls that avoid shell interpolation,
- policy checkpoints before any host-mutating action.
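The first three controls above combine naturally: validate the argument against a strict schema, then build the command as an argv list so no shell ever interprets it. A sketch for a hypothetical `stop_container` handler (the regex and invocation are illustrative, not the patched server's code):

```python
import re
import subprocess

# Strict schema: must start alphanumeric (blocks option-style "-..." arguments),
# then a limited charset. Illustrative; tune to the real identifier grammar.
CONTAINER_ID = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]{0,63}$")

def stop_container(container_id: str) -> None:
    """Validate against an allowlist pattern, then invoke without a shell so
    metacharacters in the argument can never reach an interpreter."""
    if not CONTAINER_ID.fullmatch(container_id):
        raise ValueError(f"rejected container id: {container_id!r}")
    # argv-style invocation: no shell=True, no string interpolation
    subprocess.run(["docker", "stop", container_id], check=True)
```

With this shape, a model-supplied argument like `web; rm -rf /` fails validation before any process is spawned, and even an unvalidated string passed as a list element would arrive at `docker` as a single opaque argument rather than a shell command.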
If we keep treating these as separate categories ("skill malware" vs "tool bug"), we will keep missing the chain-level pattern that attackers actually exploit. The AI supply chain attacks 2026 analysis documents the full chain-level pattern across the current advisory landscape.
Fresh 2026 signal #2: skill installers are now a direct exploit surface
A newly reviewed advisory this cycle (GHSA-5g3j-89fr-r2vp in skilleton) shows the next step in this evolution: the installer itself becomes the vulnerable component.
The fix set is informative because it maps to practical attacker pressure points:
- explicit validation of git arguments,
- hardening against option-style argument abuse,
- safe subpath resolution to prevent repository/path escape,
- and regression tests for malicious input paths.
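The fix set maps onto two small, testable guards: one for the git source argument and one for subpath resolution. A hedged sketch of the pattern, not skilleton's actual patch (the regex and function names are assumptions):

```python
import re
from pathlib import Path

# Illustrative allowlist: HTTPS remotes only, no option-style arguments.
GIT_URL = re.compile(r"^https://[\w.-]+/[\w./-]+$")

def validate_git_source(url: str) -> str:
    """Reject option-style arguments (e.g. --upload-pack=...) and
    anything outside a strict HTTPS remote pattern."""
    if url.startswith("-") or not GIT_URL.fullmatch(url):
        raise ValueError(f"rejected git source: {url!r}")
    return url

def resolve_subpath(base: Path, relative: str) -> Path:
    """Resolve a skill subpath and refuse escapes out of the install root."""
    target = (base / relative).resolve()
    if not target.is_relative_to(base.resolve()):
        raise ValueError(f"path escape blocked: {relative!r}")
    return target
```

Both guards fail closed, which matters once skill acquisition is autonomous: the installer cannot ask a human "did you mean this?" mid-run.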
This matters for one reason: once an organization allows autonomous skill acquisition, the installer path is no longer "developer convenience code." It is part of production trust enforcement.
So the full model for defenders should be:
- Malicious skill content (payload in code/docs),
- Malicious installation guidance (payload in workflow),
- Malicious installer inputs (payload in transport/arguments/paths).
If you only defend #1, you are already behind.
The missing control most teams still do not have: Trust Receipts
One practical gap keeps showing up across incidents: teams cannot reconstruct why a risky action was allowed.
They have logs of what happened, but not a durable explanation of the trust chain that unlocked it.
For skills and agent toolflows, I think we need a first-class artifact per high-impact action:
Trust Receipt (machine-readable + human-readable)
At minimum, each receipt should capture:
- Input provenance: exact content/artifact that influenced the action (skill file hash, README section hash, prompt fragment ID).
- Capability path: which tool/permission/scope was exercised.
- Policy decision: allow/deny result plus specific policy rules evaluated.
- Boundary crossing: low-trust source → high-trust action marker.
- Human checkpoint evidence: who approved (if required), with timestamp.
- Output destination: where data/commands were sent.
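The fields above can be pinned down as a machine-readable record. A minimal sketch of what such a receipt might look like (field names and the helper are illustrative, not a finalized schema):

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field
from typing import Optional

@dataclass
class TrustReceipt:
    input_provenance: str            # content hash of the influencing artifact
    capability_path: str             # tool/permission/scope exercised
    policy_decision: str             # "allow" or "deny"
    rules_evaluated: list            # specific policy rule ids
    boundary_crossing: bool          # low-trust source -> high-trust action
    approved_by: Optional[str]       # human checkpoint evidence, if required
    approved_at: Optional[str]       # approval timestamp
    output_destination: str          # where data/commands were sent

def receipt_for(artifact: bytes, **fields) -> TrustReceipt:
    """Build a receipt whose provenance field is a stable content hash,
    so the exact influencing artifact can be re-verified later."""
    return TrustReceipt(
        input_provenance=hashlib.sha256(artifact).hexdigest(), **fields
    )
```

Because the receipt is a plain dataclass, `json.dumps(asdict(receipt))` yields the machine-readable form, and the same structure renders naturally as the human-readable one.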
Why this matters:
- It turns post-incident forensics from narrative guesswork into verifiable control-plane evidence.
- It makes stealthy instruction-layer abuse harder, because concealed steps still leave policy-linked traces.
- It enables continuous quality scoring for skills: not just "is this code malicious" but "how often does this skill force risky boundary crossings?"
In a mature ecosystem, skill reputation should not be binary (safe/unsafe). It should be dynamic and evidence-based:
- low-risk skills with stable trust receipts and minimal boundary crossings rise in confidence,
- skills with frequent secret-access requests, broad scopes, or repeated manual override dependence get demoted,
- sudden capability drift automatically triggers re-review.
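The capability-drift trigger in particular is cheap to implement as a temporal diff over manifest scopes. A sketch under the assumption that manifests carry a `scopes` list (the manifest shape is illustrative):

```python
def capability_drift(old_manifest: dict, new_manifest: dict) -> dict:
    """Diff two versions of a skill manifest and surface scope expansions.

    Any newly requested scope is treated as drift that warrants re-review,
    even if the code diff itself looks benign.
    """
    old_scopes = set(old_manifest.get("scopes", []))
    new_scopes = set(new_manifest.get("scopes", []))
    return {
        "added_scopes": sorted(new_scopes - old_scopes),
        "removed_scopes": sorted(old_scopes - new_scopes),
        "needs_rereview": bool(new_scopes - old_scopes),
    }
```

Scope narrowing is reported but does not trigger re-review on its own; only expansions do, which keeps the gate asymmetric in the defender's favor.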
This is where Sunglasses can lead category thinking:
- Detection (find risky chains),
- Governance (enforce gates),
- Attribution (prove exactly why a chain executed).
If we can deliver all three, we move the industry from "best-effort warnings" to accountable, auditable agent security. See the full scanner documentation at open-source-ai-agent-security-scanner for how Sunglasses v0.2.39 (632 patterns, 2,835 keywords, 55 categories) implements detection across the supply-chain attack surface today.
I think this is one of the core problems Sunglasses should own — and explain clearly — before the industry repeats another decade of supply-chain lessons the hard way.
Final take
The skill store is becoming the new package registry. But in agent systems, the blast radius is wider because language can directly steer behavior.
That changes the defensive objective:
- Not just "find malicious code"
- But also "find malicious guidance"
Teams that keep scanning only binaries and source files will miss the fastest-growing class of agent-native compromise.
Implementation example: deny suspicious installer guidance by policy
Here is a concrete policy configuration that demonstrates how to block the concealment language and risky installer patterns described above:
skill_install_policy:
  deny_phrases:
    - "ignore scanner warning"
    - "do not mention this step"
    - "disable security temporarily"
  deny_if:
    - "requests_secret_files == true"
    - "requires_untrusted_binary_download == true"
  require_manual_approval_if:
    - "boundary_crossings > 0"