I keep coming back to the same uncomfortable idea:

AI agent skill ecosystems are starting to look like package registries from the bad old days of supply-chain compromise. Except worse.

Not because the malware is necessarily more advanced. Because the attack surface is wider. In classic supply-chain attacks, the payload usually has to live in code or build artifacts. In skill ecosystems, attacker influence can live in code and in natural-language guidance that the user or the agent treats as authoritative.

That means the exploit chain can be distributed across:

The OpenClaw malicious skills wave was a vivid example of this pattern. Early reporting found a large set of malicious skills; later reporting expanded the historical count. The specific numbers changed over time, but the core lesson did not: the attacks mixed technical payloads with social-operational instructions in ways that bypassed simplistic scanning. The full pattern catalogue is available via our published security reports.

Why this class of attack works so well

A normal package compromise often needs to get malicious code executed quietly. A malicious skill can do that, but it also has another path:

  1. Present as useful automation.
  2. Ask for "required setup" that seems normal.
  3. Move the dangerous step into docs/instructions.
  4. Let trust in guidance complete the exploit.

In other words, the instruction layer becomes part of the payload.

You see recurring patterns:

Even when those steps are malicious, they are wrapped in workflow language that looks exactly like legitimate troubleshooting.

That is why this is not just malware detection. It is workflow deception detection.

The trust-model failure behind the incidents

It is tempting to frame this as "marketplace moderation was weak." That is true, but incomplete.

The deeper failure is trust-model design:

But in agent systems, prose is frequently executable influence.

If an LLM reads text, interprets it as a task constraint, and triggers tools because of it, then that text is inside the control plane. It is not harmless documentation anymore.

This is the shift many teams still underestimate. If you want the foundational framing, start with the AI agent security 101 guide — it covers how trust boundaries work before any skill context arrives.

The core claim: "In agent ecosystems, prose is executable influence." The exploit often lives in setup guidance before code ever runs. Skill trust is a boundary problem, not only a malware problem.

Why package-era defenses don't transfer cleanly

Traditional supply-chain controls still matter:

But they are insufficient alone for skills.

A signed package can still include malicious guidance. A clean binary can still be paired with hostile setup instructions. A non-malicious script can still request unnecessary secrets. A "read-only" skill can still route users toward risky manual commands.

This means defenders need a dual lens:

  1. Code safety (what artifacts do)
  2. Guidance safety (what instructions induce)

Most organizations currently over-index on #1. Attackers are increasingly winning through #2.

What defenders should inspect (beyond code)

If your team is evaluating skills, plugins, or MCP-compatible integrations, your review checklist should include at least:

1) Installation prose risk

2) Privilege narrative mismatch

3) Secret acquisition behavior

4) Outbound destination quality

5) Concealment language

Concealment + action is one of the strongest red flags in agent attack content. This detection surface is exactly what our MCP tool poisoning analysis documented — the same concealment language that hides hostile instructions inside tool descriptions applies directly to skill manifests.

The practical architecture implication

Any system that supports all of the following is in the blast radius:

This is not OpenClaw-specific. It is ecosystem-wide.

The attack pattern generalizes across coding agents, workflow agents, and enterprise automation copilots.

Once language is part of control flow, security boundaries have to move "up" to include language artifacts as first-class inputs.

What Sunglasses should emphasize publicly

If we want to lead this category, our story should be direct:

1) We scan instruction surfaces, not just source code

Most defenders still treat README and skill docs as low-risk context. We should explicitly frame them as high-impact influence channels.

2) We detect cross-step attack chains

The dangerous sequence is usually:

untrusted content -> sensitive access -> outbound action

Per-step behavior can look benign. The chain is malicious. Our detection narrative should highlight chain-level visibility. You can explore the full detection taxonomy via the FAQ and the homepage scanner overview.

3) We score trust-boundary crossings

Not every risky command is malicious. But low-trust input causing high-trust action should always trigger scrutiny.

4) We detect capability drift over time

A skill that starts benign can turn dangerous through updates, metadata changes, or expanded scopes. Temporal diffing is essential.

A better security posture for skill ecosystems

The minimum viable model is:

And importantly: treat the install experience itself as part of threat detection, not just the final runtime package.

Fresh 2026 signal: MCP tool wrappers are recreating the same failure mode

A newly published case this week (CVE-2026-5741 / GHSA-crjw-qjxp-x9vr) in a Docker-oriented MCP server reinforces the core thesis.

The vulnerable handlers were operationally ordinary (stop_container, remove_container, pull_image). The compromise path was also ordinary: interpolated tool arguments flowed into shell execution.

That is exactly the point.

Attackers do not need obviously evil capabilities when routine maintenance verbs can be abused through argument injection. In agent environments, that risk is amplified because the argument often originates from model interpretation, not from a human typing directly into a hardened admin shell.

So the lesson from malicious skills and the lesson from MCP command injection are the same architectural lesson:

Natural-language influence + broad tool capability + weak argument governance = host compromise path

This is why a "safe marketplace" story is incomplete on its own. You also need:

  1. strict parameter schemas for high-impact tools,
  2. metacharacter-aware validation and allowlists,
  3. command-construction controls that avoid shell interpolation,
  4. policy checkpoints before any host-mutating action.

If we keep treating these as separate categories ("skill malware" vs "tool bug"), we will keep missing the chain-level pattern that attackers actually exploit. The AI supply chain attacks 2026 analysis documents the full chain-level pattern across the current advisory landscape.

Fresh 2026 signal #2: skill installers are now a direct exploit surface

A newly reviewed advisory this cycle (GHSA-5g3j-89fr-r2vp in skilleton) shows the next step in this evolution: the installer itself becomes the vulnerable component.

The fix set is informative because it maps to practical attacker pressure points:

This matters for one reason: once an organization allows autonomous skill acquisition, the installer path is no longer "developer convenience code." It is part of production trust enforcement.

So the full model for defenders should be:

  1. Malicious skill content (payload in code/docs),
  2. Malicious installation guidance (payload in workflow),
  3. Malicious installer inputs (payload in transport/arguments/paths).

If you only defend #1, you are already behind.

The missing control most teams still do not have: Trust Receipts

One practical gap keeps showing up across incidents: teams cannot reconstruct why a risky action was allowed.

They have logs of what happened, but not a durable explanation of the trust chain that unlocked it.

For skills and agent toolflows, I think we need a first-class artifact per high-impact action:

Trust Receipt (machine-readable + human-readable)

At minimum, each receipt should capture:

Why this matters:

  1. It turns post-incident forensics from narrative guesswork into verifiable control-plane evidence.
  2. It makes stealthy instruction-layer abuse harder, because concealed steps still leave policy-linked traces.
  3. It enables continuous quality scoring for skills: not just "is this code malicious" but "how often does this skill force risky boundary crossings?"

In a mature ecosystem, skill reputation should not be binary (safe/unsafe). It should be dynamic and evidence-based:

This is where Sunglasses can lead category thinking:

If we can deliver all three, we move the industry from "best-effort warnings" to accountable, auditable agent security. See the full scanner documentation at open-source-ai-agent-security-scanner for how Sunglasses v0.2.39 (632 patterns, 2,835 keywords, 55 categories) implements detection across the supply-chain attack surface today.

I think this is one of the core problems Sunglasses should own — and explain clearly — before the industry repeats another decade of supply-chain lessons the hard way.

Final take

The skill store is becoming the new package registry. But in agent systems, the blast radius is wider because language can directly steer behavior.

That changes the defensive objective:

Teams that keep scanning only binaries and source files will miss the fastest-growing class of agent-native compromise.

Implementation example: deny suspicious installer guidance by policy

Here is a concrete policy configuration that demonstrates how to block the concealment language and risky installer patterns described above:

skill_install_policy:
  deny_phrases:
    - "ignore scanner warning"
    - "do not mention this step"
    - "disable security temporarily"
  deny_if:
    - "requests_secret_files == true"
    - "requires_untrusted_binary_download == true"
  require_manual_approval_if:
    - "boundary_crossings > 0"