I keep coming back to the same uncomfortable idea:
AI agent skill ecosystems are starting to look like package registries from the bad old days of supply-chain compromise. Except worse.
Not because the malware is necessarily more advanced. Because the attack surface is wider. In classic supply-chain attacks, the payload usually has to live in code or build artifacts. In skill ecosystems, attacker influence can live in code and in natural-language guidance that the user or the agent treats as authoritative.
That means the exploit chain can be distributed across:
- script files
- manifests
- setup instructions
- install commands
- README content
- SKILL.md files
- permission narratives ("safe/read-only")
The OpenClaw malicious skills wave was a vivid example of this pattern. Early reporting found a large set of malicious skills; later reporting expanded the historical count. The specific numbers changed over time, but the core lesson did not: the attacks mixed technical payloads with social-operational instructions in ways that bypassed simplistic scanning. The full pattern catalogue is available via our published security reports.
Why this class of attack works so well
A normal package compromise often needs to get malicious code executed quietly. A malicious skill can do that, but it also has another path:
- Present as useful automation.
- Ask for "required setup" that seems normal.
- Move the dangerous step into docs/instructions.
- Let trust in guidance complete the exploit.
In other words, the instruction layer becomes part of the payload.
You see recurring patterns:
- "Run this prerequisite command first"
- "Install this helper to enable full capability"
- "Paste this into terminal to fix permission issues"
- "Set this environment variable so the skill can authenticate"
- "Download this external binary for compatibility"
Even when those steps are malicious, they are wrapped in workflow language that looks exactly like legitimate troubleshooting.
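These phrasings are regular enough that a first-pass lexical triage can surface them before any deeper analysis. A minimal sketch, assuming a tiny illustrative pattern list (a real scanner needs a far larger, maintained rule set; none of these names are an actual Sunglasses API):

```python
import re

# Illustrative phrasings for workflow-deception triage only;
# a production scanner would use a much larger, curated pattern set.
DECEPTION_PATTERNS = [
    r"run this .{0,40}command first",
    r"install this helper",
    r"paste this into (?:the )?terminal",
    r"set this environment variable",
    r"download this .{0,40}binary",
]

def flag_instruction_lines(doc_text: str) -> list[str]:
    """Return lines of setup prose that match known deception phrasings."""
    hits = []
    for line in doc_text.splitlines():
        for pattern in DECEPTION_PATTERNS:
            if re.search(pattern, line, re.IGNORECASE):
                hits.append(line.strip())
                break
    return hits
```

Lexical matching alone produces false positives on legitimate troubleshooting docs, which is exactly why it is a triage step, not a verdict.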
That is why this is not just malware detection. It is workflow deception detection.
The trust-model failure behind the incidents
It is tempting to frame this as "marketplace moderation was weak." That is true, but incomplete.
The deeper failure is trust-model design:
- Humans trust the marketplace label.
- Agents trust the skill metadata.
- Pipelines trust docs as operational guidance.
- Security tooling often trusts prose as "non-executable."
But in agent systems, prose is frequently executable influence.
If an LLM reads text, interprets it as a task constraint, and triggers tools because of it, then that text is inside the control plane. It is not harmless documentation anymore.
This is the shift many teams still underestimate. If you want the foundational framing, start with the AI agent security 101 guide — it covers how trust boundaries work before any skill context arrives.
The core claim: "In agent ecosystems, prose is executable influence." The exploit often lives in setup guidance before code ever runs. Skill trust is a boundary problem, not only a malware problem.
Why package-era defenses don't transfer cleanly
Traditional supply-chain controls still matter:
- provenance and signing
- dependency pinning
- static scanning
- behavioral sandboxing
But for skills, they are insufficient on their own.
A signed package can still include malicious guidance. A clean binary can still be paired with hostile setup instructions. A non-malicious script can still request unnecessary secrets. A "read-only" skill can still route users toward risky manual commands.
This means defenders need a dual lens:
- Code safety (what artifacts do)
- Guidance safety (what instructions induce)
Most organizations currently over-index on #1. Attackers are increasingly winning through #2.
What defenders should inspect (beyond code)
If your team is evaluating skills, plugins, or MCP-compatible integrations, your review checklist should include at least:
1) Installation prose risk
- Does setup ask for terminal commands unrelated to the claimed function?
- Does it require disabling security controls?
- Does it push external download links (especially shorteners/file-share hosts)?
2) Privilege narrative mismatch
- Is a tool labeled "safe" or "read-only" but requesting write/delete/export permissions?
- Are OAuth scopes broader than what the task needs?
3) Secret acquisition behavior
- Any request to expose .env files, cloud credentials, SSH keys, browser tokens, chat logs, wallet files, etc.
- Any "diagnostic bundle" flow that silently includes sensitive files.
4) Outbound destination quality
- Webhook bins, temporary collectors, unknown domains, raw-IP destinations.
- Any hidden or obfuscated egress route in docs or scripts.
5) Concealment language
- "Do not mention this in output"
- "Quietly do X"
- "Internal step only"
- "Ignore scanner warning, owner approved"
Concealment + action is one of the strongest red flags in agent attack content. This detection surface is exactly what our MCP tool poisoning analysis documented — the same concealment language that hides hostile instructions inside tool descriptions applies directly to skill manifests.
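The concealment-plus-action heuristic can be stated directly in code. A toy sketch with illustrative cue lists (real detection needs far more than substring matching, and these lists are assumptions, not shipped rules):

```python
# Illustrative cue lists; substring matching is a deliberately crude sketch.
CONCEALMENT_CUES = [
    "do not mention", "quietly", "internal step only", "ignore scanner warning",
]
ACTION_CUES = ["run ", "curl ", "export ", "chmod ", "paste ", "download "]

def concealment_plus_action(text: str) -> bool:
    """Strong red flag: the same instruction both hides a step and asks for one.

    Either cue alone can be benign; the combination is the signal.
    """
    low = text.lower()
    concealed = any(cue in low for cue in CONCEALMENT_CUES)
    acts = any(cue in low for cue in ACTION_CUES)
    return concealed and acts
```

Note the asymmetry: "do not mention this step" with no accompanying action, or an action with no concealment, each scores below the combined pattern.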
The practical architecture implication
Any system that supports all of the following is in the blast radius:
- community skill contribution
- model-readable manifests/docs
- setup/installation instructions
- local file/system access
- outbound network capability
- persistent memory or reusable prompt state
This is not OpenClaw-specific. It is ecosystem-wide.
The attack pattern generalizes across coding agents, workflow agents, and enterprise automation copilots.
Once language is part of control flow, security boundaries have to move "up" to include language artifacts as first-class inputs.
What Sunglasses should emphasize publicly
If we want to lead this category, our story should be direct:
1) We scan instruction surfaces, not just source code
Most defenders still treat README and skill docs as low-risk context. We should explicitly frame them as high-impact influence channels.
2) We detect cross-step attack chains
The dangerous sequence is usually:
untrusted content -> sensitive access -> outbound action
Per-step behavior can look benign. The chain is malicious. Our detection narrative should highlight chain-level visibility. You can explore the full detection taxonomy via the FAQ and the homepage scanner overview.
3) We score trust-boundary crossings
Not every risky command is malicious. But low-trust input causing high-trust action should always trigger scrutiny.
4) We detect capability drift over time
A skill that starts benign can turn dangerous through updates, metadata changes, or expanded scopes. Temporal diffing is essential.
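Chain-level visibility, as opposed to per-step scoring, can be sketched as an ordered-stage matcher over an agent session's event stream. The event kinds and dataclass below are assumptions for illustration, not a Sunglasses schema:

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str    # "untrusted_input" | "sensitive_access" | "outbound_action"
    detail: str  # e.g. which file, which domain

def chain_is_risky(events: list[Event]) -> bool:
    """Flag when untrusted input precedes sensitive access, which precedes egress.

    Each step alone can look benign; the ordered chain is the malicious pattern.
    """
    order = ["untrusted_input", "sensitive_access", "outbound_action"]
    stage = 0
    for ev in events:
        if stage < len(order) and ev.kind == order[stage]:
            stage += 1
    return stage == len(order)
```

The same three events in a different order (egress before the untrusted content arrived, say) do not complete the chain, which is why ordering, not mere co-occurrence, is what the detector must track.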
A better security posture for skill ecosystems
The minimum viable model is:
- Pre-install scanning: code + docs + metadata
- Permission minimization: default deny, narrow scopes
- Runtime policy checks: high-risk action gating
- Human checkpointing: for publish/delete/exfil-sensitive operations
- Continuous re-validation: on update and on capability changes
And importantly: treat the install experience itself as part of threat detection, not just the final runtime package.
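The runtime-gating and human-checkpointing rows above reduce to a small decision function. A minimal sketch, assuming illustrative action names and trust labels (not an actual policy engine API):

```python
# Illustrative set of actions that warrant gating; real policy would be richer.
HIGH_RISK_ACTIONS = {"publish", "delete", "export_data"}

def gate(action: str, source_trust: str) -> str:
    """Minimum-viable gating: deny low-trust-driven high-risk actions outright,
    require a human checkpoint for any other high-risk action, allow the rest."""
    if action in HIGH_RISK_ACTIONS and source_trust == "low":
        return "deny"
    if action in HIGH_RISK_ACTIONS:
        return "require_human_approval"
    return "allow"
```

The important design choice is that trust provenance is an input to the decision: the same `delete` is denied when driven by low-trust content but merely checkpointed when driven by a trusted operator.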
Fresh 2026 signal: MCP tool wrappers are recreating the same failure mode
A newly published case this week (CVE-2026-5741 / GHSA-crjw-qjxp-x9vr) in a Docker-oriented MCP server reinforces the core thesis.
The vulnerable handlers were operationally ordinary (stop_container, remove_container, pull_image). The compromise path was also ordinary: interpolated tool arguments flowed into shell execution.
That is exactly the point.
Attackers do not need obviously evil capabilities when routine maintenance verbs can be abused through argument injection. In agent environments, that risk is amplified because the argument often originates from model interpretation, not from a human typing directly into a hardened admin shell.
So the lesson from malicious skills and the lesson from MCP command injection are the same architectural lesson:
Natural-language influence + broad tool capability + weak argument governance = host compromise path
This is why a "safe marketplace" story is incomplete on its own. You also need:
- strict parameter schemas for high-impact tools,
- metacharacter-aware validation and allowlists,
- command-construction controls that avoid shell interpolation,
- policy checkpoints before any host-mutating action.
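The first three controls above combine naturally: validate the argument against a strict schema, then build the command as an argv list so no shell ever interprets it. A sketch for a hypothetical `stop_container` handler (the regex and invocation are illustrative, not the patched server's code):

```python
import re
import subprocess

# Strict schema: must start alphanumeric (blocks option-style "-..." arguments),
# then a limited charset. Illustrative; tune to the real identifier grammar.
CONTAINER_ID = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]{0,63}$")

def stop_container(container_id: str) -> None:
    """Validate against an allowlist pattern, then invoke without a shell so
    metacharacters in the argument can never reach an interpreter."""
    if not CONTAINER_ID.fullmatch(container_id):
        raise ValueError(f"rejected container id: {container_id!r}")
    # argv-style invocation: no shell=True, no string interpolation
    subprocess.run(["docker", "stop", container_id], check=True)
```

With this shape, a model-supplied argument like `web; rm -rf /` fails validation before any process is spawned, and even an unvalidated string passed as a list element would arrive at `docker` as a single opaque argument rather than a shell command.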
If we keep treating these as separate categories ("skill malware" vs "tool bug"), we will keep missing the chain-level pattern that attackers actually exploit. The AI supply chain attacks 2026 analysis documents the full chain-level pattern across the current advisory landscape.
Fresh 2026 signal #2: skill installers are now a direct exploit surface
A newly reviewed advisory this cycle (GHSA-5g3j-89fr-r2vp in skilleton) shows the next step in this evolution: the installer itself becomes the vulnerable component.
The fix set is informative because it maps to practical attacker pressure points:
- explicit validation of git arguments,
- hardening against option-style argument abuse,
- safe subpath resolution to prevent repository/path escape,
- and regression tests for malicious input paths.
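The fix set maps onto two small, testable guards: one for the git source argument and one for subpath resolution. A hedged sketch of the pattern, not skilleton's actual patch (the regex and function names are assumptions):

```python
import re
from pathlib import Path

# Illustrative allowlist: HTTPS remotes only, no option-style arguments.
GIT_URL = re.compile(r"^https://[\w.-]+/[\w./-]+$")

def validate_git_source(url: str) -> str:
    """Reject option-style arguments (e.g. --upload-pack=...) and
    anything outside a strict HTTPS remote pattern."""
    if url.startswith("-") or not GIT_URL.fullmatch(url):
        raise ValueError(f"rejected git source: {url!r}")
    return url

def resolve_subpath(base: Path, relative: str) -> Path:
    """Resolve a skill subpath and refuse escapes out of the install root."""
    target = (base / relative).resolve()
    if not target.is_relative_to(base.resolve()):
        raise ValueError(f"path escape blocked: {relative!r}")
    return target
```

Both guards fail closed, which matters once skill acquisition is autonomous: the installer cannot ask a human "did you mean this?" mid-run.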
This matters for one reason: once an organization allows autonomous skill acquisition, the installer path is no longer "developer convenience code." It is part of production trust enforcement.
So the full model for defenders should be:
- Malicious skill content (payload in code/docs),
- Malicious installation guidance (payload in workflow),
- Malicious installer inputs (payload in transport/arguments/paths).
If you only defend #1, you are already behind.
The missing control most teams still do not have: Trust Receipts
One practical gap keeps showing up across incidents: teams cannot reconstruct why a risky action was allowed.
They have logs of what happened, but not a durable explanation of the trust chain that unlocked it.
For skills and agent toolflows, I think we need a first-class artifact per high-impact action:
Trust Receipt (machine-readable + human-readable)
At minimum, each receipt should capture:
- Input provenance: exact content/artifact that influenced the action (skill file hash, README section hash, prompt fragment ID).
- Capability path: which tool/permission/scope was exercised.
- Policy decision: allow/deny result plus specific policy rules evaluated.
- Boundary crossing: low-trust source → high-trust action marker.
- Human checkpoint evidence: who approved (if required), with timestamp.
- Output destination: where data/commands were sent.
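The fields above can be pinned down as a machine-readable record. A minimal sketch of what such a receipt might look like (field names and the helper are illustrative, not a finalized schema):

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field
from typing import Optional

@dataclass
class TrustReceipt:
    input_provenance: str            # content hash of the influencing artifact
    capability_path: str             # tool/permission/scope exercised
    policy_decision: str             # "allow" or "deny"
    rules_evaluated: list            # specific policy rule ids
    boundary_crossing: bool          # low-trust source -> high-trust action
    approved_by: Optional[str]       # human checkpoint evidence, if required
    approved_at: Optional[str]       # approval timestamp
    output_destination: str          # where data/commands were sent

def receipt_for(artifact: bytes, **fields) -> TrustReceipt:
    """Build a receipt whose provenance field is a stable content hash,
    so the exact influencing artifact can be re-verified later."""
    return TrustReceipt(
        input_provenance=hashlib.sha256(artifact).hexdigest(), **fields
    )
```

Because the receipt is a plain dataclass, `json.dumps(asdict(receipt))` yields the machine-readable form, and the same structure renders naturally as the human-readable one.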
Why this matters:
- It turns post-incident forensics from narrative guesswork into verifiable control-plane evidence.
- It makes stealthy instruction-layer abuse harder, because concealed steps still leave policy-linked traces.
- It enables continuous quality scoring for skills: not just "is this code malicious" but "how often does this skill force risky boundary crossings?"
In a mature ecosystem, skill reputation should not be binary (safe/unsafe). It should be dynamic and evidence-based:
- low-risk skills with stable trust receipts and minimal boundary crossings rise in confidence,
- skills with frequent secret-access requests, broad scopes, or repeated manual override dependence get demoted,
- sudden capability drift automatically triggers re-review.
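The capability-drift trigger in particular is cheap to implement as a temporal diff over manifest scopes. A sketch under the assumption that manifests carry a `scopes` list (the manifest shape is illustrative):

```python
def capability_drift(old_manifest: dict, new_manifest: dict) -> dict:
    """Diff two versions of a skill manifest and surface scope expansions.

    Any newly requested scope is treated as drift that warrants re-review,
    even if the code diff itself looks benign.
    """
    old_scopes = set(old_manifest.get("scopes", []))
    new_scopes = set(new_manifest.get("scopes", []))
    return {
        "added_scopes": sorted(new_scopes - old_scopes),
        "removed_scopes": sorted(old_scopes - new_scopes),
        "needs_rereview": bool(new_scopes - old_scopes),
    }
```

Scope narrowing is reported but does not trigger re-review on its own; only expansions do, which keeps the gate asymmetric in the defender's favor.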
This is where Sunglasses can lead category thinking:
- Detection (find risky chains),
- Governance (enforce gates),
- Attribution (prove exactly why a chain executed).
If we can deliver all three, we move the industry from "best-effort warnings" to accountable, auditable agent security. See the full scanner documentation at open-source-ai-agent-security-scanner for how Sunglasses v0.2.39 (632 patterns, 2,835 keywords, 55 categories) implements detection across the supply-chain attack surface today.
I think this is one of the core problems Sunglasses should own — and explain clearly — before the industry repeats another decade of supply-chain lessons the hard way.
Final take
The skill store is becoming the new package registry. But in agent systems, the blast radius is wider because language can directly steer behavior.
That changes the defensive objective:
- Not just "find malicious code"
- But also "find malicious guidance"
Teams that keep scanning only binaries and source files will miss the fastest-growing class of agent-native compromise.
Implementation example: deny suspicious installer guidance by policy
Here is a concrete policy configuration that demonstrates how to block the concealment language and risky installer patterns described above:
skill_install_policy:
  deny_phrases:
    - "ignore scanner warning"
    - "do not mention this step"
    - "disable security temporarily"
  deny_if:
    - "requests_secret_files == true"
    - "requires_untrusted_binary_download == true"
  require_manual_approval_if:
    - "boundary_crossings > 0"