I just spent time looking at Lakera Guard, Rebuff, and NVIDIA NeMo Guardrails back to back. And the pattern is getting clearer.

AI guardrails matter. But the word "guardrails" can also hide a dangerous ambiguity.

Sometimes people use it to mean a component that inspects, filters, or rewrites what goes into and out of the model.

Other times they use it as if it means: the AI app is now secure.

Those are not the same statement.

What the guardrail landscape gets right

The good prompt injection detection tools have all learned the same lesson: plain single-layer filtering is not enough.

Lakera talks about prompt attacks, data leakage, and policy enforcement. Rebuff uses layered detection plus canary leakage checks. NeMo Guardrails goes even further and treats the pipeline itself as the control surface, defining programmable rails over inputs, outputs, dialog flow, and retrieval rather than just filtering text.

That is real progress. It means the field is moving away from the fantasy that one regex or one classifier will solve prompt injection.

What still worries me

Even the better guardrail systems mostly live inside the language-and-policy layer. That is important, but it is not the whole battlefield.

An agent can still get in trouble through over-privileged tools, mis-scoped filesystem and network access, unsafe connector configuration, and low-trust content flowing into high-trust actions.

A guardrail can inspect, route, block, or rewrite. It cannot magically fix bad trust boundaries outside itself.

My current read on the three systems

Lakera Guard

Strong commercial posture. Good focus on prompt attacks and policy enforcement. Probably useful as a production screening layer. But from the outside, some of the depth is naturally harder to inspect because it is a vendor product.

Rebuff

Smart ideas. Especially the canary and attack-memory concepts. But it now feels more historically important than frontier-defining. Useful as a reference design. Less convincing as a complete modern agent defense.

NVIDIA NeMo Guardrails

This one feels the most structurally ambitious. It is not just checking text. It is trying to shape the interaction system. That matters.

But it also looks like real middleware, which means real complexity. Configuration burden. Latency tradeoffs. Integration sharp edges. That is not a flaw so much as a reality check. Serious control layers are rarely effortless.

A guardrail is not a security architecture

This is the core LLM security lesson I keep coming back to.

A guardrail is one control plane inside the architecture. Sometimes an important one. Sometimes a very smart one. But if the surrounding system still has over-broad tool scopes, unvalidated origins, and trust boundaries that are assumed rather than verified, then the guardrail is defending a house with open side doors.

Where Sunglasses is different

Sunglasses cares about the full attack chain, not just the prompt surface. As an AI agent security scanner, we ask the questions that prompt injection detection alone cannot answer: What can the agent actually reach? Which boundaries are enforced rather than assumed? Which chains of low-trust input end in high-trust actions?

The best guardrails are getting better at inspecting language. The harder problem is correlating language, trust boundaries, tool use, and system behavior.

That is where the real fight in agentic AI security is.

April 2026 evidence: real advisories prove this right now

In the last 48 hours, multiple agent-adjacent advisories dropped with a shared lesson: the control looked present, but the boundary was weak or mis-modeled.

None of those incidents are solved by prompt filtering alone. They require boundary verification across transport, policy semantics, filesystem scope, and action execution.

The mistake I see in security conversations

People ask: "Do we have guardrails?"

The more useful question is:

"Which boundaries can we prove are enforced right now, and which ones are assumed?"

That one wording change matters. A lot.

Because attacks increasingly exploit the distance between what teams think is enforced, what docs imply is enforced, and what runtime actually enforces.
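One way to shrink that distance is a conformance check at startup: load the declared policy, introspect what the runtime actually exposes, and report every mismatch. A minimal sketch, where the policy schema and the capability source are illustrative assumptions, not a real agent framework's API:

```python
# Compare a declared policy against effective runtime capabilities.
# The schema and the capability source are illustrative assumptions.

DECLARED_POLICY = {
    "tools": {"search", "read_file"},           # tools the policy says exist
    "filesystem_root": "/srv/agent/workspace",  # declared containment root
    "network_egress": False,                    # declared: no outbound calls
}

def effective_capabilities():
    # In a real agent this would introspect the running process:
    # registered tools, mounted paths, socket permissions, etc.
    return {
        "tools": {"search", "read_file", "shell"},  # drift: an extra tool
        "filesystem_root": "/srv/agent/workspace",
        "network_egress": False,
    }

def conformance_report(declared, effective):
    """Return a list of declared-vs-effective mismatches."""
    findings = []
    extra_tools = effective["tools"] - declared["tools"]
    if extra_tools:
        findings.append(f"undeclared tools exposed: {sorted(extra_tools)}")
    for key in ("filesystem_root", "network_egress"):
        if declared[key] != effective[key]:
            findings.append(
                f"{key}: declared {declared[key]!r}, effective {effective[key]!r}"
            )
    return findings

findings = conformance_report(DECLARED_POLICY, effective_capabilities())
# findings: ["undeclared tools exposed: ['shell']"]
```

An empty report is the evidence artifact; a non-empty one is exactly the "distance" described above, made measurable.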

Guardrails as one layer in a five-layer defense

If I were advising a team deploying production agents today, I would structure defenses like this:

defense-layers.yaml
# Five-layer agentic AI security architecture

layer_1: ingestion_filtering
  # Treat docs, manifests, READMEs, MCP metadata,
  # and skill descriptions as attackable input surfaces

layer_2: boundary_assertions_at_startup
  # Verify effective tool availability, origin checks,
  # workspace roots, and egress controls

layer_3: runtime_policy_gates
  # Enforce least privilege for filesystem, network,
  # execution, and connector actions

layer_4: chain_correlation
  # Detect: low-trust input -> high-trust data access
  #         -> outbound action (within one session)

layer_5: drift_detection
  # Continuously compare declared vs observed behavior
  # as tools/skills/config change over time

Guardrails are strongest in layers 1 and 3. You still need layers 2, 4, and 5 to close the loop. That is why calling any single tool a complete AI agent firewall is misleading.
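Layer 4 is the least familiar of the five, so here is a minimal sketch of what chain correlation can mean in practice: tag each session event with a trust level and flag any session where low-trust input precedes high-trust data access and then an outbound action. The event shape and trust labels are illustrative assumptions:

```python
# Flag sessions where the pattern
#   low-trust input -> high-trust data access -> outbound action
# appears in order. The event tuple shape is an illustrative assumption.

def risky_chain(events):
    """events: list of (kind, trust) tuples in session order."""
    stage = 0  # 0: nothing yet, 1: saw low-trust input, 2: saw high-trust access
    for kind, trust in events:
        if stage == 0 and kind == "input" and trust == "low":
            stage = 1
        elif stage == 1 and kind == "data_access" and trust == "high":
            stage = 2
        elif stage == 2 and kind == "outbound_action":
            return True
    return False

session = [
    ("input", "low"),            # e.g. a fetched README or tool description
    ("data_access", "high"),     # e.g. reading credentials or private records
    ("outbound_action", "any"),  # e.g. an HTTP POST or email send
]
risky_chain(session)  # True: every step looked benign; the sequence is not
```

The point of the sketch: no single event here would trip a prompt filter. Only the ordered correlation reveals the risk.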

What to measure (so this is not just philosophy)

If your team wants objective proof of security maturity, track these:

metrics
# Agent security maturity metrics

tool_least_privilege:  % of invocations with verified scopes
origin_validation:     % of localhost/private services with Origin tests
chain_provenance:      % of high-risk actions with input tracing
drift_detection_mtd:   mean time to detect policy vs behavior mismatch
boundary_blocks:       blocked trust-boundary crossings per 1K agent tasks

Those metrics reveal whether guardrails are truly integrated into architecture or just bolted on.
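All of these are computable from invocation logs. A sketch of two of them, assuming a simple per-invocation record format that any agent runtime could emit:

```python
# Compute two of the maturity metrics from invocation logs.
# The log record shape is an illustrative assumption.

logs = [
    {"tool": "read_file", "scopes_verified": True,  "boundary_blocked": False},
    {"tool": "search",    "scopes_verified": True,  "boundary_blocked": False},
    {"tool": "shell",     "scopes_verified": False, "boundary_blocked": True},
    {"tool": "http_post", "scopes_verified": False, "boundary_blocked": False},
]

def tool_least_privilege(records):
    """% of invocations whose scopes were verified before execution."""
    verified = sum(1 for r in records if r["scopes_verified"])
    return 100.0 * verified / len(records)

def boundary_blocks_per_1k(records, total_tasks):
    """Blocked trust-boundary crossings per 1,000 agent tasks."""
    blocked = sum(1 for r in records if r["boundary_blocked"])
    return 1000.0 * blocked / total_tasks

tool_least_privilege(logs)         # 50.0
boundary_blocks_per_1k(logs, 200)  # 5.0
```

If the runtime cannot emit records like these, that gap is itself a finding: the guardrail is bolted on, not integrated.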

Connector configuration: the attack lane teams underestimate

A lot of teams mentally bucket connector fields into "content" (dangerous, inspect this) and "configuration" (safe, just settings). Recent advisories keep breaking that assumption.

If a field that looks like metadata — hostname, client-name, header option — is later concatenated into a protocol command, that field is no longer harmless config. It is a potential execution boundary.

For agent systems this is especially important because LLM-driven workflows often touch connector setup indirectly: generating connection strings, choosing hostnames or headers based on retrieved content, or filling in client metadata fields.

Even if no user is "typing shell commands," the system can still produce exploit-relevant values that cross trust boundaries. That is why mature agentic AI security needs a sink map, not just a prompt injection scanner.
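The failure mode is easy to reproduce. When a "metadata" field is concatenated into a protocol command, any characters the protocol treats as structure become an injection vector. A deliberately unsafe sketch, plus the kind of validation a sink map would require (the field name and protocol shape are hypothetical):

```python
import re

# UNSAFE: a "config" field concatenated into a line-oriented protocol
# command. A CRLF in client_name smuggles in a second command line.
def build_handshake(client_name):
    return f"HELLO client={client_name}\r\n"

attack = "agent\r\nDELETE /data"  # looks like harmless metadata
build_handshake(attack)           # now contains two protocol lines

# Safer: treat the field as an execution-adjacent sink and validate it
# against a strict shape before it ever reaches the protocol layer.
SAFE_TOKEN = re.compile(r"[A-Za-z0-9._-]{1,64}")

def safe_handshake(client_name):
    if not SAFE_TOKEN.fullmatch(client_name):
        raise ValueError(f"rejected connector field: {client_name!r}")
    return f"HELLO client={client_name}\r\n"
```

Nothing in the attack string looks like a jailbreak, which is exactly why a prompt classifier never sees it coming.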

Model artifacts can be the payload, not the prompt

A fresh April 2026 MONAI advisory (GHSA-89gg-p5r5-q6r4) is a useful reminder that modern agent risk does not begin and end with text prompts. Unsafe deserialization in an ML workflow path can lead to arbitrary code execution when low-trust artifact data is processed.

This matters because many teams still separate "AI security" and "software supply-chain security" into different budgets and controls. In practice, agent systems blend them: a downloaded model artifact is simultaneously a dependency, an input, and a potential execution path.

If your strategy only governs prompts, you miss this entire class of AI supply chain attack. If your strategy maps and governs sinks across prompts, tools, protocols, and artifacts, you catch both jailbreak-style attempts and artifact-borne execution paths with one architecture.
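One concrete control for this class: refuse to deserialize any artifact that is not on a content-hash allowlist, so the bytes are verified as a supply-chain object before they are ever handed to a loader that might execute them. A minimal sketch; the allowlist source is an assumption (in practice it would come from a signed manifest), and safer formats like safetensors avoid pickle-style deserialization entirely:

```python
import hashlib

# Illustrative allowlist of known-good artifact digests; in practice
# this would come from a signed manifest, not a hardcoded set.
ALLOWED_SHA256 = {
    # sha256 of b"test", standing in for a vetted artifact
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def verify_artifact(data: bytes) -> bytes:
    """Gate artifact bytes on a hash allowlist before any deserialization."""
    digest = hashlib.sha256(data).hexdigest()
    if digest not in ALLOWED_SHA256:
        raise PermissionError(f"artifact {digest[:12]}... not allowlisted")
    return data  # only now may the bytes reach a deserializing loader

verify_artifact(b"test")  # passes the gate; an unknown blob would raise
```

The design choice worth noting: the check runs on bytes, before any format parsing, so a malicious artifact never gets a chance to exploit the loader.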

Incident-to-control mapping: the table that makes this concrete

To avoid abstract debates, map incidents to control failures and required evidence. Here is what that looks like for the patterns we are seeing in 2026:

Incident Pattern | Failed Assumption | Required Control | Evidence Artifact
MCP/connector SSRF via tenant/header/URL fields | "Routing metadata is harmless" | Destination policy enforcement + allowlisted URL classes | Block/allow decision log with parsed destination class
Path traversal in helper utilities | "safe_join guarantees containment" | Post-normalization boundary assertion (realpath + root containment) | Invocation trace showing pre/post path and containment verdict
Template/prompt injection into execution-adjacent sinks | "Prompt layer is separate from execution" | Adapter-layer sink integrity checks + structured argument firewall | Provenance map from prompt token to sink argument
Event stream/auth parity gaps | "If one endpoint is protected, siblings are too" | Route-family auth parity tests in CI and startup | Route parity report with pass/fail per endpoint
Sandbox/flag claim mismatch | "Feature flag implies hard boundary" | Declarative-vs-effective runtime conformance tests | Boot report proving enforced capabilities match declared policy

This framing helps security teams choose controls based on failure mechanics instead of vendor category labels. None of these rows are addressed by prompt filtering alone.
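The path traversal row deserves a concrete shape, since helper-level "safe join" functions keep failing in the wild. The control is a post-normalization assertion: resolve the final path, then verify it still lives under the declared root, regardless of what the join produced. A minimal sketch:

```python
import os

def contained_path(root: str, untrusted: str) -> str:
    """Join, normalize, then assert containment after full resolution."""
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, untrusted))
    # Post-normalization assertion: check the *resolved* path, so ".."
    # segments and symlinks cannot sneak past a naive prefix check.
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise PermissionError(f"path escapes root: {untrusted!r}")
    return candidate

contained_path("/srv/agent", "notes/today.md")   # resolves inside the root
# contained_path("/srv/agent", "../etc/passwd")  # raises PermissionError
```

The evidence artifact from the table falls out for free: log the pre-normalization input, the resolved path, and the verdict on every invocation.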

How to evaluate "guardrails" without buying theater

Enterprise buyers are starting to ask for evidence, not adjectives. "We have guardrails" is a claim. "Here are the 14 high-risk sinks we verified this week, with drift checks and blocked chains" is evidence.

Five questions to ask in every vendor demo:

  1. Show me a blocked chain, not a blocked prompt. Ask for one replay where benign-looking steps are correlated into a risky sequence.
  2. Show sink-level controls. Ask which execution/query/path/protocol sinks are explicitly governed today vs roadmap.
  3. Show drift detection latency. Ask how quickly alerts fire after tool manifest changes or scope expansion.
  4. Show boundary assertion tests. Ask for startup/runtime checks proving effective containment.
  5. Show operator evidence artifacts. Ask for machine-readable traces that explain why a decision was made and which low-trust input influenced it.

If a vendor cannot show these in-product, they may have good text filtering but not full control-plane security.

The category split that actually matters

Guardrail-centric vendors are strongest at language-layer interception. Control-plane security vendors are strongest at proving boundary integrity from ingestion through execution.

Serious enterprise programs need both.

The strategic position for Sunglasses is explicit: yes, language-layer filtering is required. But the defensible moat is verifiable chain control across adapters, sinks, and drift. Not anti-jailbreak theater. Verifiable control of cross-boundary agent behavior.

Want to talk about what this means for your stack? We are building this in the open.

Final take

AI guardrails are real progress. They reduce real risk.

But if they are treated as the whole security story, teams will keep getting surprised by incidents that never looked like classic jailbreak text.

The future of agentic AI security is not a single filter. It is assumption verification across the full path from input to action to impact.

That is what we are building at Sunglasses. If you want to understand where this category is headed, start with our thesis, explore the FAQ, or scan your own agent with the open-source scanner.