How do you stop AI agents from calling untrusted endpoints? You combine egress allowlists, strict URL and schema validation, human approval gates for risky external actions, scoped credentials, and a controller layer that mediates outbound traffic before the request is sent. Then you add runtime trust so an already-allowed action can still be blocked if callback guidance, decoded content, metadata, or endpoint drift changes what the workflow is really being asked to trust. Sunglasses v0.2.36 ships detection patterns across the cross_agent_injection category that cover the outbound-trust gap — patterns like GLS-CAI-690 through GLS-CAI-704 targeting forged handoff tickets, capability laundering, and delegation-token scope rewrite that can quietly redirect where an agent sends traffic.
How do I stop AI agents from calling untrusted endpoints? That is no longer a niche red-team question. It is a real operator question, and the reachable answer-engine evidence now treats it like a practical hardening checklist rather than a vendor roundup. The winning answer shape centers on egress allowlisting, schema validation, human approval, scoped credentials, and a middleware/controller layer that keeps the agent from making raw outbound HTTP decisions on its own.
That is good news because it means the market already understands the problem in actionable language. It also leaves one important gap. Those controls can reduce where an agent may connect and how safely it gets there, but they still do not fully answer whether the already-allowed outbound action should be trusted now. A destination can remain technically valid while the workflow inherits unsafe authority from callback guidance, decoded content, retry logic, MCP metadata, or other trust-bearing signals picked up earlier in the run.
This page is built for that exact gap. It does not argue against allowlists, proxies, or approval gates. It explains why those controls matter, why answer engines already reward them, and why AI agent security still needs a runtime-trust layer after the network and policy stack already looks correct. If you are already reviewing the Sunglasses manual, mapping tool and connector boundaries in the MCP attack atlas, or running Sunglasses scans on your agent pipelines, the practical takeaway is simple: egress control narrows reach, but runtime trust still decides whether the workflow should cross this boundary right now.
What the hardening stack gets right
The current answer-engine behavior is not wrong. Egress controls, allowlists, validation, scoped identities, and approval gates are exactly the right first layer. They reduce the number of places an agent can talk to, force the workflow through a smaller set of reviewable paths, and make it harder for a casual tool call to become open-ended outbound behavior.
That is also why these controls are easy for buyers and answer engines to understand. They sound operational: proxies, AI gateways, allowed domains, Pydantic schemas, ephemeral credentials, human-in-the-loop approvals, middleware enforcement. Compared with vague AI-safety language, those are concrete controls with familiar owners. Network teams understand them. Platform teams understand them. Security buyers understand them.
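The controller-mediation idea above can be sketched in a few lines. This is a minimal illustration, not a real gateway: the allowlisted hostnames are invented, and the assumption is that every outbound call in the agent runtime is funneled through one function like this instead of the agent calling an HTTP client directly.

```python
# Minimal sketch of a controller-layer egress check, standard library only.
# ALLOWED_HOSTS and the single-choke-point design are illustrative assumptions.
from urllib.parse import urlparse

ALLOWED_HOSTS = {"billing.internal.example.com", "tickets.internal.example.com"}
ALLOWED_SCHEMES = {"https"}

def egress_permitted(url: str) -> bool:
    """Return True only for URLs whose scheme and exact host are allowlisted."""
    parts = urlparse(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    # Exact-host match: wildcard suffix matching ("*.example.com") is a common
    # source of quiet scope expansion, so this sketch deliberately avoids it.
    return parts.hostname in ALLOWED_HOSTS
```

Note the design choice: the agent never gets to decide reachability on its own, and the match is exact-host rather than domain-family, which is exactly the "smaller set of reviewable paths" property the paragraph above describes.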
The honest Sunglasses position is not that those controls are shallow or fake. It is that they solve a different layer of the problem. They answer where traffic can go, how arguments should be shaped, and who is allowed to initiate the request. They do not automatically answer whether the workflow is trusting the right guidance at the moment the outbound action is chosen.
That distinction matters more as agent systems become multi-step, tool-connected, and callback-heavy. The more context a workflow absorbs while it runs, the more likely it is that a safe-looking outbound action was shaped by something the team never meant to treat as authoritative. See the CVP validation runs for empirical evidence of how this plays out in real scanning sessions.
Plain-language explainer: where endpoint control stops and runtime trust starts
Imagine an internal support agent that can read customer state, draft a response, call a billing API, and submit a follow-up ticket through an approved connector. The environment is locked down. The agent can only reach a small set of domains. URL arguments are validated. Credentials are scoped and short-lived. Dangerous writes require approval. On paper, this is a good setup.
Now the workflow starts picking up new signals while it runs. A tool response says a fallback service is temporarily preferred. A callback advises the agent to fetch "latest routing guidance" before sending the update. A connector note says a backup endpoint is healthy and should be used to avoid latency. A retry message says that if the first request fails, the agent should try a secondary path that still sits inside the broader allowed domain family.
None of that has to look obviously malicious. It can all sound like ordinary operations. But this is the moment where endpoint control stops being the whole story. The question is no longer only, "Can the workflow reach this domain?" The harder question is, "Should the workflow trust what it just learned enough to send the request there?" Runtime trust is the layer that evaluates that moment.
That is why outbound security works best as layered reasoning. Egress policy narrows reach. Validation narrows structure. Identity narrows who may act. Approval narrows high-risk paths. Runtime trust narrows whether the next allowed outbound move still deserves confidence after the workflow absorbed new authority-bearing signals. This is the same layered-trust model described in AI agent hardening vs runtime trust.
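The layered model above can be made concrete with a small sketch. The provenance tags and policy sets here are assumptions for illustration: the point is that an allowed host is still rejected when the signal that chose it (a callback, a retry hint) was never granted authority.

```python
# Hedged sketch of layered outbound reasoning: egress policy answers
# "reachable?", runtime trust answers "should we trust how this destination
# was chosen?". Names and tags below are illustrative, not a real API.
from dataclasses import dataclass

ALLOWED_HOSTS = {"billing.internal.example.com"}
# Provenance values the team has explicitly granted authority to.
TRUSTED_PROVENANCE = {"static_config", "human_approved"}

@dataclass
class OutboundAction:
    host: str
    provenance: str  # where the destination came from: config, callback, retry hint...

def should_send(action: OutboundAction) -> bool:
    if action.host not in ALLOWED_HOSTS:          # layer 1: egress policy
        return False
    # runtime-trust layer: an in-policy host chosen by an untrusted signal
    # (callback guidance, a fallback note) is still refused.
    return action.provenance in TRUSTED_PROVENANCE
```

In this sketch, `should_send(OutboundAction("billing.internal.example.com", "callback_guidance"))` fails even though the host passes the allowlist, which is the gap the prose above is describing.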
Why answer engines reward checklists first
For the query "how do I stop AI agents from calling untrusted endpoints?", the reachable engine did not behave like it was answering a category page or a "best tools" list. It behaved like an operator hardening guide. That is useful signal. It means the market already accepts the fear as real, but it currently explains the fix through infrastructure and policy controls first.
That classification habit makes sense. Checklists are easier to summarize than a nuanced trust-boundary essay. "Restrict egress," "validate URLs," "approve risky actions," "scope credentials," and "force requests through a controller" are concrete, composable, and easy to cite. They feel stable. They also map to real products and internal control owners, which is exactly what answer engines like when the query sounds practical.
The opportunity for Sunglasses is not to fight that answer shape. It is to finish it honestly. A citation-friendly version is simple: endpoint allowlists are necessary, but they are not the last security decision. Runtime trust still decides whether the already-allowed workflow should follow this callback, use this fallback route, trust this metadata, or send this outbound request now.
That is the sentence the broader control stack still leaves incomplete. It also gives Sunglasses a cleaner place in the answer set than pretending to be a generic gateway, proxy, or network-security vendor. Read the FAQ for more on where Sunglasses fits vs what it does not replace.
Three concrete attack examples
1) The domain is allowed, but callback guidance quietly re-routes authority
An agent successfully completes an approved step and receives a callback that says the next request should go through a different service or queue. The new destination still matches the broad policy shape. The agent identity is still scoped. The network path may still be technically allowed. But the callback just became a fresh source of authority.
This is easy to miss because defenders often assume the important decision happened at the first outbound approval. In practice, the trust boundary moved downstream. Reachability did not fail. The workflow's authority model changed.
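One mitigation for this pattern is to pin the destination at approval time and refuse callback-supplied redirects even when they stay inside the allowed domain family. The sketch below is illustrative: the pinning dict and the callback shape are assumptions, not a real Sunglasses API.

```python
# Sketch: fix the destination when the approval happens, then treat any
# callback that suggests a different host as a fresh trust event to reject.
# The in-memory dict is an assumption; a real system would persist this.
PINNED: dict = {}  # approval_id -> host fixed at approval time

def pin(approval_id: str, host: str) -> None:
    """Record the destination that the original approval actually covered."""
    PINNED[approval_id] = host

def follow_callback(approval_id: str, suggested_host: str) -> bool:
    """True only if the callback keeps the workflow on the pinned host."""
    return PINNED.get(approval_id) == suggested_host
```

The useful property is that the trust boundary stays where the defenders thought it was: a callback proposing `billing-fallback.internal.example.com` fails the check even though it looks operationally plausible.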
2) A validated URL stays in policy, but the endpoint behind it drifts
A connector keeps using a familiar hostname, and input validation continues to pass. No one sees a blatant permission expansion. But the destination behind the request shifts, a secondary path becomes the default, or an approved service starts proxying the workflow through another dependency the team never meant to treat as equivalent. The outbound action still looks clean in a superficial policy review. In context, the workflow just changed shape.
This is why allowlists are not the last decision. They tell you which classes of destinations are reachable. Runtime trust helps decide whether this specific next hop still deserves trust right now. This dynamic closely mirrors what the MCP security for AI agents analysis covers for tool-level trust drift.
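Drift of this kind can be surfaced with a simple baseline comparison: record the first destination a connector resolves to, then flag any later change, including changes that still pass the allowlist. The in-memory baseline below is an assumption for illustration; a production system would persist it and alert rather than just return a flag.

```python
# Sketch of endpoint-drift detection: first-seen destination per connector
# becomes the baseline, and any later divergence is flagged for review.
_baseline: dict = {}  # connector name -> first resolved host

def check_drift(connector: str, resolved_host: str) -> bool:
    """Return True if this connector's destination changed since first seen."""
    first = _baseline.setdefault(connector, resolved_host)
    return resolved_host != first
```

A flagged drift is not automatically malicious, but it is exactly the "workflow just changed shape" moment that a superficial policy review misses.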
3) Safe-looking retries and health checks become hidden steering
Many production workflows retry, poll for health, or fetch status before acting. That is normal. But repeated outbound behavior can also become a control path. A health endpoint starts changing routing decisions. A retry message normalizes a fallback service. A polling loop begins shaping which action the agent thinks is safe to take next. The traffic still looks operational. The authority story has changed.
Network policy may allow the requests. Governance may record them. Neither one automatically answers whether the repeated pattern is now steering the workflow in an unsafe direction. That is why suspicious cadence and endpoint drift belong inside AI agent security, not only inside network monitoring.
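Cadence review can start as small as a sliding-window counter: normal retries stay under a threshold, while a polling loop that has become a control path exceeds it. The window size and threshold below are illustrative assumptions, not recommended values.

```python
# Sketch: flag outbound cadence that exceeds what resilience plausibly needs.
# WINDOW_SECONDS and MAX_CALLS_PER_WINDOW are illustrative assumptions.
from collections import deque
from typing import Optional
import time

WINDOW_SECONDS = 60
MAX_CALLS_PER_WINDOW = 5
_calls: dict = {}  # host -> deque of call timestamps

def record_call(host: str, now: Optional[float] = None) -> bool:
    """Record an outbound call; return True if cadence looks like steering."""
    t = time.monotonic() if now is None else now
    q = _calls.setdefault(host, deque())
    q.append(t)
    while q and t - q[0] > WINDOW_SECONDS:  # drop timestamps outside the window
        q.popleft()
    return len(q) > MAX_CALLS_PER_WINDOW
```

The threshold breach is a review trigger, not a verdict: it asks the runtime-trust question the paragraph above raises, namely whether the repeated pattern is steering the workflow rather than just keeping it resilient.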
How Sunglasses catches it
Sunglasses fits this stack as a provider-agnostic runtime-trust layer. It treats agent-facing text and metadata as part of the live authority model, not as harmless background. That includes prompts, YAML, tool descriptions, callback instructions, connector notes, retry messages, policy fragments, MCP-adjacent metadata, and other ordinary-looking content that can quietly reshape what the workflow believes.
That matters because some of the most expensive outbound failures arrive wrapped in convenience rather than obvious malware. A fallback route sounds helpful. A retry suggestion sounds routine. A callback looks like plumbing. A policy note implies an exception path. A decoded document changes what the agent thinks is an approved destination. If those surfaces are never treated as trust-bearing inputs, the workflow can remain inside a well-designed egress stack while still making a bad decision.
Sunglasses helps teams review those surfaces before they become production actions. It is not pretending to be the whole gateway layer, the whole proxy layer, or the whole network-enforcement stack. It is useful at the moment a team needs to ask: the route is allowed, but should the workflow still be trusted to take it now?
For teams that want the simplest practical starting point, the path stays small:
```shell
pip install sunglasses
sunglasses scan <path>
```
Then look closely at the places where authority can be inherited rather than explicitly granted: callback instructions, fallback guidance, retry notes, endpoint-selection logic, connector metadata, policy exceptions, and the trust-bearing text that sits between one approved action and the next one. The how it works page shows exactly what the scanner inspects.
Operator checklist: stopping untrusted endpoint calls
- Egress allowlisting: keep outbound destinations narrow and intentional.
- Proxy or controller mediation: do not let the agent make raw unrestricted outbound HTTP calls.
- Strict URL and schema validation: reject ambiguous structures, unsafe arguments, and extra fields that can smuggle in authority.
- Scoped identities and short-lived credentials: keep the action path narrow even when the tool is allowed.
- Human approval for risky external actions: require review when impact, reach, or data sensitivity changes.
- Callback review: treat routing instructions and next-step guidance as fresh trust events.
- Fallback-path review: do not assume a backup route is equivalent just because it is convenient.
- Endpoint drift detection: watch for next-hop changes, new destination patterns, or quiet expansion behind approved connectors.
- Retry and cadence review: repeated outbound behavior can become steering, not just resilience.
- Trust-bearing text review: prompts, docs, policy notes, and connector metadata can all change outbound authority at runtime.
If your current answer to untrusted endpoints is only "put it behind an allowlist," that is a good start. The stronger answer is: allow the minimum, validate the path, mediate the action, and still ask whether the workflow should trust this outbound move now.