The cybersecurity community is currently fixated on data lineage and leakage via LLM prompts. While SandboxAQ’s 2025 AI Security Benchmark Report confirms that 52% of security leaders identify sensitive data egress as their primary concern, this focus addresses a yesterday problem. As a cryptography and security engineer, I look at the underlying building blocks of how these systems interact. The real risk has shifted from what users tell an AI to what autonomous agents are permitted to do.
We are entering the era of shadow operations: The uncontrolled deployment of autonomous agents that execute logic, integrate with systems by calling APIs and modify states without formal security oversight.
What we are hearing directly from security leaders reinforces this shift. Many organizations have already rolled out AI across business units. They are using managed services, embedding AI into workflows and in some cases building their own agents. Yet when asked a simple question about where their agents are and what they are allowed to do or to access, the answer is often uncertain. The visibility gap is not hypothetical. It is a reality.
The rise of the OpenClaw era
We are seeing a trend toward fast adoption of agentic AI frameworks to automate and make certain processes or tasks more efficient. Moreover, open-source projects like Moltbot and the broader OpenClaw movement aim to provide tooling that can be deployed with minimal friction. While these foster innovation, they bypass the traditional “secure-by-design” principles we apply to production code.
In a shadow ops scenario, a well-meaning developer uses an agentic framework to automate a complex workflow, perhaps an Extract, Transform, Load (ETL) process or a cloud deployment script. To make it work quickly, they might grant the agent a high-privilege API key (e.g., an AWS AdministratorAccess or a GitHub Personal Access Token with full code repository scope). The result is a non-deterministic autonomous entity running in a cloud function with the keys to the kingdom, invisible to your Cloud Security Posture Management (CSPM) tools.
The risk is no longer just traditional confidentiality or data security and privacy; it is enterprise-wide operational integrity. The impact shifts from a compliance fine to direct financial loss and a breach of trust in our own technology.
This risk is amplified by how agents are introduced into environments. They are often embedded at the repository level through GitHub actions, API integrations, orchestration layers or model calls buried in application logic. If security teams only begin monitoring once code is deployed, they are starting too late. The moment of risk introduction happens at the pull request, not at runtime.
Why your current security stack is blind to it
Our existing security suite of tools is not built to solve for shadow operations. Standard Data Loss Prevention (DLP) and Identity and Access Management (IAM) solutions are often blind to agentic ephemeral identities. A CSPM might see a legitimate server running a legitimate process, but it doesn’t see the unvetted AI logic calling a third-party resource via a hardcoded API key.
We have a profound visibility gap. You cannot secure what you cannot see, and you cannot see these agents where they are born. If your security view starts when software is already running, you are looking in the wrong place. This is compounded by an increasingly complex supply chain. The recent incident involving OpenAI and its analytics vendor, Mixpanel, serves as a baseline example: A breach in a sub-processor exposed account metadata. With agentic frameworks, the supply chain expands to include every model, plugin, and external tool the agent is permitted to call.
The expansion of the supply chain is particularly significant. Agents do not operate in isolation. They call models, connect to Model Context Protocol (MCP) servers, integrate external plugins and access enterprise systems through APIs. Without a unified inventory that maps which agent is using which model, running on which host and accessing which resources, security teams cannot understand the blast radius.
This is where the concept of an AI Bill of Materials, or AI BOM, becomes operational, not theoretical. An AI BOM is a structured inventory of models, agents, orchestration layers and dependencies embedded within an application or AI system. It should identify managed third-party model calls as well as self-hosted models discovered within repositories or cloud workloads. Without this baseline inventory, governance cannot be enforced.
There is also confusion in the market about what an AI BOM can realistically capture. Some expect it to include complete training data lineage, model versions and dependency chains. In practice, training data transparency varies. Standard models may expose metadata through sources such as model cards, while fine-tuned or internally trained models may not automatically surface that lineage. Security leaders must design controls with that uneven transparency in mind.
Engineering a solution: Visibility as a primal requirement
Countering shadow operations requires evolving our security posture toward shift-left discovery. This means identifying AI assets at the pull-request level, long before they are compiled, deployed or downloaded, and executed. We must move beyond static API keys to a model of contextual least privilege and if an agent is built, its permissions must be strictly scoped to the specific task and continuously monitored for anomalous “behavioral drift.” Given that more than 75% of organizations are already integrating AI, we effectively need policy-driven guardrails that implement automated discovery and monitoring for these shadow operations across the entire infrastructure footprint.
Inventory, however, is only the first step. Visibility must be paired with qualification. Organizations need mechanisms to evaluate model behavior and assign enforceable health criteria. Structured red teaming, adversarial prompt testing and measurable model scoring allow security teams to define policy thresholds. Models that fall below defined integrity or hallucination benchmarks should not be promoted into production environments.
Enforcement must also extend into runtime. Proxy-based guardrails positioned between users and models create a control layer that can inspect prompts and responses in real time. These guardrails can detect malicious instructions, sensitive data exposure, jailbreak attempts or proprietary code leakage based on policy. Without runtime enforcement, governance depends entirely on user discipline.
This is especially relevant for AI coding assistants and agent-to-agent interactions. If developers are using external copilots or SaaS-based coding tools, sensitive source code and credentials may traverse systems outside centralized oversight. Routing traffic through enforceable proxy infrastructure enables logging, inspection and policy-based blocking where required.
The goal for 2026 is not to stifle innovation by blocking these agents, but to bring them under the umbrella of formal governance. We must ensure that the cryptographic identities and operational permissions they carry are as rigorously managed as any other critical piece of our infrastructure. By treating autonomous agents as first-class system actors with distinct, verifiable identities, we can mitigate the risk of integrity failures while allowing engineering teams to leverage the speed and efficiency of the agentic era.
Identity is the connective layer between cryptographic posture and agentic execution. Agents require credentials to access systems. If those credentials are static, overprivileged or manually provisioned, fragility becomes systemic. Just-in-time access and tightly scoped permissions enforced at machine speed are foundational to operational resilience in autonomous environments. Manual IAM workflows cannot scale to agents operating continuously.
The call for 2026: Securing the AI perimeter
The trajectory is set. With over 75% of organizations now reporting the use of AI, the pivot from simple data usage to autonomous execution is the next inevitable phase of infrastructure evolution. The risk is no longer theoretical because the tools are deployed, and the shadow operations attack surface is expanding.
We must expand our definition of AI security beyond data security and privacy to encompass operational resilience. True security cannot rely on monitoring the output; it must start where the AI is built and executed. We require continuous visibility and strict control mechanisms to ensure that agents do not become the vector for systemic disruption.
Operational resilience also requires longitudinal observability. Security posture cannot be a snapshot in time. Organizations must track issue evolution across repositories, model usage trends and configuration changes to maintain a defensible audit trail. Without that historical context, governance cannot adapt to drift.
Market pressure is reinforcing this direction. Structured AI governance artifacts are increasingly tied to regulatory scrutiny and vendor risk requirements, particularly in large financial institutions. Demonstrable inventory and enforceable runtime controls are becoming prerequisites for enterprise trust.
By enforcing strict identity governance and deep visibility now, we can capture the productivity of the agentic era without introducing a hidden layer of fragility into the heart of our enterprise-wide operations.
This article is published as part of the Foundry Expert Contributor Network.
Want to join?
Read More from This Article: Shadow AI morphs into shadow operations
Source: News

