The question nobody is asking loudly enough
I keep hearing the same AI conversation everywhere I go. Better models, faster inference, more capable agents. The race is on and everyone wants in.
But something is missing.
Most organizations I work with have already moved past experimentation. AI is embedded in workflows, shaping customer interactions, processing internal documents, informing operational decisions. The question of whether AI works has largely been answered. What has not been answered, in most cases, is something far more basic.
Where does our data actually go when it flows through an LLM? Who can access it? Under which jurisdiction is it processed? Could it end up improving someone else’s model?
These are not hypothetical concerns. These are the questions that surface when a regulator asks how your organization handles personal data, when a client wants to know what happens to the documents they share with your AI-powered service, or when a board member reads about a policy change at one of the major AI providers and wants to know what it means for the business.
In my experience, most organizations cannot answer these questions clearly. Not because they do not care, but because adoption moved faster than governance. Teams were encouraged to experiment, pilots became production, and somewhere along the way the data conversation got left behind.
This is not a fringe problem. Recent industry data suggests that most enterprise leaders are now actively redesigning their data architectures, not because the AI did not work, but because the way it was connected to their data became a liability.
The capability conversation has dominated for the last two years. I think the next two years will be defined by a different question entirely: Not what can AI do, but who controls what it knows?
Your data is already travelling further than you think
When I talk to CIOs about AI risk, the conversation almost always starts with model accuracy, hallucinations or bias. Rarely does anyone open with: “Do we actually know where our data goes when someone on the team uses an LLM?”
That question matters more than most people realise. Not because of some hypothetical future breach, but because right now, most organizations are operating across a mix of LLM tiers and tools with no unified picture of what data is going where or under what terms.
OpenAI, Anthropic and Google all operate a two-tier system. At the enterprise and API level, based on publicly available policies, the commitments are clear: Your data is not used for model training. But those protections only apply if everyone in your organization is using the enterprise tier. In practice, that is almost never the case.
Teams sign up for free accounts to test things quickly. Employees paste internal documents into consumer-tier tools because it is faster than raising a ticket. Contractors use personal subscriptions for client work. None of this is malicious. All of it is invisible to leadership.
And the consumer tiers operate under very different rules. OpenAI’s consumer ChatGPT may use conversations for model improvement unless the user opts out. Google’s free Gemini tier works similarly. In September 2025, Anthropic introduced changes to its consumer terms: Conversations are now eligible for training by default, with data retention extending from 30 days to up to five years.
This is the shadow AI problem. Corporate data entering consumer-tier systems where it may be retained for extended periods and processed under terms nobody in the organization approved. Not because anyone made a bad decision, but because no one made a deliberate one.
When a regulator in Riyadh asks how your organization handles personal data processed through an LLM, or a client in Doha wants to know where their documents went after your team used AI to summarise them, “we think we are on the enterprise tier” is not a defensible answer. The problem is not that something has gone wrong. It is that most organizations could not prove things are going right.
The sovereignty map is more complicated than people think
Most conversations about data sovereignty still default to one question: Where is the data stored? In the context of AI, that is not enough.
I work across the UK, the Gulf and Europe. Each region is moving toward stronger data protection, but they are getting there differently, at different speeds and with different expectations. For any organization operating across borders, that creates real tension.
In Europe, GDPR set the foundation and the EU AI Act is raising the bar further. In Saudi Arabia, the PDPL is no longer a paper exercise. SDAIA issued 48 enforcement decisions in 2025 and published cross-border transfer rules requiring a four-step risk assessment before personal data leaves the Kingdom. In Qatar, the PDPPL has been in place since 2016, but enforcement was historically light. That changed in late 2024, with the National Data Privacy Office now issuing binding decisions against organizations found in violation.
Now add the LLM layer.
When an organization sends data through a cloud-based LLM, the question is not just where the data is stored. It is where the data is processed at inference time. Your infrastructure might sit in Riyadh, but if the model processes your prompt on a server in another jurisdiction, most legal frameworks would say sovereignty has not been preserved.
And as organizations move toward agentic AI, this gets harder still. Agents do not respond to a single prompt. They retrieve context from multiple sources, call external tools and chain decisions across systems. Each step is a potential jurisdiction question and a potential compliance gap that nobody mapped.
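To make that concrete, here is a minimal sketch of what jurisdiction-aware tracing of an agent could look like. Every tool name, region and policy in it is hypothetical, chosen purely for illustration; the point is that each step in the chain carries its own "where was this processed?" answer.

```python
# A minimal sketch of tracing jurisdiction per agent step.
# Every tool name, region and policy below is hypothetical,
# for illustration only.

from dataclasses import dataclass

@dataclass
class AgentStep:
    action: str                    # what the agent did
    processed_in: str              # region where the work actually ran
    data_classes: tuple[str, ...]  # kinds of data the step touched

# One "answer the customer" request, traced step by step
trace = [
    AgentStep("retrieve CRM context", "me-central", ("personal",)),
    AgentStep("LLM inference on prompt", "us-east", ("personal",)),
    AgentStep("call translation tool", "eu-west", ("personal",)),
]

# Hypothetical policy: personal data must stay in-region
ALLOWED_FOR_PERSONAL = {"me-central"}

for i, step in enumerate(trace, 1):
    offside = ("personal" in step.data_classes
               and step.processed_in not in ALLOWED_FOR_PERSONAL)
    flag = "REVIEW" if offside else "ok"
    print(f"step {i}: {step.action} -> {step.processed_in} [{flag}]")
```

Notice that a single request can be compliant at the storage layer and offside at two of its three processing steps. That is the gap that nobody mapped.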
Sovereignty is not just geography. It has at least four dimensions: Where data and compute reside, who manages them, who owns the underlying technology and who governs it. Most organizations are only thinking about the first one.
The real trade-off: Pay to keep your data or pay with your data
Once an organization recognises the sovereignty problem, the natural instinct is to bring everything in-house. Run your own models, keep your data on your own infrastructure, remove the dependency on external providers entirely.
That instinct is understandable. It is also expensive.
Local models like Llama and Mistral give you full control. No data leaves your boundary. No third-party terms to worry about. No inference happening in a jurisdiction you did not choose. On paper, it solves the problem.
In practice, a production-grade on-premise deployment for a 70-billion-parameter model costs anywhere from $40,000 to $190,000 in hardware alone. Self-hosting only becomes cost-effective if you are processing above roughly two million tokens per day. Below that, the API is cheaper. On top of the hardware, you need the talent to deploy, fine-tune, secure, patch and maintain these systems over time. That is not a one-off cost. It is an ongoing operational commitment that most organizations underestimate.
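For readers who want to test the arithmetic, here is a back-of-the-envelope sketch. The hardware figure is the low end of the range above; the amortization period, daily operating cost and API price are illustrative assumptions, not vendor quotes, picked to show one scenario under which the two-million-token break-even holds.

```python
# Back-of-the-envelope break-even for self-hosting vs API.
# The hardware figure is the low end of the range cited above;
# the amortization period, daily operating cost and API price
# are illustrative assumptions, not vendor quotes.

HARDWARE_COST = 40_000          # USD, low end of the range above
AMORTIZATION_DAYS = 3 * 365     # assume a three-year hardware life
OPS_COST_PER_DAY = 25           # assumed power, patching, upkeep (USD)
API_PRICE_PER_M_TOKENS = 30.0   # assumed frontier-class $/1M tokens

self_host_per_day = HARDWARE_COST / AMORTIZATION_DAYS + OPS_COST_PER_DAY

# Daily token volume at which API spend equals self-hosting spend
break_even_tokens = self_host_per_day / API_PRICE_PER_M_TOKENS * 1_000_000

print(f"self-hosting: ~${self_host_per_day:.0f}/day")
print(f"break-even: ~{break_even_tokens / 1e6:.1f}M tokens/day")
```

Under these assumptions the break-even lands near two million tokens per day. Change any input (GPU class, staffing, model pricing) and it moves by an order of magnitude, which is exactly why this analysis has to be done per organization rather than borrowed.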
And there is a capability gap. The frontier models, the ones that perform best on complex reasoning, coding, analysis and multi-step tasks, are not available for self-hosting. If your use case demands the best available performance, you are using an API. That means your data is leaving your boundary, processed under someone else’s terms, in someone else’s infrastructure.
So, the trade-off is real. At the extremes, you are either paying serious money to keep your data close, or you are paying with your data by accepting terms you may not fully understand. Most organizations sit somewhere in between, but very few have made that choice deliberately. It happened by default. Someone picked a tool, someone else signed up for an account, a pilot became production and suddenly the organization is operating across a patchwork of tiers, agreements and jurisdictions that nobody designed and nobody fully controls.
This is not a technology decision. It is a strategic one. And it belongs in the boardroom, not buried in an IT procurement process.
The market is already restructuring around sovereignty
If you want to know where enterprise AI is heading, follow the money.
The sovereign cloud market is projected to grow from $154 billion in 2025 to over $800 billion by 2032. That is not a forecast driven by hype. It is driven by enterprise buyers telling their providers: We need to control where our data lives and how it is processed.
The response has been significant. Microsoft launched Foundry Local, which lets organizations run large AI models on their own hardware in fully disconnected environments, and committed to processing Copilot interactions in-country for 15 nations by the end of 2026. Google and Oracle are pushing a model where AI services move to where the data lives rather than the other way around, deploying their cloud stacks inside customer infrastructure and sovereign regions.
These are not experimental initiatives. They are multi-billion-dollar structural shifts. And they tell me something important: The providers are not leading this conversation. They are responding to it.
But it is worth being honest about what sovereign offerings deliver today. They come with cost premiums, longer deployment timelines and in some cases a reduced feature set. The trade-off does not disappear. It changes shape. CIOs still need to understand what sovereignty means for their specific context, not just trust that a sovereign label on a cloud product solves the problem.
What CIOs should be doing now
If I were advising a CIO today, I would not start with tools or vendors. I would start with visibility.
Know exactly what data flows through which LLM and under what terms. Not at the contract level, at the actual usage level. Which teams are using which tools? Are they on consumer or enterprise tiers? Who approved the terms? If you cannot answer those questions today, that is the first problem to solve.
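What does that look like in practice? One lightweight starting point is a usage-level register rather than a contract file. The sketch below is hypothetical in every field, teams, tools and data classes alike, but it shows the shape of the record you need before any of the later questions can be answered.

```python
# A minimal sketch of a usage-level LLM register. All teams,
# tools and entries here are hypothetical placeholders.

from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMUsageRecord:
    team: str
    tool: str
    tier: str                      # "consumer" or "enterprise"
    terms_approved_by: Optional[str]
    data_classes: tuple[str, ...]  # what the team sends through it

register = [
    LLMUsageRecord("Legal", "vendor A, enterprise plan", "enterprise",
                   "CISO", ("contracts",)),
    LLMUsageRecord("Marketing", "vendor B, free plan", "consumer",
                   None, ("customer lists",)),
]

# Surface the first problem to solve: usage nobody approved
for r in register:
    if r.tier == "consumer" or r.terms_approved_by is None:
        print(f"UNAPPROVED: {r.team} -> {r.tool} ({r.data_classes})")
```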
Map your data exposure against every jurisdiction you operate in, and do not stop at storage. Understand where inference happens. Understand where context is retrieved from. If you are operating across the EU, Saudi Arabia and Qatar, those are three different regulatory frameworks with three different enforcement postures, and the LLM layer touches all of them.
Audit for shadow AI. Not as a one-off exercise, but as a recurring part of your governance. Employees are not going to stop using AI tools. The goal is not to block adoption. It is to make sure adoption happens on terms the organization has chosen deliberately.
Do not default to local models out of fear or cloud models out of convenience. Make the trade-off intentionally, with real cost and capability analysis behind it. Understand what you gain and what you give up in each direction and make sure that decision is documented and owned at the right level.
Build procurement frameworks that treat LLM data handling as a first-class requirement. Not a footnote in a vendor assessment, but a core criterion alongside security, resilience and performance. If a provider cannot clearly explain what happens to your data, that is not a gap in their documentation. It is a gap in their offering.
The readiness gap is real. Some 95% of enterprise leaders say they plan to build sovereign AI foundations; based on current research, only 13% are on track. The organizations that close that gap first will scale faster, win more trust and defend their choices with confidence. The rest will have the conversation forced on them.
From AI capability to AI sovereignty
For the last couple of years, the focus has been on what AI can do. Bigger models, faster outputs, more automation. That progress is real and I do not think anyone should slow down.
But I think the next phase will be defined by something different. Not capability, but control. Not what the model can do, but whether you can prove you know where your data went, who had access to it, what terms governed it and what happens to it next.
CIOs will not be judged on whether they adopted AI. They will be judged on whether they adopted it in a way they can defend. To a regulator. To a client. To a board. In plain language, with evidence they can stand behind.
In my first article for CIO Network, I argued that explainability is the control layer that makes AI safe to scale. Data sovereignty is the other half of that equation. Explainability answers “why did the system do that?” Sovereignty answers “where did the data go and who controls it?”
If you can answer both, you can scale with confidence. If you cannot, you are building on a foundation you do not fully own.
And once that foundation is questioned, it is very difficult to rebuild.
This article is published as part of the Foundry Expert Contributor Network.

