Your AI cloud strategy isn’t about cost. It’s about gravity

I’ve spent the better part of the last eighteen months in conference rooms with CIOs working through their AI strategy. The conversations all start in the same place — model selection, vendor evaluation, agent frameworks — and they all eventually arrive at the same uncomfortable question.

“Where is this actually going to run?”

The question lands awkwardly because it sounds like it should have been settled years ago. Most enterprises picked their cloud provider somewhere between 2015 and 2020. They standardized on AWS, Azure or GCP, signed multi-year commits, and built their application portfolio accordingly. The cloud strategy was done. So why is it suddenly back on the table?

Because the workload changed underneath it. The cloud strategy that made sense for stateless web applications doesn’t make sense for AI agents, and the CIOs figuring this out fastest are the ones rebuilding their architecture around a constraint most of their procurement teams don’t even know exists yet.

The old cloud calculus is broken

For roughly a decade, cloud strategy was about where applications run. You optimized for compute price, developer velocity and managed services. Data was something you moved to where the apps were. This worked because the data-to-compute ratio was small. A typical application request moved kilobytes of structured data between the app and its database.

The architectural pattern that emerged was elegant in its simplicity: applications in one region, data in another, users somewhere else entirely, and the network in between papering over the seams. Latency budgets were measured in user-perceptible terms: a 200ms page load was acceptable, a 500ms one was a problem. Cross-region calls were a tax you paid for resilience or for putting compute close to the user.

That entire model assumed the application was the thing doing the work, and the data was the thing being acted upon.

AI agents inverted that assumption.

AI inverted the ratio

Agents don’t just consume data. They live in it.

Memory, context, retrieval, embeddings — the data isn’t an input to the workload. It largely is the workload. An agent reasoning about a customer’s situation is pulling in conversation history, organizational policies, product documentation and structured records on every turn. An agent writing code is pulling in the repository, the architectural decision records, the test suite and the relevant runtime telemetry. An agent doing financial analysis is pulling in market data, internal forecasts, regulatory filings and historical context — and then producing intermediate results that feed back into the next reasoning step.

The data isn’t a thing the workload references occasionally. It’s the substrate the workload is computing on.

And that substrate has gravity.

It has regulatory gravity — sovereignty mandates, residency requirements, sector-specific compliance regimes that say data of a particular type cannot leave a particular jurisdiction. The EU AI Act, HIPAA, financial services regulations across a dozen countries — these aren’t preferences. They’re constraints that determine, before you’ve made any architectural decisions at all, where some of your data is allowed to be.

It has economic gravity — egress fees, GPU-hour pricing differentials, the brute economics of moving terabyte-scale corpora across cloud boundaries. Training data and embedding stores aren’t gigabytes anymore. Moving them isn’t a config change. It’s a project, sometimes a quarter-long one, with a real bill attached.

It has incumbency gravity — the data is where it is, and moving petabytes is not on this year’s roadmap. Most enterprises have data sprawled across systems that were never designed to be portable. The fact that your customer records live in a particular cloud isn’t because someone made a strategic decision in 2026. It’s because they made a strategic decision in 2017 and the data has been accumulating there ever since.

And it has latency gravity — and this is the one that’s quietly rewriting the architecture for everyone.

Wall time is the forcing function

Here’s the math that nobody puts in their slide decks.

A modest agentic loop (retrieve, reason, act, observe) easily does five to ten round trips per task. The agent retrieves relevant context. Reasons about it. Calls a tool. Observes the result. Reasons about that. Retrieves more context. Acts again. Each of those steps touches the data layer, the memory store, the model and back.

Now put 50 milliseconds of cross-region network latency on each hop. That’s 250 to 500 milliseconds of pure network tax on every single agent task, on top of the actual model inference and tool execution. Run that loop a hundred times an hour, across thousands of concurrent agent sessions, and you’re not looking at a minor degradation. You’re looking at the difference between an agent that feels alive and an agent that feels like dial-up.
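The arithmetic above can be sketched in a few lines. This is a minimal illustration, not a benchmark: the function name is hypothetical, and the 1 ms same-datacenter figure is an assumption added for comparison, not a number from the article.

```python
def network_tax_ms(round_trips: int, hop_latency_ms: float) -> float:
    """Network time added to one agent task, excluding inference and tool execution."""
    return round_trips * hop_latency_ms


CROSS_REGION_MS = 50.0  # per-hop figure used in the text
CO_LOCATED_MS = 1.0     # assumed same-datacenter latency, for comparison only

for trips in (5, 10):
    cross = network_tax_ms(trips, CROSS_REGION_MS)
    local = network_tax_ms(trips, CO_LOCATED_MS)
    print(f"{trips} round trips: {cross:.0f} ms cross-region, {local:.0f} ms co-located")
```

Under those assumptions, the gap between the two columns is the network tax that co-location removes.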

This is why I keep telling CIOs the same thing in those conference rooms: your data, your memory store, your models and your agent runtime need to be in the same physical datacenter. Period.

Whether that physical datacenter is yours or one of the hyperscalers’ is the actual question worth debating. But they have to be co-located. If you’re spreading these across regions or providers to chase a procurement discount, you’re sabotaging your own AI strategy before it ships.

I want to head off two objections before the comments section gets to them.

“What about agents that legitimately need to span regions? Say, a global customer service agent that needs to retrieve from regional data stores?”

Those aren’t really one agent. They’re a federation of regional agents with a routing layer on top, and the wall-time math applies within each region. The federation is the architecture. Pretending it’s one agent stretched across geographies is how you end up with the dial-up problem.
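One way to picture that federation is a thin router that hands each request to the agent living next to the relevant data. The sketch below is illustrative only: the class names, the region labels and the stub handler are all hypothetical, and a real regional agent would run its full retrieve-reason-act loop inside its own region.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RegionalAgent:
    region: str
    handle: Callable[[str], str]  # stands in for the full in-region agent loop


class FederationRouter:
    """Routing layer: picks the agent co-located with the data a request needs."""

    def __init__(self) -> None:
        self._agents: Dict[str, RegionalAgent] = {}

    def register(self, agent: RegionalAgent) -> None:
        self._agents[agent.region] = agent

    def route(self, data_region: str, request: str) -> str:
        agent = self._agents.get(data_region)
        if agent is None:
            raise LookupError(f"no regional agent for {data_region!r}")
        # Only this single hand-off crosses regions; the loop itself stays local.
        return agent.handle(request)


def make_agent(region: str) -> RegionalAgent:
    # Stub handler; a real one would call the model, memory and data stores in-region.
    return RegionalAgent(region, lambda req: f"[{region}] handled: {req}")


router = FederationRouter()
for name in ("eu-west", "us-east"):  # hypothetical region labels
    router.register(make_agent(name))

print(router.route("eu-west", "refund status"))
```

The design point is that the cross-region cost is paid once per request at the routing layer, not once per hop inside the loop.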

“What about hyperscaler private connectivity? Direct Connect, ExpressRoute. Doesn’t that get cross-region latency down to single-digit milliseconds?”

Single-digit milliseconds still compound across an agentic loop in a way they never did for human-driven activity. Five hops at 5ms is 25ms of network tax per task, which adds up across millions of tasks.
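To put a rough number on "adds up," here is a small sketch that totals the network time across a task volume. The function name is hypothetical and the one-million-task volume is an assumed round figure chosen for illustration.

```python
def total_network_hours(tasks: int, hops_per_task: int, hop_latency_ms: float) -> float:
    """Cumulative time spent waiting on the network across all tasks, in hours."""
    total_ms = tasks * hops_per_task * hop_latency_ms
    return total_ms / 1000 / 3600


# Five hops at 5 ms each, across an assumed one million tasks.
hours = total_network_hours(tasks=1_000_000, hops_per_task=5, hop_latency_ms=5)
print(f"{hours:.2f} hours of cumulative network wait")
```

The total is spread across many concurrent sessions, so it is a measure of aggregate wait, not of any single user’s experience.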

And private connectivity doesn’t solve the other gravities. It doesn’t make data residency mandates go away. It doesn’t change egress economics for the data itself. It just makes a single dimension of the problem somewhat better.

The constraint is physics, not procurement. You can’t negotiate with the speed of light.

That’s why the cloud market fragmented

Once you accept that agents have to run physically next to their data, memory and models, the recent fragmentation of the AI cloud market starts to make sense.

Sovereign clouds aren’t winning on patriotism. They’re winning where regulatory gravity dominates and the data is already on a particular side of a particular border. Neoclouds aren’t winning on a vibe shift. They’re winning where economic gravity dominates and GPU-hour pricing makes the math work. Private clouds aren’t winning because on-prem is back in fashion. They’re winning where incumbency gravity dominates and the data is already in your datacenter and isn’t going anywhere. Hyperscalers are still winning where developer gravity and managed services dominate, and where the data is already in their object storage from a decade of cloud migration.

These aren’t competing on the old dimensions. They’re each winning in scenarios where a different gravity is the binding constraint.

The right question isn’t which cloud you should pick. It’s which gravity dominates for each workload, and therefore where the whole stack (data, memory, model, agent runtime) needs to be co-located. Some agents will run in three places. Some agents will need to move between them. That’s why deployment flexibility matters more than it ever did when we were just running stateless apps.

What CIOs should actually do this quarter

Stop picking a cloud. Start mapping your agent portfolio against the four gravities and let the architecture fall out of that.

For each AI workload you’re planning to put into production over the next twelve months, work through four questions:

Where does the data live, or where is it going to end up? Not where you wish it lived. Where it actually is, or where regulatory or business reality is forcing it to be. This is the answer that constrains everything else.
Which gravity is dominant? If regulatory mandates are non-negotiable, that’s your binding constraint. If GPU economics are the issue, that’s your binding constraint. If you have ten petabytes of historical data sitting in a particular cloud and moving it is a multi-year project, that’s your binding constraint.
What’s the wall-time budget for the agent loop? If it’s a batch workload, you have flexibility. If it’s a real-time customer-facing agent, you need everything in the same datacenter and you need to design for it from day one.
What’s the portability requirement? As model providers compete and pricing shifts, can you move the agent runtime without moving the data? Can you move the data without rewriting the agent? Lock-in used to be denominated in egress fees. Now it’s denominated in token pricing, embedding model compatibility and agent framework portability.
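The four questions can be captured as a simple worksheet per workload. What follows is a sketch under stated assumptions, not a scoring model: every field name is hypothetical, the example workloads are invented, and the precedence order in `dominant_gravity` (regulatory, then incumbency, then economic, then latency) is an assumption you should replace with your own judgment.

```python
from dataclasses import dataclass
from enum import Enum


class Gravity(Enum):
    REGULATORY = "regulatory"
    ECONOMIC = "economic"
    INCUMBENCY = "incumbency"
    LATENCY = "latency"


@dataclass
class Workload:
    name: str
    data_location: str        # question 1: where the data actually is
    residency_mandated: bool  # question 2 inputs: candidate binding constraints
    gpu_cost_sensitive: bool
    data_hard_to_move: bool
    real_time: bool           # question 3: wall-time budget
    runtime_portable: bool    # question 4: can the runtime move without the data?


def dominant_gravity(w: Workload) -> Gravity:
    # Precedence is an assumption: non-negotiable mandates first, immovable data next.
    if w.residency_mandated:
        return Gravity.REGULATORY
    if w.data_hard_to_move:
        return Gravity.INCUMBENCY
    if w.gpu_cost_sensitive:
        return Gravity.ECONOMIC
    return Gravity.LATENCY


def needs_single_datacenter(w: Workload) -> bool:
    # Batch workloads have flexibility; real-time agents need the whole stack co-located.
    return w.real_time


portfolio = [  # invented examples
    Workload("support-agent", "eu-sovereign-cloud", True, False, False, True, True),
    Workload("nightly-forecast", "hyperscaler-object-store", False, True, True, False, False),
]

for w in portfolio:
    print(w.name, dominant_gravity(w).value, w.data_location, needs_single_datacenter(w))
```

Filling this in for every planned workload gives you the portfolio view: each row names its binding constraint and where the stack has to sit.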

The CIOs who get this wrong won’t lose because they chose the wrong cloud. They’ll lose because they chose a cloud. Singular, monolithic, picked once in 2019 when the right answer was a portfolio architected around the gravities of each workload.

Cloud strategy stopped being a procurement decision the day agents became the workload. It became a physics problem. And the physics doesn’t care which vendor you signed with.

This article is published as part of the Foundry Expert Contributor Network.