The promise of AI is immense, but poor-quality data undermines every attempt to derive value from it. Without the right inputs, AI produces unreliable, incomplete, and even misleading outcomes.
For the average enterprise, data exists in many forms across many systems, says Brian Sathianathan, CTO at Iterate.ai, and integrating structured and unstructured data is harder than most AI pilots account for. “Structured data from operational systems is rarely as tidy as teams are assuming, and unstructured data, like scanned documents and forms, requires a different preparation process before it can be matched and used effectively,” he says, adding this might explain why businesses hit a wall when trying to move beyond the proof-of-concept (POC) stage.
Organizations with impressive POCs typically succeed because they rely on curated datasets, manual workarounds, and tightly controlled environments, says Rhian Letts, head of group technology strategy at Investec. The real challenge lies in converting pilots into reliable, production-grade implementations. Scaling, she adds, requires resilient pipelines, consistent definitions, operational support, and integration into real workflows. It also raises the bar for governance.
“Many data governance frameworks were designed for human-paced consumption,” she says. “AI significantly increases both the speed and volume of data demand and introduces non-human consumers. Governance, therefore, needs to evolve to become more automated, real-time, and explicit about provenance and permissions.”
For Daniel Acton, CTO at technology firm ADG, too many organizations rush to do something with AI without properly analyzing what they actually want to do with it. “AI can be useful, but if you feed AI data that’s incomplete and inaccurate, or if it doesn’t have the data needed to teach the machine to do what you want it to do, the results will be underwhelming,” he says.
Another core issue is a lack of standardized, high-fidelity metadata. “The quality of metadata is the hardest challenge to overcome,” says Brett Pollak, executive director for workplace technology and infrastructure services at UC San Diego. “Metadata is the essential connective tissue that allows an AI agent to interpret a user’s prompt and map it correctly to the intersection of specific columns and rows. Most organizations have unique, institution-specific interpretations of data that are rarely documented properly or kept current.” This creates a translation gap where an agent might have access to the data but lacks the context to understand what a specific field represents in a business context.
Data, data everywhere
Just because obstacles exist, though, doesn’t mean progress needs to pause. “AI use should be aligned to current maturity,” says Letts. “Rather than treating imperfect data as a constraint, organizations can ask how AI might help improve and better connect the data they already have.” Sathianathan agrees, adding that within the new LLM world, even small amounts of accurate data can have significant value. “With traditional machine learning just a few years ago, you needed a lot of data to train models,” he says. “Today, since most LLMs come with highly pre-packaged knowledge, all you need is sufficient amounts of the right data to get it ready for your domain.”
For organizations that have already deployed structured data warehousing, the new barrier is the transition from human-centric storage to machine-actionable delivery, says Pollak. “Readiness now means ensuring your data is wrapped in specific metadata, exposed via modern protocols like MCP servers, and governed by a selective exposure strategy that ensures agents only act on what’s governed,” he says.
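A “selective exposure strategy” can be reduced to a simple idea: the agent-facing layer serves only datasets that have passed governance review, regardless of what the warehouse actually holds. The sketch below uses a plain allowlist with invented dataset names; a real MCP-style deployment would enforce this in the serving protocol rather than in application code.

```python
# A sketch of a selective-exposure gate, assuming hypothetical dataset
# names and a simple allowlist: agents can only act on governed data.

WAREHOUSE = {
    "sales_clean": [{"region": "EMEA", "revenue": 1200}],
    "hr_raw": [{"employee": "J. Doe", "salary": 90000}],  # not yet governed
}

GOVERNED = {"sales_clean"}  # datasets approved for agent access


def fetch_for_agent(dataset: str) -> list[dict]:
    """Expose a dataset to an AI agent only if it is governed."""
    if dataset not in GOVERNED:
        raise PermissionError(f"dataset '{dataset}' is not exposed to agents")
    return WAREHOUSE[dataset]
```

The design choice worth noting is the default: anything not explicitly approved is invisible to the agent, which is the inverse of how many human-facing BI tools are configured.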
Shift your mindset around data
Today, many organizations want to quickly move from data disorder to being data-driven. But if that’s the end goal, CIOs and tech leaders need to treat data as a first-class citizen within their organizations. As part of this shift, data can no longer be seen as a by-product of business systems, but rather as a core output that should be managed with the same level of care as any other product or service. When this happens, business leaders can unlock insights and value they didn’t know existed.
Also, according to Letts, a use-case-led approach is critical. Trying to fix every dataset across an organization is neither practical nor necessary. Meaningful value can be unlocked even where data is imperfect by focusing on the right use cases. By prioritizing five to 10 high-value use cases and mapping the data required to deliver them in production, it’s easier to focus efforts. Foundations can then be strengthened to serve those priorities.
With AI, the threshold for what’s good enough has dropped for many use cases, particularly those focused on productivity and knowledge work, she adds. AI models can extract value from context and connect dots, even where data isn’t perfectly structured. But higher-stakes use cases demand higher quality and stronger controls. “The key is to be explicit about purpose, risk, and operational dependency,” she says. “Lower-risk use cases can move faster with well-described and well-governed context, while higher-risk applications require tighter thresholds.”
Prioritize ownership, governance, and security
All governance frameworks, policies, standards, and procedures should be reviewed with AI in mind, adds Letts. Many were designed for human-paced consumption, whereas AI increases speed, scale, and integration across both structured and unstructured data. So validating ownership of critical data elements and establishing a shared business understanding of their meaning is essential to progress. Standardized definitions and metadata should also ensure that questions like “what does this mean?” and “where did it come from?” can always be answered. “AI access must be secure by default,” she adds. “This means having least privilege, audit trails, handling of sensitive data, and strong controls around retrieval. It should always be demonstrable what a model can and cannot access.”
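Making a model’s access “demonstrable,” as Letts puts it, implies two concrete pieces: an explicit grant list and a log of every request, allowed or denied. This is a minimal sketch of that pattern; the model IDs and dataset names are invented for illustration.

```python
# A sketch of least-privilege access with an audit trail: every request is
# checked against an explicit grant and recorded, so what a model can and
# cannot access is always demonstrable after the fact.

from datetime import datetime, timezone

GRANTS = {"support-bot": {"faq_articles"}}  # model -> allowed datasets
AUDIT_LOG: list[dict] = []


def access(model_id: str, dataset: str) -> bool:
    """Check a model's access to a dataset and record the decision."""
    allowed = dataset in GRANTS.get(model_id, set())
    AUDIT_LOG.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "model": model_id,
        "dataset": dataset,
        "allowed": allowed,
    })
    return allowed
```

Because denials are logged alongside grants, the audit trail doubles as evidence of what the model attempted, not just what it was permitted to do.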
Organizations must also be mindful of data privacy when using AI. “Agentic AI systems require a different level of data access than traditional enterprise apps,” says Sathianathan. “Data needs to be analyzed, not just queried, at scale. That’s a big change to privilege models, and IT and security leaders need to think carefully about where all that data is going and what access the AI system really requires.” The same is true, he adds, whether the LLM processing that data is running within or outside an organization’s four walls, and such decisions should be considered before deployment, not after.
Use AI to fill in the gaps
In areas where the business might be falling short, consider using AI to draft and update your organization-specific data definitions, suggests Pollak. “Prioritize establishing a rigorous human-in-the-loop process to ensure this connective tissue is accurate and current.” Additionally, it’s possible to use LLMs and smaller language models to clean up data in certain areas with restrictive prompts, adds Sathianathan. This way, you can process data efficiently and avoid wasting resources by pumping massive amounts of data into large cloud-based LLMs.
Being AI-ready isn’t a one-time milestone, says Letts. AI capabilities are evolving quickly, which means the threshold for readiness shifts over time. It’s essential to improve end-to-end lineage, build shared semantics and ontology so data is consistently understood, increase interoperability across platforms and domains, and tighten how AI systems access data so it remains secure, auditable, and fit for purpose. “Thresholds change as use cases evolve,” she says, “so data readiness must be treated as an ongoing discipline rather than a completed task.”
Read More from This Article: How poor data foundations can undermine AI success

