When it comes to AI, not all data is created equal

Gen AI is becoming a disruptive influence on nearly every industry, but using the best AI models and tools isn’t enough. Everybody’s using the same ones. What really creates competitive advantage is being able to train and fine-tune your own models, or provide unique context to them, and that requires data.

Your company’s extensive code base, documentation, and change logs? That’s data for your coding agents. Your library of past proposals and contracts? Data for your writing assistants. Your customer databases and support tickets? Data for your customer service chatbot.

But just because all this data exists doesn’t mean it’s good.

“It’s so easy to point your models to any data that’s available,” says Manju Naglapur, SVP and GM of cloud, applications, and infrastructure solutions at Unisys. “For the past three years, we’ve seen this mistake made over and over again. The old adage ‘garbage in, garbage out’ still holds true.”

According to a Boston Consulting Group survey released in September, 68% of 1,250 senior AI decision-makers said lack of access to high-quality data was a key challenge in adopting AI. Other recent research confirms this. In an October Cisco survey of over 8,000 AI leaders, only 35% of companies had clean, centralized data with real-time integration for AI agents. And by 2027, according to IDC, companies that don’t prioritize high-quality, AI-ready data will struggle to scale gen AI and agentic solutions, resulting in a 15% productivity loss.

Losing track of the semantics

Another problem with using data that’s all lumped together is that the semantic layer gets confused. When data comes from multiple sources, the same type of information can be defined and structured in many different ways. And as the number of data sources proliferates due to new projects or new acquisitions, the challenge grows. Even keeping track of customers, the most critical data type, and resolving basic data issues is difficult for many companies.

Dun & Bradstreet reported last year that more than half of organizations surveyed have concerns about the trustworthiness and quality of the data they’re leveraging for AI. For example, in the financial services sector, 52% of companies say AI projects have failed because of poor data. And for 44%, data quality is their biggest concern for 2026, second only to cybersecurity, based on a survey of over 2,000 industry professionals released in December.

Having multiple conflicting data standards is a challenge for everybody, says Eamonn O’Neill, CTO at Lemongrass, a cloud consultancy.

“Every mismatch is a risk,” he says. “But humans figure out ways around it.”

AI can also be configured to do something similar, he adds, if you understand what the challenge is, and dedicate time and effort to address it. Even if the data is clean, a company should still go through a semantic mapping exercise. And if the data isn’t perfect, it’ll take time to tidy it up.

“Take a use case with a small amount of data and get it right,” he says. “That’s feasible. And then you expand. That’s what successful adoption looks like.”
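
As a sketch of what a semantic mapping exercise like the one O’Neill describes might look like, the hypothetical example below reconciles the same customer from two source systems (a CRM and a billing system, with made-up field names) into one canonical schema before any model sees the data:

```python
# Hypothetical records: the same customer, represented differently by two systems.
crm_record = {"cust_id": "C-1001", "full_name": "Ada Lovelace", "revenue_usd": 12000}
billing_record = {"customerNumber": "C-1001", "name": "Lovelace, Ada", "annual_rev": "12,000"}

def to_canonical_crm(rec):
    """Map a CRM record into the canonical customer schema."""
    return {"customer_id": rec["cust_id"],
            "name": rec["full_name"],
            "annual_revenue_usd": float(rec["revenue_usd"])}

def to_canonical_billing(rec):
    """Map a billing record into the same canonical schema."""
    last, first = [p.strip() for p in rec["name"].split(",")]
    return {"customer_id": rec["customerNumber"],
            "name": f"{first} {last}",
            "annual_revenue_usd": float(rec["annual_rev"].replace(",", ""))}

# After mapping, both sources agree on what a "customer" is.
canonical = to_canonical_crm(crm_record)
```

Starting with one small, well-understood entity like this, then expanding source by source, matches the incremental adoption path O’Neill recommends.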

Unmanaged and unstructured

Another mistake companies make when connecting AI to company information is pointing it at unstructured data sources, says O’Neill. Yes, LLMs are very good at reading unstructured data and making sense of text and images. The problem is that not all documents are worthy of the AI’s attention.

Documents could be out of date, for example. Or they could be early versions of documents that haven’t been edited yet, or that have mistakes in them.

“People see this all the time,” he says. “We connect your OneDrive or your file storage to a chatbot, and suddenly it can’t tell the difference between ‘version 2’ and ‘version 2 final.’”

It’s very difficult for human users to maintain proper version control, he adds. “Microsoft can handle the different versions for you, but people still do ‘save as’ and you end up with a plethora of unstructured data,” O’Neill says.
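
One pragmatic mitigation, sketched below with hypothetical filenames and an assumed ad-hoc naming convention (`_v2`, `_final`, `_draft`), is to deduplicate “save as” versions before indexing a file share for a chatbot, keeping only the best candidate per document:

```python
import re

# Hypothetical listing from a shared drive littered with "save as" copies.
files = ["policy_v1.docx", "policy_v2.docx", "policy_v2_final.docx",
         "handbook.docx", "handbook_draft.docx"]

def base_and_rank(name):
    """Strip version suffixes; rank 'final' above numbered versions, drafts lowest."""
    stem = name.rsplit(".", 1)[0]
    rank = 0
    if stem.endswith("_final"):
        stem, rank = stem[:-6], 1000
    m = re.search(r"_v(\d+)$", stem)
    if m:
        rank += int(m.group(1))
        stem = stem[:m.start()]
    if stem.endswith("_draft"):
        stem, rank = stem[:-6], -1
    return stem, rank

# Keep only the highest-ranked version of each document for indexing.
latest = {}
for f in files:
    stem, rank = base_and_rank(f)
    if stem not in latest or rank > latest[stem][0]:
        latest[stem] = (rank, f)

to_index = sorted(f for _, f in latest.values())
```

This is a heuristic, not a substitute for real version control, but it keeps “version 2” and “version 2 final” from competing for the chatbot’s attention.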

Losing track of security

When CIOs think of security as it relates to AI systems, they typically consider guardrails on the models, or protections around the training data and the data used for RAG embeddings. But as chatbot-based AI evolves into agentic AI, the security problems get more complex.

Say for example there’s a database of employee salaries. If an employee has a question about their salary and asks an AI chatbot embedded into their portal, the RAG embedding approach would be to collect only the relevant data from the database using traditional code, embed it into the prompt, and then send the query off to the AI. The AI sees only the information it’s allowed to see, and the traditional, deterministic software stack handles the problem of keeping the rest of the employee data secure.
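
A minimal sketch of that pattern, with a made-up in-memory salary table standing in for a real database, shows how deterministic code keeps everyone else’s data out of the prompt:

```python
# Hypothetical salary table; in production this would be a parameterized DB query.
SALARIES = {"emp_42": {"name": "Sam", "salary": 95000},
            "emp_43": {"name": "Kim", "salary": 88000}}

def build_prompt(requesting_employee_id: str, question: str) -> str:
    # Deterministic code enforces scope: only the requester's own row is fetched.
    row = SALARIES[requesting_employee_id]
    context = f"Employee {row['name']} has a salary of ${row['salary']}."
    return f"Context: {context}\n\nQuestion: {question}"

prompt = build_prompt("emp_42", "How does my salary compare to last year?")
# The model never receives emp_43's row, so it cannot leak it.
```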

But when the system evolves into an agentic one, the AI agents can query the databases autonomously via MCP servers, and since they need to be able to answer questions from any employee, they require access to all employee data, and keeping it from getting into the wrong hands becomes a big task.

According to the Cisco survey, only 27% of companies have dynamic and detailed access controls for AI systems, and fewer than half feel confident in safeguarding sensitive data or preventing unauthorized access.
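
One form those dynamic access controls can take, sketched here with hypothetical names rather than any real MCP server implementation, is an authorization check inside the tool itself, so the tool rather than the model decides which rows a given caller may see:

```python
# Hypothetical data and roles for illustration only.
SALARIES = {"emp_42": 95000, "emp_43": 88000}
HR_STAFF = {"emp_99"}

def get_salary(caller_id: str, subject_id: str) -> int:
    """Agent-facing tool with a per-call authorization check.

    The check is enforced in deterministic code, so a persuasive prompt
    cannot talk the model past it: self-service or HR staff only.
    """
    if caller_id != subject_id and caller_id not in HR_STAFF:
        raise PermissionError(f"{caller_id} may not read {subject_id}'s salary")
    return SALARIES[subject_id]
```

The key design choice is that the caller’s identity travels with every tool call and is checked at the data boundary, not inferred by the model.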

And the situation gets even more complicated if all the data is collected into a data lake, says O’Neill.

“If you’ve put in data from lots of different sources, each of those individual sources might have its own security model,” he says. “When you pile it all into block storage, you lose that granularity of control.”

Trying to add the security layer back in after the fact can be difficult. The solution, he says, is to go directly to the original data sources and skip the data lake entirely.

The data lake’s original appeal was different, he notes. “It was about keeping history forever because storage was so cheap, and machine learning could see patterns over time and trends,” he says. “Plus, cross-disciplinary patterns could be spotted if you mix data from different sources.”

In general, data access changes dramatically when instead of humans, AI agents are involved, says Doug Gilbert, CIO and CDO at Sutherland Global, a digital transformation consultancy.

“With humans, there’s a tremendous amount of security that lives around the human,” he says. “For example, most user interfaces have been written so if it’s a number-only field, you can’t put a letter in there. But once you put in an AI, all that’s gone. It’s a raw back door into your systems.”
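
One way to close the back door Gilbert describes is to restore the old UI’s checks at the tool or API boundary, since an agent can pass arbitrary strings where a form once enforced a number-only field. The sketch below uses a hypothetical discount parameter and an assumed 0–50% business rule:

```python
def set_discount(raw_value: str) -> float:
    """Validate agent-supplied input the way the old number-only UI field did."""
    value = float(raw_value)        # raises ValueError on non-numeric input
    if not 0.0 <= value <= 50.0:    # business rule the UI used to enforce
        raise ValueError(f"discount {value}% out of range")
    return value
```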

The speed trap

But the number-one mistake Gilbert sees CIOs making is they simply move too fast. “This is why most projects fail,” he says. “There’s such a race for speed.”

Too often, CIOs look at data issues as slowdowns, but those issues are massive risks, he adds. “A lot of people doing AI projects are going to get audited and they’ll have to stop and re-do everything,” he says.

So getting the data right isn’t a slowdown. “When you put the proper infrastructure in place, then you speed through your innovation, you pass audits, and you have compliance,” he says.

Another area that might feel like an unnecessary waste of time is testing. It’s not always a good strategy to move fast, break things, and then fix them later on after deployment.

“What’s the cost of a mistake that moves at the speed of light?” he asks. “I would always go to testing first. It’s amazing how many products we see that are pushed to market without any testing.”

Putting AI to work to fix the data

The lack of quality data might feel like a hopeless problem that’s only going to get worse as AI use cases expand.

In an October AvePoint report based on a survey of 775 global business leaders, 81% of organizations had already delayed deployment of AI assistants due to data management or data security issues, with an average delay of six months.

Meanwhile, not only does the number of AI projects continue to grow, but so does the amount of data. Nearly 52% of respondents also said their companies were managing more than 500 petabytes of data, up from just 41% a year ago.

But Unisys’ Naglapur says it’s going to become easier to get a 360-degree view of a customer, and to clean up and reconcile other data sources, because of AI.

“This is the paradox,” he says. “AI will help with everything. If you think about a digital transformation that would take three years, you can do it now in 12 to 18 months with AI.” The tools are getting closer to reality, and they’ll accelerate the pace of change, he says.


January 14, 2026
