Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies

Thanks to AI, the data reckoning has arrived

In the race to build the smartest LLM, the rallying cry has been “more data!” That same mantra has made its way to company boardrooms, too. As businesses hurry to harness AI to gain a competitive edge, finding and using as much company data as possible may feel like the most reasonable approach.

After all, if more data leads to better LLMs, shouldn’t the same be true for AI business solutions?

The short answer is no. A mad rush to throw data at AI is shortsighted. Instead, your business needs to understand the challenges of existing data and the steps needed to ensure you have and use good data to power your AI solutions. The data reckoning has arrived, and you must reckon not only with how much data you use, but also with the quality of that data.

The urgency of now

The rise of artificial intelligence has forced businesses to think much harder about how they store, maintain, and use large quantities of data. One reality businesses quickly face when implementing AI solutions is that once data has been used to train a large language model (LLM) or small language model (SLM), there is no going back: the data cannot simply be pulled out of the model again.

Traditionally, companies struggling with large amounts of data used data lakes to store and process it. While the data was stored, there was often no significant management of sources, recent updates, and other key governance measures to ensure data integrity.

That approach to data storage is a problem for enterprises today because if they use outdated or inaccurate data to train an LLM, those errors get baked into the model. The result is not a hallucination. The model is working properly; the data used to train it is wrong.

Equally concerning, since the data sits inside the LLM’s black box, will anyone even know that the answer is wrong? If users have nothing else to compare the answer to, they often just accept it at face value. This drives home the point that we may need more data to power AI, but not if the data is wrong.

Today’s challenges

There are several major challenges with business data today:

1. Provenance

Housing massive amounts of data in data lakes has created a great deal of uncertainty about enterprise data. Who created this data? Where did it come from? When was it last updated? Is it a trusted source? Knowing the lineage of a dataset is a crucial first step in trusting and using the data with confidence.
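
As a rough illustration, the four provenance questions above could be recorded as metadata alongside each dataset and checked before the data is used. The sketch below is only one possible shape; the `Provenance` class, its field names, and the `unanswered_questions` helper are hypothetical and not taken from any particular data catalog product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical lineage metadata attached to one dataset."""
    created_by: Optional[str]
    source: Optional[str]
    last_updated: Optional[date]
    trusted_source: bool = False


def unanswered_questions(p: Provenance) -> list[str]:
    """Return the provenance questions this record cannot answer."""
    gaps = []
    if not p.created_by:
        gaps.append("Who created this data?")
    if not p.source:
        gaps.append("Where did it come from?")
    if p.last_updated is None:
        gaps.append("When was it last updated?")
    if not p.trusted_source:
        gaps.append("Is it a trusted source?")
    return gaps


# Example: the creator and update date are known, but the origin is not.
record = Provenance(created_by="hr-team", source=None,
                    last_updated=date(2025, 1, 10), trusted_source=True)
gaps = unanswered_questions(record)
```

A dataset with any unanswered question would then be held back or reviewed rather than fed straight into training.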

2. Data classification

As data is housed in data lakes and other increasingly connected systems, another challenge is classification. Who is allowed to look at particular data? From government security classifications to confidential HR information, data shouldn’t be accessible to everyone. Data must be properly classified, and those categories and the limits they entail must be maintained and carried forward as companies integrate and harness data in new ways.
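
One way to picture a classification that "lives on" with the data is a label that travels with every document and is checked on each read. The following is a minimal sketch under a simplifying assumption: the four levels are strictly ordered, which real schemes (with compartments or need-to-know rules) often are not. The `Classification` levels and both helper functions are hypothetical names chosen for illustration.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical, strictly ordered sensitivity levels."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def can_access(user_clearance: Classification, data_label: Classification) -> bool:
    """A user may read data labeled at or below their own clearance."""
    return user_clearance >= data_label


def readable_documents(documents, user_clearance):
    """Filter (name, label) pairs down to the names the user may see."""
    return [name for name, label in documents if can_access(user_clearance, label)]


documents = [
    ("press-release", Classification.PUBLIC),
    ("org-chart", Classification.INTERNAL),
    ("salary-bands", Classification.CONFIDENTIAL),
]
visible = readable_documents(documents, Classification.INTERNAL)
```

The same filter would need to run wherever the data is consumed, including any pipeline that assembles training data, so that the label is not lost when the data moves.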

3. Stability

A lot of data is transient. If you’re taking data from sensors, for example, you need to understand how often you’ll refresh the data based on sensor readings. This is an issue of data stability, as constantly changing data may lead to different results.

Data also ages. For example, imagine you had a specific process for raising a job requisition for a new employee for nine years, but you revised the process last year. If you use all 10 years’ worth of data to train a model and then ask how to open a job requisition, you will get a wrong answer most of the time because most of the data describes the old process.

This is a clear example of how more data is not always better. Ten years’ worth of data spanning major process changes is less valuable than a smaller chunk of data that accurately captures existing processes.
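
The job requisition example can be sketched as a simple date filter: keep only the records created on or after the most recent process revision. The record type, the sample dates, and the `current_records` function are invented for illustration, and the sketch assumes every record carries a reliable creation date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProcessRecord:
    """Hypothetical training record describing how a process was carried out."""
    recorded_on: date
    text: str


def current_records(records, process_revised_on):
    """Keep only records created on or after the latest process revision."""
    return [r for r in records if r.recorded_on >= process_revised_on]


records = [
    ProcessRecord(date(2016, 3, 1), "old requisition steps"),
    ProcessRecord(date(2023, 8, 15), "old requisition steps"),
    ProcessRecord(date(2024, 9, 1), "revised requisition steps"),
]
training_set = current_records(records, process_revised_on=date(2024, 6, 1))
```

Here the filtered set is a third the size of the original, yet it is the only portion that reflects the process as it exists today.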

4. Replicating bias

As you start using data to train AI, you run the risk of training your models on how things are now rather than the desired outcome. For example, imagine your HR department is using AI to screen job applicants. If you use your company’s existing data to train the model on what an ideal candidate would look like, your model may end up replicating existing biases in your workforce related to age or gender, for example.

You want to train the model not based on the reality in the dataset, but on the outcome you want to achieve, which starts with a clear understanding of the data and its limitations.

Dangers of problematic data

Using problematic data to train your LLMs can have serious dangers. At a basic level, it can increase hallucinations and undermine your confidence in the results. You may get inaccuracies or systems that don’t function the way you want them to. When that happens, employee trust and willingness to use systems may decline.

Using bad data could even cause reputational damage. If you use data to train a customer-facing tool that performs poorly, you may hurt customer confidence in your company’s capabilities.

Using compromised data to produce reports on the company or other public information may even become a government and compliance issue. And if data gets misclassified, you risk exposing personal information. All these scenarios can be costly, both financially and reputationally.

Act today

Your business can take the following data management steps today to capitalize on the AI revolution:

1. Strengthen your data governance process

Every enterprise needs a robust data governance process. You must define the rules around handling, storing, and updating your data by answering questions such as:

  • Who is responsible for the classification of data?
  • Who is responsible for looking at the access rights of your data?
  • Who is going to control the stewardship of that data?
  • Will you appoint a chief data officer, an analytics team, or someone else?
  • How long will you keep data, and who makes those decisions?

Your business will benefit by answering these questions before you start using company data for AI solutions.

2. Establish your compliance processes

Your enterprise should pair its robust governance processes with equally strong compliance processes. When data is being targeted for consumption, do you have a compliance process to confirm that the person submitting it has gone through the appropriate governance checks?
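
A compliance check of this kind can be thought of as a gate in front of any data consumer. The sketch below assumes governance sign-offs are recorded as a set of check names; the names in `REQUIRED_CHECKS` are hypothetical and loosely mirror the governance questions listed earlier, and a real process would define its own.

```python
# Hypothetical governance checks a dataset must pass before it is consumed.
REQUIRED_CHECKS = frozenset(
    {"classified", "access_reviewed", "steward_assigned", "retention_set"}
)


def missing_checks(completed):
    """Return the required governance checks that have not been completed."""
    return REQUIRED_CHECKS - set(completed)


def approve_for_consumption(completed):
    """Allow consumption only when every required check is on record."""
    missing = missing_checks(completed)
    if missing:
        raise PermissionError(
            "governance checks missing: " + ", ".join(sorted(missing))
        )
    return True


# Example: a submission that skipped stewardship and retention decisions.
outstanding = missing_checks({"classified", "access_reviewed"})
```

Raising an error rather than returning a flag means a pipeline that forgets to inspect the result still cannot proceed with unchecked data.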

As you start adopting AI tools, properly storing data isn’t enough. You must ensure your policies and procedures around data integrity extend to everywhere data is being accessed and used.

Taken together, governance and compliance processes are central to maintaining data integrity, and they will only grow in importance given the staggering amounts of data companies are amassing.

For example, as Brian Eastwood notes: “[t]he average hospital produces roughly 50 petabytes of data every year. That’s more than twice the amount of data housed in the Library of Congress, and it amounts to 137 terabytes per day.” When data is critical to your company, especially when it is also growing rapidly, you need clear planning and role responsibilities to protect, manage, and harness it.

3. Know your data

The question of how much data to use shouldn’t be based on how much data you have, but instead on understanding your data and your goals. In the early days of AI, the conventional wisdom was that more data meant a better LLM. Then there was a trend toward small language models that were highly tuned using more accurate data. Deciding which approach to take will depend on the situation at hand. But you cannot make an informed decision if you don’t first have a strong understanding of your data and its limitations.

Agentic AI’s data reckoning

The next great frontier is how to use data with agentic AI. Will it be more effective to have AI agents using LLMs or one master agent coordinating multiple AI agents, each with its own SLM?

It’s exciting to think about the possibilities that agentic AI will deliver for businesses. Regardless of which approach wins out, agentic AI will rest on the back of strong data governance and compliance processes. Strong data integrity will enable AI to truly deliver.

In the rush to train AI models, we cannot just yell, “More data!” Instead, let’s demand quality data, knowing that setting high standards now will deliver optimized results in the future.



Category: News
April 23, 2025
Tags: art
