5 Types of Costly Data Waste and How to Avoid Them

Do you know someone who bought a lot of fancy exercise equipment but doesn’t use it? Turns out, exercise equipment doesn’t provide many benefits when it goes unused.

The same principle applies to getting value from data. Organizations may acquire a lot of data, yet get little value from it. The issue is widespread and cuts across sectors: it is estimated that nearly 75% of the data enterprises collect goes unused, so its potential value is never realized. So, what is the problem?

In the fitness example, the problem is typically not the exercise equipment; it’s the user’s habits. Similarly, failing to get value from data is usually not a problem with the data itself. Rather, problems arise from limitations imposed by data infrastructure and from data practices that block effective and efficient use. In other words, poor choices in data infrastructure and poor data habits lead to data waste.

What is data waste, and why does it happen?

Fundamentally, data waste means missing an opportunity to get value from data or paying too much to acquire, store, and use data. In large-scale systems, data waste comes in many forms. Some are surprising, most are expensive, and almost all are avoidable. 

To avoid unnecessary data waste in your organization, you must first recognize it. The following are five common ways that waste occurs:

  • Data is used and then thrown away

A common data habit that results in missed opportunity is assuming data has no further value once it has been used for its original purpose. Data is ingested, processed, and transformed (perhaps for a specific report or to be stored in a traditional database), and then the raw or partially processed data is discarded. It isn’t practical to save all of your data, but it is important to realize that data may be valuable for other projects. You lose that add-on value when you throw data away.

This type of data waste forfeits the second-project advantage. For example, AI and machine learning projects offer great potential value, but they are speculative. Lowering the cost of entry by re-using data and infrastructure already in place for other projects makes it feasible to try many different approaches, which in turn makes it more likely that you find the ones that pay off. Fortunately, learning-based projects can typically make use of data collected for other purposes.

It’s also important to go back to raw data to ask new questions and train new models, particularly as the world is constantly changing. Features that you didn’t think were valuable at first may later be just what you need. You’ve lost that opportunity if the data has been thrown away.  
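One practical way to keep that option open is to land raw data in an append-only archive before any report-specific transformation, so later projects can go back and replay it. Here is a minimal sketch of that pattern in Python; the directory layout and the ingest_record helper are hypothetical illustrations, not part of any particular product.

```python
import json
import pathlib
from datetime import datetime, timezone

RAW_DIR = pathlib.Path("data/raw")          # immutable landing zone, kept for re-use
CURATED_DIR = pathlib.Path("data/curated")  # transformed output for today's report

def ingest_record(record: dict) -> None:
    """Archive the raw record first, then derive the report-specific view."""
    RAW_DIR.mkdir(parents=True, exist_ok=True)
    CURATED_DIR.mkdir(parents=True, exist_ok=True)

    # 1. Keep the raw event exactly as received, grouped by arrival date.
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    with (RAW_DIR / f"{day}.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

    # 2. Derive only the fields today's report needs; the raw file stays behind
    #    so a future project (say, model training) can ask different questions.
    curated = {"customer_id": record.get("customer_id"),
               "amount": record.get("amount")}
    with (CURATED_DIR / f"{day}.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(curated) + "\n")
```

The point of the design is simply that the transformation made for today’s report never becomes the only surviving copy of the data.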

  • You have data but don’t use it

Why does valuable data so often go unused? One reason is that people don’t know where it is, or even that it exists at all. Lack of annotation with the right metadata is a contributing factor; so is poor communication between projects or business units.

An even larger issue is that people may not know how to see value in data. Recognizing what data can tell you is an acquired skill, and not only for data scientists. New approaches are being developed to understand and use unstructured data, for instance. But to get the benefits data has to offer, you must learn to use it, just as you need to know how to use exercise equipment before it can do you any good.

Another factor that keeps people from fully using and re-using data is data infrastructure that requires specialized tools, which makes it inconvenient to reach the data from different types of applications or from different analytics and AI tools. Increasingly, people look for ways to unify their data layer and provide flexible access in order to build a data-first environment.
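A small step toward better discovery is to record descriptive metadata alongside each data set: what it contains, who owns it, and a few searchable tags. The sketch below writes such a metadata “sidecar” file; the write_sidecar helper and the file layout are hypothetical illustrations, not the API of any specific catalog product.

```python
import json
import pathlib
from datetime import date

def write_sidecar(dataset_path: str, description: str, owner: str,
                  tags: list[str]) -> None:
    """Write a <dataset>.meta.json file so other teams can find and interpret the data."""
    meta = {
        "dataset": dataset_path,
        "description": description,
        "owner": owner,                         # who to ask about access
        "tags": tags,                           # simple keywords for search
        "registered": date.today().isoformat(),
    }
    sidecar = pathlib.Path(dataset_path + ".meta.json")
    sidecar.write_text(json.dumps(meta, indent=2), encoding="utf-8")

# Example: annotate a sales extract so a later ML project can discover it.
write_sidecar("data/curated/sales_2022_q1.parquet",
              description="Daily sales totals by store, Q1 2022",
              owner="retail-analytics@example.com",
              tags=["sales", "retail", "daily"])
```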

  • You have data but not where it is needed

Data in the wrong place is about as useful as data that does not exist. And “wrong place” can mean more than one thing. It may be that data is held by a different business unit, making it difficult to identify, or challenging to get the permissions and access needed to share it. Once again, there is a cost to not using data simply because it is somewhere other than where you’d like it to be.

Data can also be in the wrong place in a more literal sense: geolocation. For large systems, major data motion from edge to data center, or between data centers located in different cities or countries, is challenging, especially if you do not have data infrastructure designed to move data automatically. Coding data motion into applications is not an adequate alternative except in the simplest of cases. To avoid data waste, you must have a way to move data efficiently to where it is needed. Otherwise, hand-coding of data motion can lead to additional problems, including unwanted duplication.

  • Your system involves unwanted duplication

Unnecessary duplication of large data sets is clearly a waste of the resources used to store and access the data, but it involves waste in other ways as well. Duplication of data also entails duplication of effort, which is an additional cost. And the problem is not just a matter of too many copies. Approximately duplicated data sets introduce uncertainty: near duplicates immediately raise the question of which copy is authoritative and why they differ, and that leads to mistrust of data quality.

Hand-coded data motion by many different users creates its own problems, because it is hard to do accurately at scale. The resulting data sets can contain unintentional variation even where a verbatim copy was intended.

A related problem is the creation of data silos in large systems. Unwillingness to share data often points to the lack of a uniform data layer with flexible data access. Siloed data not only incurs avoidable costs, it also limits the understanding and insights data scientists and analysts can draw from the data. Siloing and poor data discovery are wasteful both through opportunity cost and through the cost of redundant storage and duplicated effort.

A special example of data waste through unnecessary duplication occurs when an enterprise buys data that could have been obtained for free. This waste happens because people may not know what data options are available.
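When data does have to be copied by hand despite the problems just described, a content checksum can at least confirm that each copy is verbatim, and the same digests make exact duplicates easy to spot later. Here is a minimal sketch, assuming simple file-based data sets; the paths and the verified_copy helper are placeholders, not part of any product.

```python
import hashlib
import pathlib
import shutil

def sha256_of(path: pathlib.Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verified_copy(src: str, dst: str) -> None:
    """Copy a file and fail loudly if the destination differs from the source."""
    src_p, dst_p = pathlib.Path(src), pathlib.Path(dst)
    shutil.copy2(src_p, dst_p)
    if sha256_of(src_p) != sha256_of(dst_p):
        raise IOError(f"Copy of {src} to {dst} is not verbatim")

# Placeholder paths for illustration only.
verified_copy("data/raw/2022-03-01.jsonl", "/mnt/replica/2022-03-01.jsonl")
```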

  • Disconnect between data producers and data consumers

One problem in connecting data producers and data consumers is that those who produce data, or even those responsible for ingesting it, often do not know how it will be used. That makes it harder for those who need the data to find it, or to know what it actually contains when they do find it. Data producers, meanwhile, are challenged to annotate data appropriately without knowing how it will be used. The result is a classic type of data waste: missed opportunity, plus the unnecessary effort and expense required to track data down.
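One lightweight way to narrow that gap is for producers to publish a small “data contract” along with each feed: the fields it contains, what they mean, how often it updates, and whom to contact. The sketch below is purely illustrative; the feed, the field names, and the publish_contract helper are hypothetical.

```python
import json
import pathlib

# A minimal, hypothetical data contract published by the producing team,
# so consumers can judge whether the feed fits their use before chasing anyone down.
contract = {
    "feed": "store_sensor_readings",
    "owner": "edge-platform-team@example.com",
    "update_frequency": "hourly",
    "fields": {
        "store_id":   "Internal store identifier (string)",
        "ts":         "Reading timestamp, UTC, ISO 8601",
        "temp_c":     "Ambient temperature in degrees Celsius (float)",
        "door_opens": "Count of door-open events in the hour (int)",
    },
}

def publish_contract(contract: dict, out_dir: str = "contracts") -> pathlib.Path:
    """Write the contract to a shared location so it travels with the data."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{contract['feed']}.json"
    path.write_text(json.dumps(contract, indent=2), encoding="utf-8")
    return path

publish_contract(contract)
```

Even this minimal annotation gives consumers enough context to judge whether the data fits their purpose before they spend effort tracking it down.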

Reducing data waste

How can you address the issues listed above in order to reduce data waste? You need a comprehensive data strategy that includes a unifying data infrastructure engineered to support flexible data access, data sharing, and efficient data motion. HPE Ezmeral Data Fabric is a software-defined, hardware-agnostic data technology used to store, manage, and move data at scale across an enterprise, from edge to data center, on premises or in the cloud. As such, it serves as a unifying data layer that supports a wide range of applications and tools, which invites the re-use of data. In addition, the data fabric handles data motion automatically at the platform level.

Other solutions come in the form of better use of metadata to aid data discovery and understanding, along with new initiatives that better connect data producers with data consumers. One such initiative is the Agstack Foundation, an open-source digital infrastructure for agriculture. Another example is Dataspaces, a new service platform that helps data producers and consumers integrate diverse data sets, enhance data discovery and access, and improve data governance and trust.

These solutions can help you reduce costly data waste and take better advantage of the value data offers. Making better use of your exercise equipment, however, is still up to you.

To find out more about data infrastructure that can help you reduce data waste, read this technical paper.

____________________________________

About Ellen Friedman

Ellen Friedman is a principal technologist at HPE focused on large-scale data analytics and machine learning. Before her current role at HPE, Ellen worked for seven years at MapR Technologies, where she was a committer for the Apache Drill and Apache Mahout open source projects. She is a co-author of multiple books published by O’Reilly Media, including AI & Analytics in Production, Machine Learning Logistics, and the Practical Machine Learning series.

