Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies

Unlocking the hidden value of dark data

IT leaders seeking to derive business value from the data their companies collect face myriad challenges. Perhaps the least understood is the opportunity lost by failing to capitalize on data that is created, and often stored, but seldom otherwise interacted with.

This so-called “dark data,” named after the dark matter of physics, is information routinely collected in the course of doing business: It’s generated by employees, customers, and business processes. It’s generated as log files by machines, applications, and security systems. It’s documents that must be saved for compliance purposes, and sensitive data that should never be saved, but still is.

According to Gartner, the majority of your enterprise information universe is composed of “dark data,” and many companies don’t even know how much of this data they have. Storing it increases compliance and cybersecurity risks, and, of course, doing so also increases costs.

Figuring out what dark data you have, where it is kept, and what information is in it is an essential step toward ensuring that the valuable parts of this dark data are secure and that the parts that shouldn't be kept are deleted. But the real advantage of unearthing these hidden pockets of data may lie in putting them to use to actually benefit the business.
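To make the discovery step a little more concrete, here is a minimal sketch of one narrow piece of it: scanning text for data that usually shouldn't be retained, in this case U.S. Social Security numbers and payment card numbers, with card candidates confirmed by the Luhn checksum to cut down on false positives. The regular expressions and function names are illustrative assumptions, not taken from any product mentioned in this article, and a real discovery tool would need many more detectors plus context rules.

```python
import re

# Illustrative detectors only (assumed patterns, not a compliance-grade ruleset).
# U.S. SSN in NNN-NN-NNNN form, excluding some invalid ranges (000, 666, 9xx area numbers).
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# 13 to 19 digits, each optionally followed by a single space or dash.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (kind, match) pairs for likely SSNs and Luhn-valid card numbers."""
    findings = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())  # drop separators before checksumming
        if luhn_valid(digits):
            findings.append(("card", m.group().strip(" -")))
    return findings
```

Files flagged this way become candidates for review and deletion rather than for analysis.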

But mining dark data is no easy task. It comes in a wide variety of formats, and it can be completely unstructured, locked away in scanned documents or in audio or video files, for example.

Here is a look at how some organizations are transforming dark data into business opportunities, and what advice industry insiders have for IT leaders looking to leverage dark data.

Coded audio from race car drivers

For five years, Envision Racing has been collecting audio recordings from more than 100 Formula E races, each with more than 20 drivers.

“The radio streams are available on open frequencies for anyone to listen to,” says Amaresh Tripathy, global leader of analytics at Genpact, a consulting company that helped Envision Racing make use of this data.

Previously, the UK-based racing team's race engineers tried to use these audio transmissions in real time during races, but the code names and acronyms the drivers used made it difficult to figure out what was being said and how to act on it. Understanding what other drivers were saying could help Envision Racing's drivers with their racing strategy, Tripathy says.

“Such as when to use the attack mode. When to overtake a driver. When to apply brakes,” he says.

Envision Racing was also collecting sensor data from its own cars, such as from tires, batteries, and brakes, and purchasing external data, such as wind speed and precipitation, from vendors.

Genpact and Envision Racing worked together to unlock the value of these data streams, making use of natural language processing to build deep learning models to analyze them. The process took six months, from preparing the data pipeline, to ingesting the data, to filtering out noise, to deriving meaningful conversations.
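Genpact's actual models are not public, so the following is only a rule-based stand-in for the later pipeline stages described above. It assumes the audio has already been transcribed into timestamped text segments, then drops filler-only segments as noise, expands code names using a hypothetical glossary, and tags strategy-relevant intents with keyword rules where the real system used trained deep learning models. Every glossary entry, rule, and name here is invented for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical glossary: real team code names are not public.
GLOSSARY = {"am": "attack mode", "lico": "lift and coast"}

# Hypothetical keyword rules standing in for trained NLP models.
INTENT_RULES = {
    "attack_mode": ("attack mode",),
    "overtake": ("overtake", "pass"),
    "braking": ("brake", "brakes", "braking", "lift and coast"),
}

# Segments made up only of these tokens are treated as noise.
FILLER = {"uh", "um", "copy", "radio", "check"}


@dataclass
class Segment:
    driver: str
    timestamp_s: float  # seconds since race start
    text: str           # transcript of one radio message


def normalize(text: str) -> str:
    """Lowercase, tokenize, and expand known code names."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(GLOSSARY.get(t, t) for t in tokens)


def is_noise(text: str) -> bool:
    tokens = text.split()
    return not tokens or all(t in FILLER for t in tokens)


def tag_events(segments: list[Segment]) -> list[tuple[float, str, str, str]]:
    """Return (timestamp, driver, intent, normalized_text) tuples sorted by time."""
    events = []
    for seg in segments:
        text = normalize(seg.text)
        if is_noise(text):
            continue
        for intent, phrases in INTENT_RULES.items():
            # Whole-word match so "pass" does not fire on "passenger".
            if any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases):
                events.append((seg.timestamp_s, seg.driver, intent, text))
    return sorted(events)
```

Keeping the rules in plain dictionaries makes each stage easy to inspect; in a production pipeline they would be replaced by trained models, as they were in Envision Racing's case.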

Tripathy says humans take five to ten seconds to figure out what they're listening to, a delay that made the radio communications irrelevant. Now, thanks to the AI model's predictions and insights, the team can respond in one to two seconds.

In July, at the New York City round of the ABB FIA Formula E World Championship, the Envision Racing team took first and third places, a result Tripathy credits to making use of what was previously dark data.

Dark data gold: Human-generated data

Envision Racing’s audio files are an example of dark data generated by humans, intended for consumption by other humans — not by machines. This kind of dark data can be extremely useful for enterprises, says Kon Leong, co-founder and CEO of ZL Technologies, a data archiving platform provider.

“It is incredibly powerful for understanding every element of the human side of the enterprise, including culture, performance, influence, expertise, and engagement,” he says. “Employees share absolutely massive amounts of digital information and knowledge every single day, yet to this point it’s been largely untapped.”

The information contained in emails, messages, and files can help organizations derive insights such as who the most influential people in the organization are. "Eighty percent of company time is spent communicating. Yet analytics often deals with data that only reflects 1% of our time spent," Leong says.
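One common way to approximate "influence" from communication data is to treat message metadata (sender and recipient only, not content) as a directed graph and rank people with PageRank. The sketch below assumes edges point from sender to recipient, so people who are written to by other well-connected people score highest. That direction, the damping factor of 0.85, and the function name are modeling assumptions for illustration, not ZL Technologies' method.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Rank people in a sender -> recipient message graph with basic PageRank.

    edges: iterable of (sender, recipient) pairs taken from message metadata.
    Returns {person: score}; the scores sum to 1.
    """
    out_links = defaultdict(set)
    nodes = set()
    for sender, recipient in edges:
        nodes.update((sender, recipient))
        if sender != recipient:  # ignore self-addressed messages
            out_links[sender].add(recipient)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v)
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Someone who never sends mail: spread their rank evenly.
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank
```

Repeated messages between the same pair collapse into one edge here; weighting edges by message count would be a natural refinement.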

Processing human-generated unstructured data is uniquely challenging. Data warehouses, for example, aren't typically set up to handle these communications. Moreover, collecting these communications can create new compliance, privacy, and legal discovery issues for companies to deal with.

“These governance capabilities are not present in today’s concept of a data lake, and in fact by collecting data into a data lake, you create another silo which increases privacy and compliance risks,” Leong says.

Instead, companies can leave this data where it currently resides and simply add a layer of indexing and metadata to make it searchable. Leaving the data in place also keeps it within existing compliance structures, he says.
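A minimal sketch of what "indexing in place" could look like, assuming the data sits on an ordinary file share: the code walks a directory tree, records basic file metadata, and builds an inverted keyword index for plain-text files, without moving or copying anything. The set of text extensions and the function names are hypothetical; an enterprise tool would also have to handle email stores, binary document formats, and access controls.

```python
import os
import re
from collections import defaultdict
from dataclasses import dataclass

TEXT_EXTENSIONS = {".txt", ".csv", ".log", ".md", ".eml"}  # assumed plain-text types


@dataclass
class FileMetadata:
    path: str
    size_bytes: int
    modified_ts: float  # POSIX timestamp from the file system
    extension: str


def build_index(root):
    """Walk root and build file metadata plus an inverted keyword index.

    Files are only read, never moved or copied, so they stay under whatever
    retention and access controls already apply to them.
    """
    metadata = {}
    inverted = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            ext = os.path.splitext(name)[1].lower()
            metadata[path] = FileMetadata(path, info.st_size, info.st_mtime, ext)
            if ext in TEXT_EXTENSIONS:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for token in set(re.findall(r"[a-z0-9]+", fh.read().lower())):
                        inverted[token].add(path)
    return metadata, inverted


def search(inverted, *terms):
    """Return paths of indexed files that contain every search term."""
    if not terms:
        return set()
    return set.intersection(*(inverted.get(t.lower(), set()) for t in terms))
```

Only the index lives in a new location; the files themselves never leave their original systems.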

Effective governance is key

Another approach to handling dark data of questionable value and origin is to start with traceability.

“It’s a positive development in the industry that dark data is now recognized as an untapped resource that can be leveraged,” says Andy Petrella, author of Fundamentals of Data Observability, currently available in pre-release form from O’Reilly. Petrella is also the founder of data observability provider Kensu.

“The challenge with utilizing dark data is the low levels of confidence in it,” he says, in particular around where and how the data is collected. “Observability can make data lineage transparent, hence traceable. Traceability enables data quality checks that lead to confidence in employing these data to either train AI models or act on the intelligence that it brings.”
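To illustrate how recorded lineage can feed quality checks, the following sketch attaches a source and a list of transformation steps to a dataset, and flags it as not yet trustworthy if its origin is unknown, no lineage was recorded, or records are missing required fields. The class names, the checks, and the rule that an empty lineage counts as a problem are all assumptions for this example; this is not Kensu's API or the approach described in Petrella's book.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class LineageStep:
    operation: str  # what was done, e.g. "drop empty transcripts"
    actor: str      # job or system that did it


@dataclass
class TracedDataset:
    name: str
    source: Optional[str]  # where the data was collected; None if unknown
    records: list
    lineage: list = field(default_factory=list)

    def transform(self, operation: str, actor: str,
                  fn: Callable[[list], list]) -> "TracedDataset":
        """Apply fn to the records and return a new dataset with the step recorded."""
        return TracedDataset(self.name, self.source, fn(self.records),
                             self.lineage + [LineageStep(operation, actor)])


def quality_issues(ds: TracedDataset, required_fields: list) -> list:
    """Return reasons not to trust ds yet (an empty list means no issues found)."""
    issues = []
    if ds.source is None:
        issues.append("unknown origin")
    if not ds.lineage:
        issues.append("no recorded lineage")
    incomplete = sum(1 for r in ds.records
                     if any(r.get(f) in (None, "") for f in required_fields))
    if incomplete:
        issues.append(f"{incomplete} record(s) missing required fields")
    return issues
```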

Chuck Soha, managing director at StoneTurn, a global advisory firm specializing in regulatory, risk, and compliance issues, agrees that the common approach to tackling dark data — throwing everything into a data lake — poses significant risks.

This is particularly true in the financial services industry, he says, where companies have been sending data into data lakes for years. “In a typical enterprise, the IT department dumps all available data at their disposal into one place with some basic metadata and creates processes to share with business teams,” he says.

That works for business teams that have the requisite analytics talent in-house or that bring in external consultants for specific use cases. But for the most part these initiatives are only partially successful, Soha says.

“CIOs transformed from not knowing what they don’t know to knowing what they don’t know,” he says.

Instead, companies should begin with data governance to understand what data there is and what issues it might have, data quality chief among them.

“Stakeholders can decide whether to clean it up and standardize it, or just start over with better information management practices,” Soha says, adding that investing in extracting insights from data that contains inconsistent or conflicting information would be a mistake.
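Detecting inconsistent or conflicting information is one concrete data quality check stakeholders might run before deciding whether to clean data up or start over. This sketch, using hypothetical field names and a deliberately simple normalization (trim whitespace, lowercase), groups records by an identifier and reports fields whose values disagree.

```python
from collections import defaultdict


def find_conflicts(records, key, fields):
    """Report fields whose values disagree across records that share the same key.

    Values are compared after trimming whitespace and lowercasing (an assumed,
    deliberately simple normalization); empty values are ignored.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for rec in records:
        k = rec.get(key)
        if k is None:
            continue  # records without an identifier cannot be compared
        for f in fields:
            value = rec.get(f)
            if value not in (None, ""):
                seen[k][f].add(str(value).strip().lower())
    conflicts = {}
    for k, by_field in seen.items():
        clashing = {f: sorted(vals) for f, vals in by_field.items() if len(vals) > 1}
        if clashing:
            conflicts[k] = clashing
    return conflicts
```

A high conflict rate in a source is a signal that extracting insights from it would be premature.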

Soha also advises connecting the dots between good operational data already available inside individual business units. Figuring out these relationships can create rapid and useful insights that might not require looking at any dark data right away, he says. “And it might also identify gaps that could prioritize where in the dark data to start to look to fill those gaps in.”
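Connecting operational data across business units can start as simply as joining records on a shared identifier. In this hypothetical sketch, two units (say, sales and support) each keep records keyed by customer ID; the join yields combined records, and the IDs that appear on only one side are the kind of gaps that could help prioritize where to look in the dark data.

```python
def connect_units(unit_a, unit_b):
    """Join two units' records on a shared ID and list IDs that appear in only one.

    unit_a, unit_b: {id: {field: value}}. If both units use the same field name,
    unit_b's value wins in the joined record (an arbitrary choice for this sketch).
    """
    shared = unit_a.keys() & unit_b.keys()
    joined = {i: {**unit_a[i], **unit_b[i]} for i in sorted(shared)}
    gaps = {
        "only_in_a": sorted(unit_a.keys() - unit_b.keys()),
        "only_in_b": sorted(unit_b.keys() - unit_a.keys()),
    }
    return joined, gaps
```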

Finally, he says, AI can be very useful in helping make sense of the unstructured data that remains. “By using machine learning and AI techniques, humans can look at as little as 1% of dark data and classify its relevancy,” he says. “Then a reinforcement learning model can quickly produce relevancy scores for the remaining data to prioritize which data to look at more closely.”
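The workflow Soha describes is: humans label a small sample, and a model scores the rest so reviewers know where to look first. His quote attributes the scoring to reinforcement learning; the sketch below swaps in a simpler and plainly different technique, a supervised multinomial naive Bayes classifier, only to show the label-a-sample, score-the-remainder pattern in self-contained Python. The class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesRelevance:
    """Multinomial naive Bayes with add-one smoothing; labels are True (relevant) or False."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {True: Counter(), False: Counter()}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        self.totals = {c: sum(self.word_counts[c].values()) for c in (True, False)}
        return self

    def score(self, doc):
        """Return the estimated probability that doc is relevant."""
        n_docs = sum(self.doc_counts.values())
        log_p = {}
        for c in (True, False):
            lp = math.log((self.doc_counts[c] + 1) / (n_docs + 2))
            for tok in tokenize(doc):
                lp += math.log((self.word_counts[c][tok] + 1)
                               / (self.totals[c] + self.vocab_size))
            log_p[c] = lp
        # Normalize in log space to avoid underflow on long documents.
        top = max(log_p.values())
        weights = {c: math.exp(lp - top) for c, lp in log_p.items()}
        return weights[True] / (weights[True] + weights[False])


def prioritize(labeled, unlabeled):
    """Train on a small human-labeled sample, then rank the rest by relevance score."""
    docs, labels = zip(*labeled)
    model = NaiveBayesRelevance().fit(docs, labels)
    return sorted(((model.score(d), d) for d in unlabeled), reverse=True)
```

In practice, reviewers would label the top of the ranked list, add those labels to the training sample, and refit, so each round of human effort improves the next set of scores.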

Using AI to extract value

Common AI-powered solutions for processing dark data include Amazon Textract, Microsoft's Azure Cognitive Services, and IBM Datacap, as well as Google's Cloud Vision, Document AI, AutoML, and Natural Language APIs.
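As an example of calling one of these services, the sketch below sends a single scanned page to Amazon Textract's DetectDocumentText operation through boto3 and keeps the LINE blocks from the response. It assumes boto3 is installed, AWS credentials with Textract permissions are configured, and the us-east-1 region; the helper function names are illustrative.

```python
def lines_from_textract(response: dict) -> list[str]:
    """Collect LINE-level text from a Textract DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_scanned_page(path: str, region_name: str = "us-east-1") -> list[str]:
    """Send one scanned page (e.g. PNG or JPEG) to Amazon Textract and return its text lines.

    Needs the boto3 package and AWS credentials with Textract permissions.
    """
    import boto3  # imported lazily so the parsing helper works without boto3 installed

    client = boto3.client("textract", region_name=region_name)
    with open(path, "rb") as fh:
        response = client.detect_document_text(Document={"Bytes": fh.read()})
    return lines_from_textract(response)
```

Separating the parsing helper from the network call makes it possible to test the parsing against saved responses without touching AWS.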

In Genpact’s partnership with Envision Racing, Genpact coded the machine learning algorithms in-house, Tripathy says. This required knowledge of Docker, Kubernetes, Java, and Python, as well as NLP, deep learning, and machine learning algorithm development, he says, adding that an MLOps architect managed the complete process.

Unfortunately, these skills are hard to come by. In a report released last fall by Splunk, only 10% to 15% of more than 1,300 IT and business decision makers surveyed said their organizations are using AI to solve the dark data problem. Respondents cited a lack of the necessary skills as a chief obstacle to making use of dark data, second only to the sheer volume of the data itself.

A problem (and opportunity) on the rise

In the meantime, dark data remains a mounting trove of risk — and opportunity. Estimates of the portion of enterprise data that is dark vary from 40% to 90%, depending on industry.

According to a July report from Enterprise Strategy Group, sponsored by Quest, dark data makes up 47% of all data on average, and a fifth of respondents said more than 70% of their data is dark. Splunk's survey showed similar findings: dark data accounts for 55% of all enterprise data on average, and a third of respondents said 75% or more of their organization's data is dark.

And the situation is likely to get worse before it gets better: 60% of respondents say that more than half of the data in their organization is not captured at all, and much of it is not even known to exist. As that data is found and stored, the amount of dark data will keep growing.

It’s high time CIOs put together a plan on how to deal with it — with an eye toward making the most of any dark data that shows promise in creating new value for the business.

Analytics, Data Management, Data Science


Read More from This Article: Unlocking the hidden value of dark data
Source: News

Category: News
August 11, 2022
Tags: art


    Tiatra LLC.

    Tiatra, LLC, based in the Washington, DC metropolitan area, proudly serves federal government agencies, organizations that work with the government and other commercial businesses and organizations. Tiatra specializes in a broad range of information technology (IT) development and management services incorporating solid engineering, attention to client needs, and meeting or exceeding any security parameters required. Our small yet innovative company is structured with a full complement of the necessary technical experts, working with hands-on management, to provide a high level of service and competitive pricing for your systems and engineering requirements.


    Tiatra, LLC
    Copyright 2016. All rights reserved.