Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies

11 dark secrets of data management

Some call data the new oil. Others call it the new gold. Philosophers and economists may argue about the quality of the metaphor, but there’s no doubt that organizing and analyzing data is a vital endeavor for any enterprise looking to deliver on the promise of data-driven decision-making.

And to do so, a solid data management strategy is key. Encompassing data governance, data ops, data warehousing, data engineering, data analytics, data science, and more, data management, when done right, can provide businesses in every industry a competitive edge.

The good news is that many facets of data management are well understood and grounded in sound principles that have evolved over decades. They may not be easy to apply or simple to comprehend, but thanks to bench scientists and mathematicians alike, companies now have a range of logistical frameworks for analyzing data and coming to conclusions. More importantly, we also have statistical models that draw error bars delineating the limits of our analysis.

But for all the good that’s come out of the study of data science and the various disciplines that fuel it, sometimes we’re still left scratching our heads. Enterprises often bump up against the limits of the field. Some of the paradoxes relate to the practical challenges of gathering and organizing so much data. Others are philosophical, testing our ability to reason about abstract qualities. And then there is the rise of privacy concerns around so much data being collected in the first place.

Following are some of the dark secrets that make data management such a challenge for so many enterprises.

Unstructured data is difficult to analyze

Much of the data stored away in the corporate archives doesn’t have much structure at all. One of my friends yearns to use an AI to search through the text notes taken by call center staff at his bank. These sentences may contain insights that could help improve the bank’s lending and services. Perhaps. But the notes were taken by hundreds of different people with different ideas of what to write down about a given call. Moreover, staff members have different writing styles and abilities. Some didn’t write much at all. Others wrote down too much information about their calls. Text by itself doesn’t have much structure to begin with, but when you’ve got a pile of text written by hundreds or thousands of employees over dozens of years, whatever structure there is might be even weaker.

Even structured data is often unstructured

Good scientists and database administrators guide databases by specifying the type and structure of each field. Sometimes, in the name of even more structure, they limit the values in a given field to integers in certain ranges or to predefined choices. Even then, the people filling out the forms that the database stores find ways to add wrinkles and glitches. Sometimes fields are left empty. Other people put in a dash or the initials “n.a.” when they think a question doesn’t apply. People even spell their names differently from year to year, day to day, or even line to line on the same form. Good developers can catch some of these issues through validation. Good data scientists can also reduce some of this uncertainty through cleansing. But it’s still maddening that even the most structured tables have questionable entries — and that those questionable entries can introduce unknowns and even errors in analysis.
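As a rough illustration of the cleansing step described above, here is a minimal Python sketch. The list of placeholder markers and the function names are assumptions made up for this example, not a standard, and the name normalization is deliberately naive (it would mangle names such as "McDonald").

```python
# Hypothetical placeholders that people type when a question does not apply.
MISSING_MARKERS = {"", "-", "--", "n.a.", "n/a", "na", "none"}


def normalize_field(value):
    """Return None for blank or placeholder entries, else the trimmed text."""
    if value is None:
        return None
    text = value.strip()
    if text.lower() in MISSING_MARKERS:
        return None
    return text


def normalize_name(value):
    """Collapse whitespace and casing so variant spellings compare equal."""
    text = normalize_field(value)
    if text is None:
        return None
    # Naive: title-casing is wrong for some real names.
    return " ".join(text.split()).title()


print(normalize_field(" n.a. "))
print(normalize_name("  jane   DOE "))
```

Rules like these catch the obvious cases. They do not remove the underlying uncertainty about what the person filling out the form actually meant.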

Data schemas are either too strict or too loose

No matter how hard data teams try to spell out schema constraints, the resulting schemas for defining the values in the various data fields are either too strict or too loose. If the data team adds tight constraints, users complain that their answers aren’t found on the narrow list of acceptable values. If the schema is too accommodating, users can add strange values with little consistency. It’s almost impossible to tune the schema just right.
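The trade-off can be sketched in a few lines of Python. The field (a courtesy title), the allow-list, and the function names below are all hypothetical, and the third function is only one possible compromise, not a fix for the tuning problem.

```python
# Hypothetical allow-list for a "title" field.
ALLOWED_TITLES = {"mr", "ms", "mrs", "dr"}


def validate_strict(title):
    """Strict schema: accept only exact matches from the allow-list."""
    return title.strip().lower() in ALLOWED_TITLES


def validate_loose(title):
    """Loose schema: accept any non-empty text, however odd."""
    return bool(title.strip())


def validate_with_other(title):
    """Compromise: code known values, keep everything else as raw text."""
    cleaned = title.strip().lower().rstrip(".")
    if cleaned in ALLOWED_TITLES:
        return (cleaned, None)
    return ("other", title.strip())


for entry in ["Dr", "Dr.", "Prof.", "xx??"]:
    print(entry, validate_strict(entry), validate_loose(entry),
          validate_with_other(entry))
```

The strict check rejects "Dr." because of a trailing period, the loose check happily accepts "xx??", and the compromise still leaves someone to clean up whatever lands in the "other" bucket.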

Data laws are very strict

Laws about privacy and data protection are strong and are only getting stronger. Between regulations such as the GDPR, HIPAA, and a dozen or so more, it can be very difficult to assemble data, and even more dangerous to keep it lying around waiting for a hacker to break in. In many cases, it’s easier to spend more money on lawyers than on programmers or data scientists. These headaches are why some companies simply dispose of their data as soon as they can get rid of it.

Data cleansing costs are huge

Many data scientists will confirm that 90% of the job is just collecting the data, putting it in a consistent form, and dealing with the endless holes or mistakes. The person with the data will always say, “It’s all in a CSV and ready to go.” But they don’t mention the empty fields or the mischaracterizations. It’s easy to spend 10 times as much time cleaning up data for a data science project as it takes to start up the routine in R or Python that actually performs the statistical analysis.
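A quick audit often shows how far a "ready to go" CSV is from ready. The following is a minimal sketch using only Python's standard library. The function name and the sample data are invented for illustration, and it only counts blank cells, so it will not catch mischaracterized values.

```python
import csv
import io
from collections import Counter


def count_empty_fields(csv_text):
    """Count blank or missing cells per column in CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    empties = Counter()
    for row in reader:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header; ignored in this sketch.
                continue
            if value is None or not value.strip():
                empties[column] += 1
    return dict(empties)


# Made-up sample: each row is missing a different field.
sample = "name,age,city\nAna,34,\nBo,,Lyon\n,41,Oslo\n"
print(count_empty_fields(sample))
```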

Users are increasingly suspicious of your data practices

End users and customers are getting ever more suspicious about a company’s data management practices, and some AI algorithms and their use are only amplifying the fear, leaving many people very uneasy about what’s happening to the data capturing their every move. Those fears are fueling regulation and often dragging companies and even well-meaning data scientists into public relations blowback. Not only that, but people are deliberately jamming data collection with fake values or wrong answers. Sometimes half of the work is dealing with malicious partners and customers.

Integrating outside data can reap rewards — and bring disaster

It’s one thing for a company to take ownership of the data it gathers. The IT department and data scientists have control over that. But increasingly aggressive companies are figuring out how to integrate their homegrown information with third-party data and the vast seas of personalized information floating on the internet. Some tools openly promise to suck in data about each and every customer to build personalized dossiers on each purchase. Yes, they use the same words as the spy agencies going after terrorists to track your fast-food purchases and credit scores. Is it any wonder that people fret and panic?

Regulators are cracking down on data use

No one knows when clever data analysis crosses some line, but once it does, the regulators show up. In one recent example from Canada, the government explored how some of the doughnut shops were tracking customers who were also shopping at competitors. A recent news release announced, “The investigation found that Tim Hortons’ contract with an American third-party location services supplier contained language so vague and permissive that it would have allowed the company to sell ‘de-identified’ location data for its own purposes.” And for what? To sell more doughnuts? Regulators are increasingly taking notice of anything involving personal information.

Your data scheme may not be worth it

We imagine that a brilliant algorithm may make everything more efficient and profitable. And sometimes such an algorithm is actually possible, but the price can also be too high. For instance, consumers — and even companies — are increasingly questioning the value of targeted marketing that comes from elaborate data management schemes. Some point to the way that we often see ads for something we already purchased because the ad trackers haven’t figured out that we’re not in the market anymore. The same fate often awaits other clever schemes. Sometimes a rigorous data analysis identifies the worst performing factory, but it doesn’t matter because the company signed a 30-year lease on the building. Companies need to be ready for the likelihood that all that genius of data science might produce an answer that isn’t acceptable.

In the end, data decisions are often just judgment calls

Numbers can offer plenty of precision, but how humans interpret them is often what matters. After all the data analysis and AI magic, most algorithms require a decision to be made about whether some value is over or under a threshold. Sometimes scientists want a p-value lower than 0.05. Sometimes a cop is looking to give tickets to cars going 20% over the speed limit. These thresholds are often just arbitrary values. For all the science and mathematics that can be applied to data, many “data-driven” processes have more gray area in them than we would like to believe, leaving decisions up to what amounts to gut instinct despite all the resources a company may have put into its data management practices.
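To see how much rides on the chosen cutoff, consider a small Python sketch of the speeding example. The speeds, the limit, and the function name are made up for illustration. Only the 20% tolerance comes from the text above.

```python
def ticketed(speeds, limit, tolerance):
    """Return the speeds that exceed the limit by more than the tolerance."""
    cutoff = limit * (1 + tolerance)
    return [speed for speed in speeds if speed > cutoff]


# Hypothetical readings on a road with a limit of 60.
speeds = [67, 70, 74, 80]

for tolerance in (0.10, 0.20, 0.25):
    print(tolerance, ticketed(speeds, 60, tolerance))
```

The measurements never change, yet the set of drivers who get a ticket depends entirely on which tolerance someone picked.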

Data storage costs are exploding

Yes, disk drives keep getting fatter and the price per terabyte keeps dropping, but the programmers are gathering bits faster than the prices can fall. The devices from the internet of things (IoT) keep uploading data and users expect to browse a rich collection of these bytes forever. In the meantime, compliance officers and regulators keep asking for more and more data in case of future audits. It would be one thing if someone actually looked at some of the bits, but we only have so much time in the day. The percentage of data that is actually accessed again keeps dropping lower and lower. Yet the price for storing the expanding bundle keeps drifting up.
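The arithmetic behind this is simple compounding, sketched below in Python. Every number here (starting volume, price per terabyte, growth and decline rates) is a hypothetical chosen for illustration, not a measured figure.

```python
def yearly_storage_cost(start_tb, start_price_per_tb, data_growth,
                        price_decline, years):
    """Return the storage bill for year 0 through `years`, inclusive."""
    costs = []
    for year in range(years + 1):
        volume = start_tb * (1 + data_growth) ** year
        price = start_price_per_tb * (1 - price_decline) ** year
        costs.append(volume * price)
    return costs


# Hypothetical: data grows 40% a year while the price per TB falls 20% a year.
for year, cost in enumerate(yearly_storage_cost(100, 20.0, 0.40, 0.20, 3)):
    print(year, round(cost, 2))
```

Under these assumed rates the bill grows by a factor of 1.4 × 0.8 = 1.12 each year. The total falls only when data grows more slowly than prices decline.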


Category: News
June 28, 2022
Tags: art

    Tiatra LLC.

    Tiatra, LLC, based in the Washington, DC metropolitan area, proudly serves federal government agencies, organizations that work with the government and other commercial businesses and organizations. Tiatra specializes in a broad range of information technology (IT) development and management services incorporating solid engineering, attention to client needs, and meeting or exceeding any security parameters required. Our small yet innovative company is structured with a full complement of the necessary technical experts, working with hands-on management, to provide a high level of service and competitive pricing for your systems and engineering requirements.


    Tiatra, LLC
    Copyright 2016. All rights reserved.