How can CIOs protect Personally Identifiable Information (PII) for a new class of data consumers?

Industries increasingly rely on data and AI to enhance processes and decision-making. However, they face a significant challenge in ensuring privacy because sensitive Personally Identifiable Information (PII) is present in most enterprise datasets. Safeguarding PII is not a new problem. Conventional IT and data teams query data containing PII, but only a select few require access. Rate limiting, role-based access control, and masking have been widely adopted to govern sensitive data access in traditional BI applications.
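
As a simple illustration of those traditional controls, the sketch below applies column-level masking before analysts query a dataset. It is a minimal Python/pandas example; the column names and masking rule are hypothetical and not drawn from any particular system.

```python
import pandas as pd

def mask_pii(df: pd.DataFrame, pii_columns: list) -> pd.DataFrame:
    """Return a copy of df with PII columns partially masked."""
    masked = df.copy()
    for col in pii_columns:
        # Keep the last 4 characters for traceability and mask the rest.
        masked[col] = masked[col].astype(str).str.replace(
            r".(?=.{4})", "*", regex=True
        )
    return masked

customers = pd.DataFrame({
    "customer_id": [101, 102],
    "ssn": ["123-45-6789", "987-65-4321"],
    "purchase_total": [250.0, 99.5],
})

# Analysts query the masked view; only a select few see the raw table.
print(mask_pii(customers, ["ssn"]))
# ssn becomes "*******6789" and "*******4321"
```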

Protecting sensitive data in the modern AI/ML pipeline has different requirements. The emerging and ever-growing class of data users consists of ML data scientists and applications requiring larger datasets. Data owners need to walk a tightrope to ensure that parties in their AI/ML lifecycle get appropriate access to the data they need while maximizing the privacy of that PII.

Enter the new class 

ML data scientists require large quantities of data to train machine learning models. Then the trained models themselves become consumers of vast amounts of data, generating insights that inform business decisions. Whether before or after model training, this new class of data consumers relies on the availability of large amounts of data to provide business value.

In contrast to conventional users who only need to access limited amounts of data, the new class of ML data scientists and applications requires access to entire datasets to ensure that their models represent the data with precision. And even when traditional controls such as masking or encryption are applied, they may not be enough to prevent an attacker from inferring sensitive information by analyzing patterns in the encrypted or masked data.

The new class often uses advanced techniques such as deep learning, natural language processing, and computer vision to analyze and extract insights from the data. These efforts are often slowed down or blocked when sensitive PII data is entangled within a large proportion of the datasets they require. Up to 44% of an organization's data is reported to be inaccessible. This limitation blocks the road to AI's promised land of new and game-changing value, efficiencies, and use cases.

The new requirements have led to the emergence of techniques such as differential privacy, federated learning, synthetic data, and homomorphic encryption, which aim to protect PII while still allowing ML data scientists and applications to access and analyze the data they need. However, there is still a market need for solutions deployed across the ML lifecycle (before and after model training) to protect PII while accessing vast datasets – without drastically changing the methodology and hardware used today.

Ensuring privacy and security in the modern ML lifecycle

The new breed of ML data consumers needs to implement privacy measures at both stages of the ML lifecycle: ML training and ML deployment (or inference).

In the training phase, the primary objective is to use existing examples to train a model.

The trained model must make accurate predictions, such as classifying data samples it did not see as part of the training dataset. The data samples used for training often have sensitive information (such as PII) entangled in each data record. When this is the case, modern privacy-preserving techniques and controls are needed to protect sensitive information.

In the ML deployment phase, the trained model makes predictions on new data that it did not see during training, known as inference data. It is critical to ensure that any PII used to train the ML model is protected and that the model's predictions do not reveal sensitive information about individuals, but it is equally critical to protect sensitive information and PII within the inference data samples themselves. Inferencing on encrypted data is prohibitively slow for most applications, even with custom hardware. As such, there is a critical need for viable low-overhead privacy solutions to ensure data confidentiality throughout the ML lifecycle.

The modern privacy toolkit for ML and AI: Benefits and drawbacks

Various modern solutions have been developed to address PII challenges, such as federated learning, confidential computing, and synthetic data, which the new class of data consumers is exploring to preserve privacy in ML and AI. However, each solution offers a different level of efficacy and implementation complexity in satisfying user requirements.

Federated learning

Federated learning is a machine learning technique that enables training on a decentralized dataset distributed across multiple devices. Instead of sending data to a central server for processing, the training occurs locally on each device, and only model updates are transmitted to a central server.

  • Limitation: Research published in 2020 by the Institute of Electrical and Electronics Engineers (IEEE) shows that an attacker can infer private information from the model parameters shared in federated learning. Additionally, federated learning does not address the inference stage, which still exposes data to the ML model during cloud or edge device deployment.
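
To make the idea concrete, here is a minimal NumPy sketch of federated averaging (FedAvg): each simulated client runs a few local gradient steps of logistic regression on data that never leaves it, and the server only averages the resulting weights. The client data, learning rate, and number of rounds are illustrative assumptions rather than a production configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient steps of logistic regression."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (preds - y) / len(y)
        w -= lr * grad
    return w

# Three clients, each holding private data that never leaves the device.
clients = [
    (rng.normal(size=(50, 3)), rng.integers(0, 2, 50).astype(float))
    for _ in range(3)
]

global_w = np.zeros(3)
for _round in range(10):
    # Each client trains locally; only the updated weights are sent back.
    client_weights = [local_update(global_w, X, y) for X, y in clients]
    # The server averages the updates (FedAvg) without ever seeing raw data.
    global_w = np.mean(client_weights, axis=0)

print("Aggregated model weights:", global_w)
```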

Differential privacy

Differential privacy bounds how much any single data record in a training dataset can contribute to a machine-learning model. The guarantee is that if a single record is added to or removed from the dataset, the model's output should not change beyond a quantified threshold, which limits what a membership test against the training data can reveal.

  • Limitation: While training with differential privacy has benefits, it still requires the data scientist to access large volumes of plain-text data. Additionally, it does not address the ML inference stage in any capacity.
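
As a concrete illustration of the core mechanism, the sketch below applies the Laplace mechanism to a simple counting query. The dataset, predicate, and epsilon values are arbitrary choices made for the example; real deployments (for instance, DP-SGD during model training) require much more careful privacy accounting.

```python
import numpy as np

rng = np.random.default_rng(42)

def dp_count(records, predicate, epsilon=1.0):
    """Differentially private count. A counting query has sensitivity 1, so
    Laplace noise with scale 1/epsilon bounds any single record's influence."""
    true_count = sum(1 for r in records if predicate(r))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [34, 29, 41, 52, 38, 45, 27, 61]
print(dp_count(ages, lambda a: a > 40, epsilon=0.5))
# Adding or removing any one person shifts the true answer by at most 1,
# and the injected noise makes that difference hard to detect.
```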

Homomorphic encryption

Homomorphic encryption is a type of encryption that allows computation to be performed on data while it remains encrypted. For modern users, this means that machine learning algorithms can operate on data that has been encrypted without the need to decrypt it first. This can provide greater privacy and security for sensitive data since the data never needs to be revealed in plain text form. 

  • Limitation: Homomorphic encryption is prohibitively costly because operating on encrypted data rather than plain-text data is computationally intensive. It often requires custom hardware to optimize performance, which can be expensive to develop and maintain. Finally, data scientists rely on deep neural networks in many domains, and these are often difficult or impossible to implement in a homomorphically encrypted fashion.
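
For intuition about what computing on encrypted data means, the toy sketch below uses textbook (unpadded) RSA, which happens to be multiplicatively homomorphic. It is not secure and not representative of the lattice-based schemes (such as CKKS or BFV) used for ML workloads; it only demonstrates the core property that an operation on ciphertexts decrypts to the corresponding operation on the plaintexts.

```python
# Toy illustration only: textbook RSA with tiny primes.
p, q = 61, 53
n = p * q                    # public modulus
e = 17                       # public exponent
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)          # private exponent

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

m1, m2 = 7, 6
c1, c2 = encrypt(m1), encrypt(m2)

# Multiply the ciphertexts without ever decrypting them...
c_product = (c1 * c2) % n

# ...and the decrypted result equals the product of the plaintexts.
assert decrypt(c_product) == (m1 * m2) % n
print(decrypt(c_product))    # 42
```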

Synthetic data

Synthetic data is computer-generated data that mimics real-world data. It is often used to train machine learning models and to protect sensitive data in healthcare and finance. Synthetic data generation can produce large amounts of data quickly while bypassing many privacy risks.

  • Limitation: While synthetic data may help train a predictive model, it adequately covers only some of the possible real-world data subspaces. This can result in accuracy loss and undermine the model’s capabilities in the inference stage. Also, actual data must still be protected in the inference stage, which synthetic data cannot address.
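
A minimal sketch of the workflow: fit a simple generative model to a real table and sample synthetic rows from it. Here the "model" is just an empirical mean and covariance, and the "real" table is itself simulated for the example; production synthetic-data tools use far richer generators, but the shape of the process is the same.

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for a small table of real, sensitive records: age, income, balance.
real = rng.multivariate_normal(
    mean=[45, 72_000, 15_000],
    cov=[[60, 1_000, 500], [1_000, 9e7, 2e6], [500, 2e6, 4e6]],
    size=500,
)

# Fit a simple generative model: the empirical mean and covariance.
mu = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Sample synthetic rows that mimic the distribution without copying any record.
synthetic = rng.multivariate_normal(mu, cov, size=500)

print("Real means:     ", np.round(mu))
print("Synthetic means:", np.round(synthetic.mean(axis=0)))
```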

Confidential computing

Confidential computing is a security approach that protects data during use. Major companies, including Google, Intel, Meta, and Microsoft, have joined the Confidential Computing Consortium to promote hardware-based Trusted Execution Environments (TEEs). The solution isolates computations to these hardware-based TEEs to safeguard the data. 

  • Limitation: Confidential computing requires companies to incur additional costs to move their ML-based services to platforms that require specialized hardware. The solution is also only partially risk-free: an attack in May 2021 collected and corrupted data from TEEs that rely on Intel SGX technology.

While these solutions are helpful, their limitations become apparent when training and deploying AI models. The next stage in PII privacy needs to be lightweight and complement existing privacy measures and processes while providing access to datasets entangled with sensitive information. 

Balancing the tightrope of PII confidentiality with AI: A new class of PII protection 

We’ve examined some modern approaches to safeguard PII and the challenges the new class of data consumers faces. There is a balancing act in which PII can’t be exposed to AI, but the data consumers must use as much data as possible to generate new AI use cases and value. Also, most modern solutions address data protection during the ML training stage without a viable answer for safeguarding real-world data during AI deployments.

Here, we need a future-proof solution to manage this balancing act. One such solution I have used is the Stained Glass Transform, which enables organizations to extract ML insights from their data while protecting against the leakage of sensitive information. The technology, developed by Protopia AI, can transform any data type by identifying what the AI model requires, eliminating unnecessary information, and transforming the data as much as possible while retaining near-perfect accuracy. By safeguarding data in this way, enterprises can use more of their data for ML training and deployment, achieving better predictions and outcomes while worrying less about data exposure.

More importantly, this technology adds a new layer of protection throughout the ML lifecycle – for both training and inference. This closes a significant gap, since most modern solutions leave privacy during the ML inference stage unresolved.

The latest Gartner guide to AI Trust, Risk, and Security Management (AI TRiSM) highlights the same problem and solution. AI TRiSM guides analytics leaders and data scientists in ensuring AI reliability, trustworthiness, and security.

While there are multiple solutions to protect sensitive data, the end goal is to enable enterprises to leverage their data to the fullest to power AI.

Choosing the right solution(s) 

Choosing the right privacy-preserving solutions is essential for solving your ML and AI challenges. You must carefully evaluate each solution and select the ones that complement, augment, or stand alone to fulfill your unique requirements. For instance, synthetic data can enhance real-world data, improving the performance of your AI models. You can use synthetic data to simulate rare events that may be difficult to capture, such as natural disasters, and to augment real-world data when it is limited.

Another promising option is to combine such data transformation with confidential computing, transforming the data before it enters the trusted execution environment. The transformation acts as an additional barrier, minimizing the attack surface along a different axis and ensuring that plaintext data is not compromised even if the TEE is breached. So, choose the privacy-preserving solutions that fit your needs and maximize your AI’s performance without compromising data privacy.

Wrap up

Protecting sensitive data isn’t just a tech issue – it’s an enterprise-wide challenge. As new data consumers expand their AI and ML capabilities, securing Personally Identifiable Information (PII) becomes even more critical. To create high-performance models that deliver real value, we must maximize data access while safeguarding the data. Every privacy-preserving solution must be carefully evaluated for how well it solves our most pressing AI and ML challenges. Ultimately, we must remember that PII confidentiality is not just about compliance and legal obligations but about respecting and protecting the privacy and well-being of individuals.
