Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies

Making the gen AI and data connection work

With all the hype surrounding gen AI, it’s no surprise that it has become the dominant AI solution at companies, according to a Gartner survey released in May. Twenty-nine percent of 644 executives at companies in the US, Germany, and the UK said they were already using gen AI. That made it more widespread than other AI-related technologies, such as optimization algorithms, rule-based systems, natural language processing, and other types of ML.

The real challenge, however, is to “demonstrate and estimate” the value of projects. That means accounting for TCO and the broad range of benefits that can be obtained. It also means facing obstacles such as a lack of confidence in the technical aspects of AI and the difficulty of obtaining sufficient volumes of data. But these challenges are not insurmountable.

Privacy protection

The first step in AI and gen AI projects is always to get the right data. “In cases where privacy is essential, we try to anonymize as much as possible and then move on to training the model,” says University of Florence technologist Vincenzo Laveglia. “A balance between privacy and utility is needed. If, after anonymization, the data carries the same level of information, it’s still useful. But if, once personal or sensitive references are removed, the data is no longer effective, a problem arises. Synthetic data avoids these difficulties, but it isn’t exempt from the need for a trade-off. We have to make sure there’s a balance between the various classes of information; otherwise the model becomes an expert on one topic and very uncertain on others.”
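Laveglia’s point about balancing classes can be checked before training. The Python sketch below is not drawn from the article. It counts the examples in each class and reports the ratio between the largest and smallest class. The function name `imbalance_ratio` and the sample labels are hypothetical. What ratio counts as “too skewed” depends on the use case.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

def imbalance_ratio(labels: Iterable[Hashable]) -> Tuple[float, Counter]:
    """Return (largest class count / smallest class count, per-class counts).

    1.0 means perfectly balanced classes; large values warn that a model
    trained on this data may learn the majority class far better than
    the rest.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels provided")
    ratio = max(counts.values()) / min(counts.values())
    return ratio, counts

if __name__ == "__main__":
    # Hypothetical topic labels for a document-classification dataset.
    labels = ["finance"] * 900 + ["legal"] * 80 + ["medical"] * 20
    ratio, counts = imbalance_ratio(labels)
    print(counts.most_common())  # [('finance', 900), ('legal', 80), ('medical', 20)]
    print(f"imbalance ratio: {ratio:.1f}")  # imbalance ratio: 45.0
```

In this example the majority class outnumbers the smallest by 45 to 1. That is the kind of skew that can leave a model confident on one topic and uncertain on others. Common remedies include collecting more minority-class data, resampling, or generating additional minority-class examples through data augmentation.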

Synthetic data also includes data produced through data augmentation. Data augmentation is the process of artificially generating new data from existing data to train ML models.

“When applicable, data augmentation solves the problem of insufficient data or compliance with privacy and intellectual property regulations,” says Laveglia.
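As a rough illustration of augmentation on numeric data, the Python sketch below creates extra training rows from copies of real records. It adds small Gaussian noise (“jitter”) to each copy. This is only one of many augmentation techniques; images, for instance, are more often flipped, cropped, or rotated. The function name `augment_with_jitter` and its `noise_scale` default are assumptions for illustration. Each jittered copy stays close to a real record, so this particular technique addresses data scarcity rather than privacy.

```python
import random
from typing import List, Sequence

def augment_with_jitter(
    records: Sequence[Sequence[float]],
    copies_per_record: int = 2,
    noise_scale: float = 0.05,  # hypothetical default: noise std = 5% of each value
    seed: int = 0,
) -> List[List[float]]:
    """Return the original records followed by jittered copies of each one.

    Every copy perturbs each feature with Gaussian noise whose standard
    deviation is noise_scale times the feature's absolute value.
    """
    rng = random.Random(seed)  # local, seeded RNG so results are reproducible
    augmented = [list(r) for r in records]  # keep the real data first
    for record in records:
        for _ in range(copies_per_record):
            augmented.append(
                [x + rng.gauss(0.0, noise_scale * abs(x)) for x in record]
            )
    return augmented

if __name__ == "__main__":
    real = [[1.0, 20.0], [2.0, 30.0]]
    data = augment_with_jitter(real, copies_per_record=3)
    print(len(data))  # 2 real rows + 2 * 3 jittered copies = 8
```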

Gartner agrees that synthetic data can help solve the data availability problem for AI products, as well as privacy, compliance, and anonymization challenges. Synthetic data can be generated to reflect the same statistical characteristics as real data. It does so without revealing personally identifiable information or other sensitive details, which helps organizations comply with privacy-by-design regulations. The alternative is to manually anonymize and de-identify data sets, but this takes more time and effort and has a higher error rate.
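A deliberately simple way to see the idea of “same statistics, no identifiers” is to learn only summary statistics from the real data and sample new records from them. The Python sketch below is an illustration, not Gartner’s or any vendor’s method. It fits a mean and standard deviation to each chosen numeric field and draws synthetic values from a normal distribution. Identifier fields such as names and emails are simply never modeled. The function name `synthesize` is hypothetical. Sampling each field independently discards correlations between fields, which production synthetic-data tools typically try to preserve.

```python
import random
import statistics
from typing import Any, Dict, List, Sequence

def synthesize(
    real: Sequence[Dict[str, Any]],
    numeric_fields: Sequence[str],
    n: int,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Draw n synthetic records matching each numeric field's mean and stdev.

    Only the fields listed in numeric_fields are modeled, so identifiers
    such as names or emails never appear in the output. Fields are sampled
    independently, so correlations between them are NOT preserved.
    """
    if len(real) < 2:
        raise ValueError("need at least two real records to estimate spread")
    rng = random.Random(seed)  # seeded for reproducible output
    params = {}
    for field in numeric_fields:
        values = [float(r[field]) for r in real]
        params[field] = (statistics.mean(values), statistics.stdev(values))
    return [
        {field: rng.gauss(mu, sigma) for field, (mu, sigma) in params.items()}
        for _ in range(n)
    ]

if __name__ == "__main__":
    real = [
        {"name": "Alice", "email": "alice@example.com", "age": 34, "income": 52000},
        {"name": "Bob", "email": "bob@example.com", "age": 45, "income": 61000},
        {"name": "Carla", "email": "carla@example.com", "age": 29, "income": 43000},
    ]
    # Output rows contain only "age" and "income"; names and emails are gone.
    for row in synthesize(real, ["age", "income"], n=3):
        print({k: round(v) for k, v in row.items()})
```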

The European AI Act also addresses synthetic data, citing it as a possible measure to mitigate the risks associated with using personal data to train AI systems.

“The level of attention on the protection of personal data in AI has risen significantly in recent months,” says Chiara Bocchi, TMT, commercial, and data protection lawyer and counsel at Dentons. “Looking at general-purpose AI models, the spotlight is currently on data scraping, both for those who carry it out and for those who are subjected to it. The Italian authority has adopted some measures to prevent this activity.”

The complexities of compliance

In May, the Italian Data Protection Authority highlighted that training the models on which gen AI systems are based always requires a huge amount of data. That data is often obtained through web scraping, which the authority describes as a massive and indiscriminate collection carried out on the web. Web scraping can be direct, carried out by the same entity that develops the model, or indirect, drawing on third-party data lakes. This makes it complicated for CIOs to ensure that data has been collected in a compliant manner and, above all, that they are entitled to use it.

“From the point of view of legislation on the protection of personal data and copyright, it isn’t complex to understand whether a piece of data is protected,” says Bocchi. “On the privacy side, the complexity lies in legitimately using public or publicly accessible data for purposes other than those for which it was originally disseminated. Looking only at the legal basis for the processing, obtaining the consent of every person whose personal data may be collected through scraping is essentially impossible.”

This is why privacy authorities are working to establish guidelines.

“In particular, the question, and assessment, is whether the legal basis of legitimate interest can be applicable to processing personal data, collected by scraping, for the purpose of training AI systems,” adds Bocchi. “The Italian data protection authority announced that it’ll soon rule on the lawfulness of web scraping of personal data based on legitimate interest.” 

The Dutch Data Protection Authority and the French Data Protection Authority (CNIL) have already intervened on this issue. CNIL has indicated that synthetic data and anonymization and pseudonymization techniques are valid measures to limit the risks associated with processing personal data to train gen AI systems.
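Pseudonymization, one of the techniques CNIL mentions, replaces direct identifiers with tokens while keeping records linkable. A common approach is a keyed hash, and the Python sketch below uses HMAC-SHA256 from the standard library. The function name `pseudonymize` and the sample key are hypothetical. In practice the key would be kept in a secrets store, separate from the data. Anyone holding the key can recompute tokens for known identifiers and match them. For that reason, pseudonymized data is still personal data under the GDPR, unlike properly anonymized data.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List

def pseudonymize(
    records: Iterable[Dict[str, str]],
    identifier_fields: Iterable[str],
    secret_key: bytes,
) -> List[Dict[str, str]]:
    """Return copies of records with identifier fields replaced by HMAC tokens."""
    fields = list(identifier_fields)
    result = []
    for record in records:
        masked = dict(record)  # copy so the input records are left untouched
        for field in fields:
            if field in masked:
                value = masked[field].encode("utf-8")
                # Same value + same key -> same token, so joins still work;
                # without the key, tokens cannot be recomputed from guesses.
                masked[field] = hmac.new(secret_key, value, hashlib.sha256).hexdigest()
        result.append(masked)
    return result

if __name__ == "__main__":
    key = b"replace-with-a-secret-from-a-vault"  # hypothetical placeholder key
    rows = [
        {"email": "alice@example.com", "visits": "12"},
        {"email": "bob@example.com", "visits": "3"},
    ]
    for row in pseudonymize(rows, ["email"], key):
        print(row)
```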

Strategies to mitigate AI risk

Amid the complexities, capitalizing on gen AI’s potential while mitigating risks is an ongoing high-wire act.

“A winning strategy is to define solutions that ensure compliance with privacy regulations from the design phase of the gen AI system, starting from the training database,” says Bocchi.

Another effective initiative is to structure the company in a way that fosters greater collaboration among upper management. “To increase trust in new technologies, many companies are creating internal ethics committees, which are also tasked with supporting and promoting innovation governance,” she adds.

On AI model training and data storage, CNIL also suggests that companies focus on the transparent development and auditability of AI systems. It further recommends that model development techniques be subjected to effective peer review.

Navigating technology and change management

When it comes to trust in AI technology, CIOs are mindful of the risks of hallucination and discrimination. To trust the results, it’s therefore necessary to ensure the quality of the dataset. It’s also necessary to appropriately limit data storage so that personal or sensitive information isn’t leaked.

Given these premises, however, University of Florence’s Laveglia says AI is a completely reliable tool, provided three conditions hold. The system must be well built, its performance on test data must be reassuring, and the dataset used must be representative of the actual distribution of the data.

“An example is AlphaFold, widely used in structural biology and bioinformatics,” he says. “It’s a program based entirely on AI techniques, developed by DeepMind to predict the 3D structure of proteins from their amino acid sequence. It’s revolutionary because, in a day and with a very low error rate, it carries out tasks that would take researchers months or years. And although its training dataset is large, it isn’t of an order of magnitude comparable to the datasets used to train modern LLMs.”

Companies can take a similar approach with a pre-trained model. Such a model offers an optimized starting configuration that can then be fine-tuned and adapted to their use case. Starting from scratch with one’s own model requires much more data collection work and a lot of skills. Using the products built into big tech suites, on the other hand, is a more immediate solution but a less customizable one, since it could confine CIOs to the boundaries of particular applications. Downloading a pre-trained model and refining it with one’s own data is a good compromise that leaves room for the IT team’s creativity. The condition is that IT and the business have first identified together the use cases with the potential to bring the company an advantage.

Adopting AI maturely means deploying the technology at scale across processes and functions, and seeking benefits that go beyond increased productivity. IT also needs to focus on AI engineering, meaning the technological development and concrete implementation of AI.

Plus, projects must be accompanied by upskilling and change management, because the way teams are organized and how they work is set to change significantly. According to PwC’s recent AI Jobs Barometer study, demand for skills that make use of AI is up 25%. This means that rather than being replaced by AI, people will have to learn better ways to work with it. Another PwC study, the Global CEO Survey 2024, corroborates this: 69% of respondents said AI will require the majority of employees to develop new skills.


Read More from This Article: Making the gen AI and data connection work
Source: News

Category: News
August 9, 2024
Tags: art


    Tiatra LLC.

    Tiatra, LLC, based in the Washington, DC metropolitan area, proudly serves federal government agencies, organizations that work with the government and other commercial businesses and organizations. Tiatra specializes in a broad range of information technology (IT) development and management services incorporating solid engineering, attention to client needs, and meeting or exceeding any security parameters required. Our small yet innovative company is structured with a full complement of the necessary technical experts, working with hands-on management, to provide a high level of service and competitive pricing for your systems and engineering requirements.


    Tiatra, LLC
    Copyright 2016. All rights reserved.