Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies
LatticeFlow launches first comprehensive evaluation framework for compliance with the EU AI Act

The European Union (EU) AI Act, passed in August 2024, has been touted as a milestone for AI development.

However, the framework has also been criticized as being vague, non-technical, and broad. Although it identifies six ethical principles, these haven’t been translated into codified benchmarks, and there have yet to be any concrete standards or recommendations issued.

To help provide some clarity and give AI makers a sense of how well their models may fare, LatticeFlow, ETH Zurich, and the Institute for Computer Science, Artificial Intelligence and Technology (INSAIT) on Wednesday announced Compl-AI, which they call the first evaluation framework for determining compliance with the AI Act.

The site has so far ranked models from the likes of OpenAI, Meta, Mistral, Anthropic and Google on more than two dozen technical specifications. Other model makers are also urged to request evaluations of their models’ compliance.

“We reveal shortcomings in existing models and benchmarks, particularly in areas like robustness, safety, diversity, and fairness,” researchers from LatticeFlow, INSAIT and ETH Zurich wrote in a technical paper. “Compl-AI for the first time demonstrates the possibilities and difficulties of bringing the act’s obligations to a more concrete, technical level.”

Most models struggle with diversity, non-discrimination

Under the EU AI Act, models and systems will be classified into four risk tiers: unacceptable, high, limited, and minimal. Notably, an unacceptable rating bans a model’s development and deployment, and model makers can also face large fines if found out of compliance.

Researchers point out that the act is expected to have an impact beyond EU borders due to its “wide extraterritorial effects.”

The act defines six ethical principles: human agency and oversight; technical robustness and safety; privacy and data governance; transparency; diversity, nondiscrimination, and fairness; and social and environmental well-being.

Addressing these principles, Compl-AI’s free, open-source framework evaluates LLM responses across 27 technical areas, including “prejudiced answers,” “general knowledge,” “biased completions,” “following harmful instructions,” “truthfulness,” “copyrighted material memorization,” “common sense reasoning,” “goal hijacking and prompt leakage,” “denying human presence” and “recommendation consistency.”
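The article names the six principles and a sample of the 27 checks, but not how the two correspond. A minimal sketch of how checks might be indexed by principle, where the grouping below is purely illustrative and not the official Compl-AI schema:

```python
# Illustrative sketch only: grouping a few of the checks named in the article
# under the act's ethical principles. The actual Compl-AI mapping may differ.
PRINCIPLE_TO_CHECKS = {
    "technical robustness and safety": [
        "goal hijacking and prompt leakage",
        "following harmful instructions",
    ],
    "diversity, non-discrimination, and fairness": [
        "prejudiced answers",
        "biased completions",
        "recommendation consistency",
    ],
    "transparency": ["denying human presence"],
    "privacy and data governance": ["copyrighted material memorization"],
}

def checks_for(principle: str) -> list[str]:
    """Return the technical checks grouped under a given principle."""
    return PRINCIPLE_TO_CHECKS.get(principle, [])
```

In a structure like this, a model's principle-level score could be derived from its results on the checks listed under that principle.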

At its launch today, the platform had already evaluated 11 top models from seven prominent model makers: Anthropic’s Claude 3 Opus, OpenAI’s GPT-3.5 and GPT-4, Meta’s Llama 2 family, Google’s Gemma, Mistral’s 7B family, Qwen, and Yi.

Models are judged on a scale from 0 (no compliance at all) to 1 (full compliance). N/A scores apply when there is insufficient data. The researchers pointed out that “no model achieves perfect marks.”

Of the models evaluated so far, GPT-4 Turbo and Claude 3 Opus rank as most in compliance, both with aggregate scores of 0.89. Gemma 2 9B ranked the lowest, with an aggregate score of 0.72.

Other aggregate model scores:

–Llama 2 7B Chat (the smallest Llama model): 0.75
–Mistral 7B Instruct: 0.76
–Mixtral 8x7B Instruct: 0.77
–Qwen 1.5 72B Chat: 0.77
–Llama 2 13B Chat (the mid-sized Llama model): 0.77
–Llama 2 70B Chat (the largest and most capable Llama model): 0.78
–Yi 34B Chat: 0.78
–GPT-3.5 Turbo: 0.81

Researchers noted that nearly all models struggled with diversity, non-discrimination, and fairness. Also, smaller models generally score poorly on technical robustness and safety.

“A likely reason for this is the disproportional focus on model capabilities at the expense of other relevant concerns,” the researchers wrote.

Top LLMs vary widely on benchmark performance

According to the Compl-AI results, all the models fared well at refusing harmful instructions and avoiding prejudiced answers. All scored a 1 for user privacy protection, and all scored 0.98 or above on avoiding copyright infringement.

On the other hand, most models struggled with recommendation consistency, cyberattack resilience, and fairness. On fairness, the average score was only around 0.50, with Mistral 7B Instruct faring the worst at 0.27 and Claude 3 Opus the best at 0.80.

All models scored a 0 in traceability, and all received N/A for suitability of training data. Interestingly, Claude 3 Opus was the only model to score an N/A for interpretability.

“We expect that the EU AI Act will influence providers to shift their focus, leading to a more balanced development of LLMs,” the researchers wrote. They pointed out that while some benchmarks are comprehensive, others are often “simplistic and brittle,” which leads to inconclusive results. “This is another area where we expect the EU AI Act to have a positive impact, shifting the focus towards neglected aspects of model evaluation.”

Martin Vechev, professor at ETH Zurich and founder and scientific director of INSAIT, has invited researchers, developers, and regulators to help advance the evolving project, and to even add new benchmarks. Also, he noted, “the methodology can be extended to evaluate AI models against future regulatory acts, making it a valuable tool for organizations working across different jurisdictions.”

Regulators have so far reacted positively to the ranking system. Thomas Regnier, the European Commission’s spokesperson for digital economy, research, and innovation, said in a statement that his agency ​“welcomes this study and AI model evaluation platform as a first step in translating the EU AI Act into technical requirements, helping AI model providers implement the AI Act.”


October 17, 2024
