Getting infrastructure right for generative AI

Facts, it has been said, are stubborn things. For generative AI, a stubborn fact is that it consumes very large quantities of compute cycles, data storage, network bandwidth, electrical power, and air conditioning. As CIOs respond to corporate mandates to “just do something” with genAI, many are launching cloud-based or on-premises initiatives. But while the payback promised by many genAI projects is nebulous, the costs of the infrastructure to run them are finite, and too often unacceptably high.

Infrastructure-intensive or not, generative AI is on the march. According to IDC, genAI workloads are increasing from 7.8% of the overall AI server market in 2022 to 36% in 2027. In storage, the curve is similar, with growth from 5.7% of AI storage in 2022 to 30.5% in 2027. IDC research finds roughly half of worldwide genAI expenditures in 2024 will go toward digital infrastructure. IDC projects the worldwide infrastructure market (server and storage) for all kinds of AI will double from $28.1 billion in 2022 to $57 billion in 2027.
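IDC's server-and-storage projection can be sanity-checked with a quick compound-growth calculation. The dollar figures below come from the article; the formula is the standard compound annual growth rate (CAGR) definition, included only to show what "doubling in five years" implies per year:

```python
# Back-of-envelope check of the IDC projection cited above: growth from
# $28.1B (2022) to $57B (2027) implies roughly 15% compound annual growth.

def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate over the given number of years."""
    return (end_value / start_value) ** (1 / years) - 1

rate = cagr(28.1, 57.0, 2027 - 2022)
print(f"Implied CAGR: {rate:.1%}")  # about 15.2% per year
```

In other words, a market that "doubles" over five years is growing about 15% annually, which is roughly in line with the workload-share shifts IDC describes.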

But the sheer quantity of infrastructure needed to process genAI’s large language models (LLMs), along with power and cooling requirements, is fast becoming unsustainable.

“You will spend on clusters with high-bandwidth networks to build almost HPC [high-performance computing]-like environments,” warns Peter Rutten, research vice president for performance-intensive computing at IDC. “Every organization should think hard about investing in a large cluster of GPU nodes,” says Rutten, asking, “What is your use case? Do you have the data center and data science skill sets?”

Shifting to small language models, hybrid infrastructure

Savvy IT leaders are aware of the risk of overspending on genAI infrastructure, whether on-premises or in the cloud. After taking a hard look at their physical operations and staff capabilities as well as the fine print of cloud contracts, some are coming up with strategies that are delivering positive return on investment.  

Seeking to increase the productivity of chronically understaffed radiology teams, Mozziyar Etemadi, medical director of advanced technologies at Northwestern Medicine, undertook a genAI project designed to speed the interpretation of X-ray images. But instead of piling on compute, storage, and networking infrastructure to handle massive LLMs, Northwestern Medicine shrank the infrastructure requirements by working with small language models (SLMs).

Etemadi began by experimenting with cloud-based services but found them unwieldy and expensive. “I tried them, but we couldn’t get [generative AI] to work in a favorable cost envelope.” That led Etemadi to the realization that he would have to spearhead a dedicated engineering effort.

Heading a team of a dozen medical technologists, Etemadi built a four-node cluster of Dell PowerEdge XE9680 servers, each with eight Nvidia H100 Tensor Core GPUs, connected with Nvidia Quantum-2 InfiniBand networking. Running in a colocation facility, the cluster ingests multimodal data, including images, text, and video, which trains the SLM on how to interpret X-ray images. The resulting application, which was recently patented, generates highly accurate interpretations of the pictures, feeding them to a human-in-the-loop (HITL) for final judgment.

“It’s multimodal, but tiny. The number of parameters is approximately 300 million. That compares to ChatGPT, which is at least a trillion,” says Etemadi, who envisions building on the initial X-ray application to interpret CT scans, MRI images, and colonoscopies.
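To see why a 300-million-parameter model counts as "tiny," a rough memory-footprint calculation helps. The parameter counts are from the quote above; the 16-bit precision is an assumption for illustration, and real deployments add optimizer state, activations, and caches on top of these floors:

```python
# Rough memory footprint of model weights alone at 16-bit precision
# (2 bytes per parameter). Treat these as lower bounds, not totals.

BYTES_PER_PARAM_FP16 = 2

def weight_gigabytes(num_params: float) -> float:
    return num_params * BYTES_PER_PARAM_FP16 / 1e9

slm = weight_gigabytes(300e6)  # a ~300M-parameter SLM like Northwestern's
llm = weight_gigabytes(1e12)   # a trillion-parameter LLM, for comparison

print(f"SLM weights: ~{slm:.1f} GB")  # ~0.6 GB: fits comfortably on one GPU
print(f"LLM weights: ~{llm:.0f} GB")  # ~2000 GB: needs many GPUs just to hold
```

The three-orders-of-magnitude gap in weight storage is why an SLM can run on a four-node cluster that "any hospital in the US" could afford.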

He estimates that using a cloud-based service for the same work would cost about twice as much as it costs to run the Dell cluster. “On the cloud, you’re paying by the hour and you’re paying a premium.” In contrast, he asserts, “Pretty much any hospital in the US can buy four computers. It’s well within the budget.”
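Etemadi's "about twice as much" estimate follows the usual buy-versus-rent arithmetic: on-premises hardware is a large up-front cost that amortizes, while cloud billing accrues by the hour. The sketch below illustrates that shape only; every dollar figure and the utilization rate are hypothetical assumptions, not numbers from Northwestern Medicine:

```python
# Illustrative break-even sketch for on-prem vs. cloud GPU spend.
# All figures below are assumed for the sake of the arithmetic.

CLUSTER_CAPEX = 1_200_000   # assumed: four 8-GPU servers plus networking
ANNUAL_OPEX = 150_000       # assumed: colocation, power, support
CLOUD_RATE_PER_HOUR = 260   # assumed: comparable GPU capacity on demand
HOURS_PER_YEAR = 8760
UTILIZATION = 0.5           # assumed: cluster busy half the year

def onprem_cost(years: float) -> float:
    """Cumulative cost of buying and running the cluster."""
    return CLUSTER_CAPEX + ANNUAL_OPEX * years

def cloud_cost(years: float) -> float:
    """Cumulative cost of renting equivalent hours in the cloud."""
    return CLOUD_RATE_PER_HOUR * HOURS_PER_YEAR * UTILIZATION * years

for year in range(1, 4):
    print(year, round(onprem_cost(year)), round(cloud_cost(year)))
```

Under these assumed numbers, cloud is cheaper in year one but overtakes the owned cluster in year two, and by year three cumulative cloud spend is roughly double; sustained, well-utilized workloads are exactly where buying tends to win.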

When it comes to data storage, Northwestern Medicine uses both the cloud and on-premises infrastructure for both temporary and permanent storage. “It’s about choosing the right tool for the job. With storage, there is really no one-size-fits-all,” says Etemadi, adding, “As a general rule, storage is where cloud has the highest premium fee.”

On premises, Northwestern Medicine is using a mix of Dell NAS, SAN, secure, and hyperconverged infrastructure equipment. “We looked at how much data we needed and for how long. Most of the time, the cloud is definitely not cheaper,” asserts Etemadi.

The cost calculus of GPU clusters

Faced with similar challenges, Papercup Technologies, a UK company that has developed genAI-based language translation and dubbing services, took a different approach. Papercup clients seeking to globalize the appeal of their products use the company’s service to generate convincing voice-overs in many languages for commercial videos. Before a job is complete, an HITL examines the output for accuracy and cultural relevance. The LLM work started in a London office building, which the infrastructure demands of generative AI soon outgrew.

“It was quite cost-effective at first to buy our own hardware, which was a four-GPU cluster,” says Doniyor Ulmasov, head of engineering at Papercup. He estimates initial savings between 60% and 70% compared with cloud-based services. “But when we added another six machines, the power and cooling requirements were such that the building could not accommodate them. We had to pay for machines we could not use because we couldn’t cool them,” he recounts.

And electricity and air conditioning weren’t the only obstacles. “Server-grade equipment requires know-how for things like networking setup and remote management. We expended a lot of human resources to maintain the systems, so the savings weren’t really there,” he adds.

At that point, Papercup decided the cloud was needed. The company now handles customer translation and dubbing workloads on Amazon Web Services, with output reviewed by an HITL. Simpler training workloads are still run on premises on a mixture of servers powered by Nvidia A100 Tensor Core, GeForce RTX 4090, and GeForce RTX 2080Ti hardware. More resource-intensive training is handled on a cluster hosted on Google Cloud Platform. Building on its current services, Papercup is exploring language translation and dubbing for live sports events and movies, says Ulmasov.

For Papercup, infrastructure decisions are driven as much by geography as by technology requirements. “If we had a massive warehouse outside the [London] metro area, you could make the case [for keeping work on-premises]. But we are in the city center. I would still consider on-premises if space, power, and cooling were not issues,” says Ulmasov.

Beyond GPUs

For now, GPU-based clusters are simply faster than CPU-based configurations, and that matters. Both Etemadi and Ulmasov say using CPU-based systems would cause unacceptable delays that would keep their HITL experts waiting. But the high energy demands of the current generation of GPUs will only increase, according to IDC’s Rutten.

“Nvidia’s current GPU has a 700-watt power envelope, then the next one doubles that. It’s like a space heater. I don’t see how that problem gets resolved easily,” says the analyst.  
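Rutten's 700-watt figure translates directly into the facilities problem Papercup hit. The per-GPU envelope below is the one cited in the quote; the node counts and eight-GPUs-per-node layout are illustrative, and the totals ignore CPUs, fans, and cooling overhead:

```python
# GPU-only power draw of a dense cluster, in kilowatts. Real facility load is
# higher once cooling (often ~1.3-1.5x IT load) is included.

GPU_WATTS = 700      # per-GPU envelope cited by Rutten above
GPUS_PER_NODE = 8    # assumed dense-server layout

def cluster_kw(nodes: int, watts_per_gpu: int = GPU_WATTS) -> float:
    return nodes * GPUS_PER_NODE * watts_per_gpu / 1000

print(f"4 nodes:  {cluster_kw(4):.1f} kW")   # 22.4 kW
print(f"10 nodes: {cluster_kw(10):.1f} kW")  # 56.0 kW
# A next-generation part that doubles the envelope doubles these figures.
```

Tens of kilowatts of continuous draw is far beyond what a typical city-center office floor can power or cool, which is why Papercup "had to pay for machines we could not use."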

The reign of GPUs in genAI and other forms of AI could be challenged by an emerging host of AI co-processors, and eventually perhaps, by quantum computing.

“The GPU was invented for graphics processing so it’s not AI-optimized. Increasingly, we’ll see AI-specialized hardware,” predicts Claus Torp Jensen, former CIO and CTO and currently a technology advisor. Although he does not anticipate the disappearance of GPUs, he says future AI algorithms will be handled by a mix of CPUs, GPUs, and AI co-processors, both on-premises and in the cloud.

Another factor working against unmitigated power consumption is sustainability. Many organizations have adopted sustainability goals, which power-hungry AI algorithms make difficult to achieve. Rutten says using SLMs, ARM-based CPUs, and cloud providers that maintain zero-emissions policies, or that run on electricity produced by renewable sources, are all worth exploring where sustainability is a priority.

For implementations that require large-scale workloads, processors built as field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs) are a choice worth considering.

“They are much more efficient and can be more powerful. You have to hardware-code them up front and that takes time and work, but you could save significantly compared to GPUs,” says Rutten.

Until processors emerge that run significantly faster while using less power and generating less heat, the GPU is a stubborn fact of life for generative AI, and delivering cost-effective genAI implementations will require ingenuity and perseverance. But as Etemadi and Ulmasov demonstrate, the challenge is not beyond the reach of strategies that combine small language models with a skillful mix of on-premises and cloud-based services.


Source: News (Tiatra, LLC)
Category: News
June 3, 2024
