The early bills for generative AI experimentation are coming in, and many CIOs are finding them heftier than they’d like, some with only themselves to blame.
“We’re getting back into this frenetic spend mode that we saw in the early days of cloud,” observed James Greenfield, vice president of AWS Commerce Platform, at the FinOps X conference in San Diego in June.
J.R. Storment, executive director of the FinOps Foundation, echoed the concern.
“It’s very reminiscent of the early days of cloud when it was ‘weapons-free’ on spending with everyone trying to implement cloud — and now genAI — everywhere but with little to no cost control or governance,” he says.
To counteract this, and in anticipation of further forays with the technology, some CIOs are exploring a range of technologies and methods to curb the cost of generative AI experimentation and applications.
According to IDC’s “Generative AI Pricing Models: A Strategic Buying Guide,” the pricing landscape for generative AI is complicated by “interdependencies across the tech stack.” But there is no way around the significant premium pricing placed on generative AI workloads, because the core infrastructure necessary to train and tune generative AI models is “largely supplied by one company: Nvidia,” IDC notes.
As customers await a more ample supply of GPUs, many are looking to AI-specific service providers, as well as public and private cloud offerings for hosting genAI workloads, including Nvidia’s cloud, AWS Trainium and Inferentia, and Google Tensor Processing Units, according to “IDC Market Glance: Generative AI Foundation Models.” CIOs are also turning to OEM offerings such as Dell’s Project Helix or HPE GreenLake for AI, IDC points out.
The AI service providers, sometimes dubbed AI hyperscalers, offer GPU-as-a-service, enabling enterprises to purchase GPU power on demand to limit spending. These AI service providers include CoreWeave, Equinix, Digital Realty, and Paperspace, as well as GPU leader Nvidia and, to an extent, cloud hyperscalers Microsoft, Google, and AWS.
IBM, Oracle, Dell, and Hewlett Packard Enterprise also offer GPU-as-a-service.
Given Nvidia’s overwhelming dominance in the GPU market, CIOs are looking at GPUaaS alternatives now rather than waiting for the other top chip companies to catch up. This on-demand approach also vastly reduces the upfront cost of buying processors and scales up or down based on workload, notes Tom Richer, a former CIO and current CEO of CloudBench, a Google Partner and CIO consultancy.
“To meet CIO needs, vendors will offer various options like virtual machine instances with different GPU configurations and spot instances for discounted compute power,” Richer says, adding that containerized AI frameworks can also help IT leaders ensure efficient resource utilization. “By understanding their options and leveraging GPU-as-a-service, CIOs can optimize genAI hardware costs and maintain processing power for innovation.”
Richer also believes that cloud-based GPU access will help enterprises free up IT resources for other critical tasks and “potentially streamline the development process for genAI projects.”
The cost equation
But CIOs zeroing in on GPUaaS and other cloud-based generative AI solutions are likely to face familiar problems when it comes to cost containment, the FinOps Foundation’s Storment argues.
“We’ve already seen the cost of AI really start to negatively impact cloud budgets,” he says. “In the end, though, it’s still unclear to many CIOs what value they are getting out of the AI experimenting, so we’re seeing the costs of AI spiraling for many and feeding a wave of interest in how to do ‘FinOps for AI’ by applying cost visibility principles already prevalent in FinOps for other cloud costs.”
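The cost-visibility idea Storment describes usually starts with attributing every line of AI spend to an owner. The sketch below is a minimal illustration under stated assumptions: the `UsageRecord` type, the project tags, and the budget figures are all hypothetical, and it presumes bill line items have already been tagged by project, which real billing exports may or may not provide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One hypothetical bill line item, already tagged with a project."""
    project: str    # cost-allocation tag (hypothetical, e.g. "support-chatbot")
    service: str    # hypothetical label, e.g. "gpu-instance" or "llm-api"
    cost_usd: float


def spend_by_project(records):
    """Sum cost per project tag so each genAI experiment's spend is visible."""
    totals = defaultdict(float)
    for r in records:
        totals[r.project] += r.cost_usd
    return dict(totals)


def over_budget(totals, budgets):
    """Return project tags whose spend exceeds their budget, sorted by name.

    A project with no budget entry is treated as having a budget of 0,
    so any unbudgeted spend is flagged.
    """
    return sorted(p for p, spent in totals.items() if spent > budgets.get(p, 0.0))


if __name__ == "__main__":
    records = [
        UsageRecord("support-chatbot", "llm-api", 1200.0),
        UsageRecord("support-chatbot", "gpu-instance", 800.0),
        UsageRecord("code-assist", "llm-api", 300.0),
    ]
    totals = spend_by_project(records)
    print(totals)
    print(over_budget(totals, {"support-chatbot": 1500.0, "code-assist": 500.0}))
```

With these made-up figures, the chatbot experiment totals $2,000 against a $1,500 budget and is the only one flagged. Defaulting missing budgets to zero is a deliberate choice here: it surfaces experiments nobody has budgeted for.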
Brian Shield, CTO of the Boston Red Sox and Fenway Sports Management, says that to keep costs in check, CIOs should selectively deploy genAI solutions to key areas of the business and implement a thoughtful genAI evaluation process to prevent overlap and proliferation.
Shield is also looking to negotiate cost based on the quality of the output. “I have proposed paying genAI vendors on a per-use-case basis. In other words, if the tool performs well, that is, production-worthy, I’ll pay you X. For solutions with less than 90% accuracy, if there are still viable use cases, I’ll pay you Y,” Shield says. “If you can improve your tool, I’ll move you to the higher-paid group. All vendors balk, but I’m still in conversations with others.”
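Shield’s proposal amounts to a two-tier fee schedule keyed to measured quality. The function below is a hypothetical sketch of that rule, not his actual contract terms. It assumes that “production-worthy” means accuracy at or above the 90% mark he cites, that accuracy is measured on the buyer’s own evaluation set, and that a below-threshold tool with no viable use case earns nothing. The fee amounts stand in for his unspecified “X” and “Y.”

```python
def use_case_fee(accuracy: float, has_viable_use_case: bool,
                 premium_fee: float, reduced_fee: float,
                 threshold: float = 0.90) -> float:
    """Fee owed for one genAI use case under a hypothetical two-tier scheme.

    Assumptions (not from the source): accuracy is a fraction measured on the
    buyer's own evaluation set, "production-worthy" means accuracy >= threshold,
    and a below-threshold tool with no viable use case is paid nothing.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction between 0 and 1")
    if accuracy >= threshold:
        return premium_fee   # "X": production-worthy tier
    if has_viable_use_case:
        return reduced_fee   # "Y": under 90% but still usable
    return 0.0               # assumption: no viable use case, no payment


if __name__ == "__main__":
    # Hypothetical fees: 10,000 for the top tier, 4,000 for the lower tier.
    print(use_case_fee(0.85, True, 10_000.0, 4_000.0))  # lower tier
    print(use_case_fee(0.91, True, 10_000.0, 4_000.0))  # after the vendor improves
```

Re-running the same evaluation after a vendor ships improvements is what moves it between tiers, which matches Shield’s “move you to the higher-paid group” condition.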
John Marcante, US CIO in Residence at Deloitte and former Global CIO at Vanguard, sees innovation in the marketplace also coming to CIOs’ aid.
“The heart of generative AI lies in GPUs. These chips are evolving rapidly to meet the demands of real-time inference and training. As we delve deeper into this innovation cycle, expect GPUs to become even more efficient, capable, and specialized for AI workloads,” he says.
GPU-as-a-service providers and platforms are also beginning to offer turnkey solutions for marketing, finance, legal, and client processes to enable businesses to focus on their core competencies, Marcante says.
He points out that some organizations will build their own generative AI platforms, tailoring them to their unique requirements. “This approach ensures ownership and customization,” he says, noting that collaborating with AI providers in the same way enterprises partner with cloud providers today is another path. “These models will range from renting GPUs to comprehensive full-stack AI services.”
Models make a difference
The rapid acceleration, experimentation, and evolution of large language models (LLMs) have also provided insights into tailoring outcomes and reducing costs.
For example, CIOs on a budget can reduce generative AI costs by using open-source models, such as OpenAI and Lambda, which can be accessed from various marketplaces and offer several advantages, says Bern Elliott, a distinguished analyst at Gartner.
“Open source is one way CIOs can definitely keep the costs low,” he says, pointing out that open-source models are also transparent and can be customized. “For many enterprises, that’s where the cost is. If the cost of running it is low, the margins become better.”
Making use of smaller, domain-specific models for smaller scope tasks is another way CIOs are curbing the cost of generative AI.
“If you look at the GPUs, they are incredibly expensive, and especially if you’re a large language model. Everyone’s on a journey to figure out what the right answer is for them, because the answer of not using genAI is not on the table,” says Chris Bedi, chief digital information officer at ServiceNow. “Having domain-specific models helps keep our costs under control, which then we’re able to pass along that benefit to our customers.”
RunPod is a GPU-as-a-service platform for developers that is very cost-effective for universities and startups. Students at OpenCV University and an AI consultancy spinoff, for instance, use RunPod to train AI models, says Satya Mallick, Ph.D., CEO of OpenCV.org.
“For a small business like ours that needs multiple high-end GPUs for only a few days to a few weeks at a time, RunPod’s solution is extremely cost-effective as we do not incur the huge upfront cost of purchasing GPUs,” Mallick says, noting that his team is also evaluating RunPod’s serverless product.
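Mallick’s point about avoiding upfront purchases comes down to simple rent-versus-buy arithmetic. The sketch below uses made-up prices (a $2.50 hourly rate per GPU and a $30,000 purchase price per GPU are assumptions, not RunPod’s or any vendor’s actual figures). It deliberately ignores power, cooling, staffing, depreciation, and resale value, all of which shift the real break-even point.

```python
def rental_cost(num_gpus: int, hours: float, hourly_rate: float) -> float:
    """On-demand cost: pay only for the GPU-hours actually used."""
    return num_gpus * hours * hourly_rate


def breakeven_hours(purchase_price_per_gpu: float, hourly_rate: float) -> float:
    """Hours of use per GPU at which renting costs as much as buying.

    Simplification: ignores power, cooling, staff, depreciation, and resale value.
    """
    return purchase_price_per_gpu / hourly_rate


if __name__ == "__main__":
    # Hypothetical job: 8 GPUs for two weeks (14 days * 24 hours = 336 hours).
    print(rental_cost(8, 336, 2.50))          # total rental bill for the job
    print(breakeven_hours(30_000.0, 2.50))    # GPU-hours before buying pays off
```

Under these assumed prices, the two-week job costs $6,720 to rent, while a purchased GPU would need 12,000 hours (500 days of continuous use) before it beat renting. That gap is why bursty workloads of a few days to a few weeks favor the on-demand model.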
Energy consumption: Another cost consideration
CIOs are also mindful of the massive energy consumption of generative AI applications, which is another hefty cost to consider.
“AI is compute-intensive and it’s impacting data centers globally,” says Bryan Muehlberger, former CIO at Vuori and currently CTO of Schumacher Homes. “Unless we solve our energy problems nationally, this will eventually become a much bigger problem and the costs will be passed down to the companies using the services.”
As AI continues to evolve, innovative solutions for managing hardware costs and maximizing processing power are likely to emerge, CloudBench’s Richer says, adding that the environmental impact of running powerful GPUs will be a concern for some organizations.
“Cloud providers are increasingly focusing on sustainable practices, and utilizing cloud-based GPUs can be a more energy-efficient solution compared to on-premises hardware,” Richer says. “However, it’s crucial for CIOs to carefully evaluate the trade-offs between cost, performance, and data security when choosing a GPUaaS solution.”
Other CIOs are enjoying the cost-saving benefits of their enterprise license agreements with major cloud and AI providers such as Microsoft, Google, and AWS.
“We have chosen MS Copilot for wide applicability, which we believe will cater to about 80% of our use cases,” says Bob Brizendine, CIO of American Honda. “This is part of our existing licensing agreement with Microsoft, allowing us to streamline costs effectively. Others may not be in that same situation.”
Read More from This Article: GenAI sticker shock sends CIOs in search of solutions