For IT leaders, the questions of where to run AI workloads and how to do so affordably are fast becoming top of mind, especially at scale. But for Rob Clark, president and CTO of AI developer Seekr, such questions are business-critical.
Seekr’s main business is building and training AIs that are transparent to enterprise and other users. The company needs massive computing power, with CPUs and GPUs optimized for AI development, says Clark. He adds that when Seekr looked at the infrastructure it would need to build and train its huge AI models, it quickly determined that buying and maintaining the hardware would be prohibitively expensive.
So Seekr signed on with a regional colocation provider to host several GPU- and CPU-enabled systems. But with a growing number of customers and ever-larger AI models, Clark and company recognized the need to shift to a cloud provider that could scale with its needs.
“We really began last year looking at what it would really take in terms of hardware to scale our business,” Clark says. “We were looking for like-minded, leading-edge, AI-focused hardware at the same time.”
The company quickly gravitated to the Intel Developer Cloud, a sandbox and infrastructure service aimed specifically at developers looking to build and deploy AI at scale. When Seekr compared the cost of Intel Developer Cloud with running AI workloads itself or using another cloud service, the cost savings ranged from about 40% to about 400%, Clark says.
Benchmarking the cloud
The Intel Developer Cloud, launched in September, has become a popular platform for developers to both evaluate Intel hardware and deploy AI workloads. Since launch, it has attracted more than 16,000 users, according to Intel.
Intel’s cloud gives developers access to thousands of the latest Intel Gaudi AI accelerator and Xeon CPU chips, combined to create a supercomputer optimized for AI workloads, Intel says. It is built on open software, including Intel’s oneAPI, to support the benchmarking of large-scale AI deployments.
After it began evaluating cloud providers in December, Seekr ran a series of benchmark tests before committing to the Intel Developer Cloud. It found that the service delivered 20% faster AI training and 50% faster AI inference than the company could achieve on premises with current-generation hardware.
“Ultimately for us, it comes down to, ‘Are we getting the latest-generation AI compute, and are we getting it at the right price?’” Clark says. “Building [AI] foundation models at multibillion-parameters scale takes a large amount of compute.”
Intel’s Gaudi 2 AI accelerator chip has previously received high marks for performance. The Gaudi 2, developed by Habana Labs, which Intel acquired, outperformed Nvidia’s A100 80GB GPU in tests run by AI company Hugging Face in late 2022.
Seekr’s collaboration with Intel isn’t all about performance, however, says Clark. While Seekr needs cutting-edge AI hardware for some workloads, the cloud model also enables the company to limit its use to just the computing power it needs in the moment, he notes.
“The goal here is not to use the extensive AI compute all of the time,” he says. “Training a large foundation model versus inferencing on a smaller, distilled model take different types of compute.”
Seekr’s use of Intel Developer Cloud demonstrates how enterprises with demanding AI workloads can run them on the most recent Intel accelerators, says Markus Flierl, corporate vice president of Intel Developer Cloud. By using several AI-optimized compute instances, Seekr shows how companies can cut deployment costs and improve the performance of application workloads, he adds.
As for Intel’s cloud, its thousands of users run a variety of workloads, according to Flierl. “Workloads range from high-end training to fine-tuning and inferencing across a range of models,” he says.
Clark says Seekr will migrate its AI workloads to Intel Developer Cloud in phases. As part of the integration phase, Seekr plans to use large-scale Gaudi 2 clusters with Intel’s Kubernetes Service to train its large language model (LLM), and the company will add CPU and GPU compute capacity to develop and train trusted AI models for customers.
Test before buying
While Seekr has had a positive experience with Intel, Clark says CIOs and other tech leaders should do their homework before committing to a cloud contract for large workloads.
First, they should benchmark their workloads in the cloud and compare the results with the alternatives, he advises, testing the cloud on the workloads that matter most to their companies.
CIOs should also ask the cloud provider for a hardware roadmap, Clark adds. “It’s not just where are we now, but also where is this going?” he says. “When’s the next generation coming out?”
Finally, CIOs should pay close attention to the support they will receive from the cloud provider, Clark says. “Every cloud needs that level of reliability,” he adds. “What are you getting for your spend in terms of enterprise-class support?”
He recommends enterprises conduct an incident test, or “a little chaos engineering,” before signing a contract. “As mischievous as it sounds, it really shows you what you’re getting in the SLA,” he says. “There’s no doubt in my case.”
Read More from This Article: Seekr finds the AI computing power it needs in Intel’s cloud