Enterprises moving their artificial intelligence projects into full-scale development are discovering that their initial infrastructure choices can lead to escalating costs. Many companies whose AI model training infrastructure is not located close to their data lake incur steeper costs as data sets grow larger and AI models become more complex.
The reality is that the cloud is not a hammer that should be used to hit every AI nail. The cloud is great for experimentation when data sets are smaller and model complexity is light. But over time, data sets and AI models grow more complex as companies seek greater accuracy from the models. Data gravity creeps in when generated data is kept on premises and AI training models remain in the cloud; this causes escalating compute and storage costs, and increased latency in developer workflow.
In the IDC 2020 Cloud Pulse Survey, 84% of businesses said they were repatriating workloads from the public cloud back to on-premises infrastructure due to data gravity, concerns about security and sovereignty, or the need for a higher frequency of model training.
Potential headaches of DIY on-prem infrastructure
However, this repatriation can create more headaches for data science and IT teams, who must design, deploy and manage infrastructure optimized for AI as the workloads return on premises. Often the burden of platform development falls on data science and developer teams who know what they need for their projects, but whose skills are better spent on experimenting with algorithms than on systems development.
“When data scientists and developers spend cycles doing systems integration, software stack engineering and IT support, they are spending precious OpEx on things that you’d rather they didn’t,” says Tony Paikeday, senior director of AI systems at NVIDIA.
Time and budget spent on things other than data science include tasks such as:
- Software engineering
- Platform design
- Hardware and software integration
- Troubleshooting
- Software optimization
- Designing and building for scale
- Continual software re-optimization
“Taking a DIY approach to platform and tools ends up getting overshadowed by the sweat equity spent on things that have nothing to do with data science, which ultimately delays the ROI of AI,” says Paikeday.
Alternate approach: Colocation services for AI infrastructure
Companies looking for an alternative to on-premises or cloud-only environments should consider colocation-based managed services for high-performance AI infrastructure. These services offer ease of access, as well as infrastructure experts who can ensure 24/7/365 uptime with secure on-demand resource delivery in a convenient OpEx-based model.
Companies such as Cyxtera, Digital Realty and Equinix, among others, offer hosting, management and operations services for AI infrastructure. Paikeday says it’s like handing the keys of a car to a chauffeur: You get the benefits of the ride without having to worry about the actual driving, maintenance and management.
The NVIDIA DGX Foundry solution, which is offered through Equinix, gives data scientists a premium AI development experience without the struggle. The solution includes NVIDIA Base Command software to manage developer workflow and resource orchestration, and access to fully managed NVIDIA infrastructure based on the DGX SuperPOD architecture, available for rent.
“Organizations that may be fearful of the technology churn and the pace of innovation happening in computing infrastructure should consider services like DGX Foundry delivered in a colocation facility,” says Paikeday. “Through this OpEx-based approach, you can procure a super-scaled, high-performance infrastructure that is dedicated and carved out for you, delivered with the simplicity and ease of access of cloud, and without any burden on your IT team.”
Learn how colocation services can give you the benefits of an AI infrastructure without all of the heavy lifting, with NVIDIA DGX systems, powered by NVIDIA A100 Tensor Core GPUs and AMD EPYC CPUs.
About Keith Shaw:
Keith is a freelance digital journalist who has written about technology topics for more than 20 years.
Read More from This Article: Your New Cloud for AI May Be Inside a Colo