All industries and modern applications are undergoing rapid transformation powered by advances in accelerated computing, deep learning, and artificial intelligence. The next phase of this transformation requires an intelligent data infrastructure that can bring AI closer to enterprise data.
The challenges of integrating data with AI workflows
When I speak with our customers, the challenges they describe involve integrating their data with their enterprise AI workflows. The core of their problem is applying AI technology to the data they already have, whether it lives in the cloud, on premises, or, more likely, both.
Imagine that you’re a data engineer. You pull an open-source large language model (LLM) to train on your corporate data so that the marketing team can build better assets and the customer service team can provide customer-facing chatbots. The data is spread across your different storage systems, and you don’t know what is where. You export, move, and centralize your data for training, with all the time and capacity inefficiencies that entails. You build your model, but the history and context of the data you used are lost, so there is no way to trace the model back to its sources. And all of that data is stored on premises, while your training takes place in the cloud, where your GPUs live.
These challenges are common among the data engineers and data scientists we speak to. NetApp is already addressing many of them. But as model training becomes more advanced and models demand ever more training data, these problems will only be magnified.
What does the next generation of AI workloads need?
As the next generation of AI training and fine-tuning workloads takes shape, the limits of existing infrastructure risk slowing innovation. The challenges include data infrastructure that can scale and be optimized for AI; data management that tells AI workflows where data lives and how it can be used; and associated data services that help data scientists protect AI workflows and keep their models clean.
Scalable data infrastructure
As AI models become more complex, their computational requirements increase. Enterprises need infrastructure that can scale and deliver the high performance required for intensive AI tasks, such as training and fine-tuning large language models. At the same time, maximizing the use of nonstorage resources, especially GPUs, is critical for cost-effective AI operations, because underused resources drive up expenses; keeping GPUs busy requires improved storage throughput for both read and write operations. And finally, training data is typically stored on premises while AI models are often trained in the cloud, so AI workloads frequently span on-premises and cloud environments. The infrastructure therefore needs to provide seamless data mobility and management across these systems.
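To make the link between storage throughput and GPU utilization concrete, here is a rough back-of-envelope sketch. Every number in it (GPU count, samples per second, sample size) is an illustrative assumption, not a NetApp specification or a measured result.

```python
# Back-of-envelope estimate of the storage read bandwidth needed to keep
# a GPU training cluster fed with data. All numbers are illustrative
# assumptions, not vendor specifications.

num_gpus = 32                      # assumed GPUs in the training cluster
samples_per_sec_per_gpu = 2000     # assumed training throughput per GPU
bytes_per_sample = 0.5 * 1024**2   # assumed ~0.5 MiB per preprocessed sample

# Aggregate bandwidth the storage layer must sustain so GPUs never stall.
required_bw = num_gpus * samples_per_sec_per_gpu * bytes_per_sample

print(f"Required sustained read bandwidth: {required_bw / 1024**3:.1f} GiB/s")
# With these assumptions: 32 * 2000 * 0.5 MiB/s = 32,000 MiB/s ≈ 31.2 GiB/s.
# If storage delivers less than that, GPU utilization drops roughly in
# proportion, and expensive accelerators sit idle waiting on reads.
```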
Universal data management
AI workloads often require access to vast amounts of data, which can be scattered across an enterprise in different systems and formats. This challenge becomes even greater as businesses use proprietary data spread across their data infrastructure for fine-tuning and retrieval-augmented generation (RAG) use cases. Data silos make it difficult to aggregate and analyze data effectively for AI. Managing the lifecycle of AI data, from ingestion to processing to storage, requires sophisticated data management solutions that can handle the complexity and volume of unstructured data. For AI to be effective, the relevant data must be easily discoverable and accessible, which requires powerful metadata management and data exploration tools.
Intelligent data services
With the rise of AI, there is an increasing need for robust security and governance to protect sensitive data and to comply with regulatory requirements, especially in the face of threats like ransomware. Models built from poisoned data or intentional tampering can cause great harm to business operations that increasingly rely on AI. And as with any enterprise workload, data needs to be available and protected from natural disasters and system outages so that the business can keep operating and avoid costly downtime.
How NetApp supports AI workloads today
Today, NetApp is a recognized leader in AI infrastructure. For well over a decade, innovative customers have been extracting AI-powered insights from data managed on NetApp solutions. As a long-time NVIDIA partner, NetApp has delivered certified NVIDIA DGX SuperPOD and NetApp® AIPod™ architectures and has seen rapid adoption of AI workflows on first-party cloud offerings at the hyperscalers. And as the leader in unstructured data storage, NetApp is trusted by customers with their most valuable data assets.
How did we achieve this level of trust? Through relentless innovation. As customers entrust us with their data, we see even more opportunities to help them operationalize AI and high-performance workloads. That’s why we’re introducing a new disaggregated architecture that will enable our customers to keep pushing the boundaries of performance and scale. An enhanced metadata management engine helps customers understand all the data assets in their organization so that they can simplify model training and fine-tuning. An integrated set of data services helps manage that data and infrastructure, protecting it from natural and human-made threats. It’s all built on NetApp ONTAP®, the leading unified storage architecture, which integrates all of your data infrastructure.
The core DNA of NetApp has always enabled us to evolve and adopt new technologies while maintaining the robust security, enterprise features, and ease of use that our customers depend on. I’m excited to give you a preview of what’s around the corner for ONTAP.
NetApp’s vision for data management to drive AI
Our vision is a unified AI data management engine that will revolutionize how organizations approach and harness the power of AI. The engine will be designed to eliminate data silos by providing a unified view of data assets, automating the capture of data changes for rapid inferencing, and integrating tightly with AI tools for end-to-end AI workflows. NetApp is also innovating at the infrastructure layer, with scalable, high-performance systems, and at the intelligence layer, with policy-based governance and security.
Planned innovations:
- Disaggregated storage architecture. To enhance system throughput and reduce costs, NetApp is developing a storage architecture that enables more efficient sharing of storage back ends. This architecture aims to optimize the utilization of network and flash resources, allowing a more flexible and cost-effective approach to storage. This innovation will facilitate a significant improvement in aggregate throughput across the cluster, while simultaneously reducing rack space and power utilization. The architecture is designed to enable independent scaling of computing and storage resources, which is particularly beneficial for AI workloads that require high levels of flexibility and scalability.
- Performance enhancements. NetApp is committed to delivering industry-leading performance through its upcoming enhancements to the NetApp AFF series systems and the ONTAP software. These improvements are geared toward managing the most intense AI workloads with ease so that enterprises can execute their AI strategies without performance bottlenecks. The enhancements will include advanced capabilities for managing and processing large datasets, which are essential for tasks such as generative AI and LLM training.
- Seamless data integration. The AI data management engine is designed to offer a cohesive and comprehensive view of an organization’s data assets. This unified approach is critical for integrating data across on-premises settings, cloud environments, and hyperscaler platforms. By facilitating seamless data integration, NetApp enables organizations to manage the complete lifecycle of AI data more effectively, from initial data collection to model deployment and analysis. With this new AI data management engine, only NetApp will be able to offer customers a unified, structured, queryable view of all their ONTAP data assets. That’s true whether the data assets are structured or unstructured, and wherever they live: on premises, in the cloud, or spanning on-premises systems and any of our hyperscaler partners, Amazon, Microsoft, and Google.
- Vector embedding and databases. The AI data management engine will automatically capture changes to your data, generate highly compressed vector embeddings, and store them in an integrated vector database, making that data available for search and RAG inferencing workloads. All of this happens automatically, inline, and in place for simplicity and efficiency; a sketch of what such a pipeline involves follows this list.
- AI ecosystem integration. Recognizing the significance of a unified AI workflow, we are focusing on integrating our data services with the broader AI tool ecosystem. This integration will streamline the entire AI workflow, from data labeling and model training to orchestration and deployment. By creating a seamless workflow, we are helping organizations reduce the complexity of AI projects and accelerate time to value.
- Responsible AI. With the growing awareness of the ethical implications of AI, NetApp is placing a strong emphasis on responsible AI practices. We are developing integrated model data traceability and governance features that will enable organizations to implement AI solutions that are not only effective but also ethical and transparent.
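To make the vector embedding idea concrete, here is a minimal sketch of the kind of pipeline such an engine would automate: embedding documents, indexing them, and retrieving the closest matches for a query. It is an illustration only; the model choice, sample documents, and in-memory index are assumptions, not the NetApp implementation.

```python
# Minimal sketch of an embed-and-retrieve pipeline of the kind an AI data
# management engine would automate. Model, documents, and the in-memory
# index are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # small open embedding model

documents = [
    "Q3 revenue grew 12% driven by cloud storage subscriptions.",
    "The support runbook covers failover for the Austin data center.",
    "Marketing brief: launch campaign for the new storage architecture.",
]

# In a real system, only changed files would be (re)embedded; here we embed
# everything and L2-normalize so a dot product equals cosine similarity.
embeddings = model.encode(documents, normalize_embeddings=True)

def search(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = embeddings @ q
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

# A RAG workflow would pass these hits to an LLM as grounding context.
print(search("How did cloud subscriptions perform last quarter?"))
```

A production engine would do this incrementally, as data changes and close to where the data lives; the point here is only the shape of the workflow: embed on change, index, retrieve by similarity.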
Conclusion
At NetApp, we foresee a future in which data scientists can sit down at their AI tool of choice and fine-tune a model using a catalog of data that covers their entire data estate. They won’t need to know where the data is stored; the catalog will have that detail. The catalog will even block data that is too sensitive for model training.
Training data will be captured in state with a space-efficient point-in-time NetApp Snapshot™ copy, allowing data scientists to return and analyze the data in its original state if they need to understand a model’s decisions. They will be able to do all of this from the cloud of their choice, regardless of whether the training data is in the same cloud, another cloud, or stored on-premises.
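As a minimal sketch of how that traceability might look in practice, the snippet below names a Snapshot copy after a training run and records it in a run manifest. The SSH host, helper functions, and manifest format are hypothetical illustrations; the volume snapshot create command shown in the comments is the standard ONTAP CLI for creating a named snapshot.

```python
# Illustrative sketch: capture training data in state with a named snapshot
# and record it in a run manifest for model traceability. The SSH host and
# manifest layout are hypothetical; the quoted ONTAP command is standard CLI.
import json
import os
import subprocess
from datetime import datetime, timezone

def snapshot_training_data(vserver: str, volume: str, run_id: str) -> str:
    """Create a point-in-time snapshot of the training volume; return its name."""
    snap_name = f"train_{run_id}"
    # Runs: volume snapshot create -vserver <vserver> -volume <volume>
    #       -snapshot <snap_name>   (standard ONTAP CLI)
    subprocess.run(
        ["ssh", "admin@cluster-mgmt",          # hypothetical management host
         "volume", "snapshot", "create",
         "-vserver", vserver, "-volume", volume, "-snapshot", snap_name],
        check=True,
    )
    return snap_name

def record_run_manifest(run_id: str, model_name: str, snap_name: str) -> None:
    """Persist run metadata so a model can be traced back to its exact data."""
    os.makedirs("runs", exist_ok=True)
    manifest = {
        "run_id": run_id,
        "model": model_name,
        "data_snapshot": snap_name,            # the frozen view of the data
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(f"runs/{run_id}.json", "w") as f:
        json.dump(manifest, f, indent=2)

# Usage: snap = snapshot_training_data("svm1", "train_data", "run_42")
#        record_run_manifest("run_42", "llm-finetune-v1", snap)
```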
Meanwhile, the infrastructure that serves the data will provide the scale and performance needed to fully saturate the rest of the AI infrastructure, optimizing the use of critical resources and delivering fine-tuned models quickly. This future is not far-fetched or far off; NetApp has already built much of this infrastructure and is building for the next stage of AI today.
We are unwavering in our pursuit to advance the capabilities of ONTAP, aiming to meet and exceed the demands of AI-driven enterprises. By creating a unified data environment, enhancing AI tool integration, automating intelligent data management, and prioritizing performance and scalability, we are reinforcing our leadership position in data storage and management for AI.
These strategic advances are designed to simplify AI project complexities, expand data accessibility, enhance data availability and security, and reduce associated costs, thereby making AI technologies more accessible to diverse organizations.
To learn more about the coming developments for NetApp ONTAP and our AI data management engine, read the whitepaper: ONTAP – pioneering data management in the era of Deep Learning.
Learn from Krish Vitaldevara, SVP of Platform, at INSIGHT 2024 as he shares how NetApp is making your infrastructure AI-ready. Discover our intelligent data services and innovations for secure, agile AI.