Cloud, sustainability, scale, and exponential data growth: the major factors that set the tone for high performance computing (HPC) in 2022 will also be key drivers of innovation in 2023. As more organizations rely on HPC to speed time to results, especially for data-intensive applications, the $40B market[1] faces both challenges and opportunities. Fortunately, the HPC community is collaborative and transparent in our work to apply and advance supercomputing technologies.
Recently, members of our community came together for a roundtable discussion, hosted by Dell Technologies, about trends, trials, and all the excitement around what’s next.
Answers to Supercomputing Challenges
We identified the following challenges and offer our leading thoughts on each.
Sustainability: As the HPC market grows, so do the implications of running such energy-intensive and complex infrastructure. In an effort to achieve sustainability, industry leaders are prioritizing ways to reduce CO2 impact and even decarbonize HPC—not an easy task with total power usage increasing. What’s motivating our move to energy reduction:
- A dependence on expensive fossil fuels that makes supercomputing more costly overall
- The high and unfavorable environmental impact of fossil fuel-based energy
- Pressure from governmental agencies to design more efficient solutions and data centers
Though HPC customers want to measure their own energy usage, today’s tools may not yet offer sufficient metrics at the application scale. Meanwhile, some in the European Union are shifting their priority from running operations faster to running them with lower power consumption. This calls for experimenting with more efficient choices, such as running workloads on different chips.
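As one example of what coarse, per-run energy measurement can look like today, the sketch below reads the Linux powercap (Intel RAPL) energy counter around a single workload run. This is our own illustration, assuming an Intel system that exposes /sys/class/powercap and sufficient read privileges; the workload binary name is hypothetical, and it is not a tool discussed at the roundtable.

```python
# Minimal sketch: estimate CPU package energy for one workload on Linux via
# the powercap/RAPL interface. Assumes /sys/class/powercap/intel-rapl:0
# exists (Intel CPUs) and is readable. Illustrative only.
import subprocess
import time
from pathlib import Path

RAPL = Path("/sys/class/powercap/intel-rapl:0")

def read_uj() -> int:
    """Read the cumulative package energy counter, in microjoules."""
    return int((RAPL / "energy_uj").read_text())

start_uj, start_s = read_uj(), time.time()
subprocess.run(["./my_workload"])  # hypothetical workload binary
end_uj, end_s = read_uj(), time.time()

# The counter wraps at max_energy_range_uj; handle a single wraparound.
max_uj = int((RAPL / "max_energy_range_uj").read_text())
delta_j = ((end_uj - start_uj) % max_uj) / 1e6
elapsed = end_s - start_s

print(f"Energy: {delta_j:.1f} J over {elapsed:.1f} s "
      f"(avg {delta_j / elapsed:.1f} W)")
```

Per-application attribution on a shared node is harder than this whole-package view, which is precisely the metrics gap noted above.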
Cooling: Over the next few years, we will see increased use of processors and accelerators that require significantly more power, and thus generate more heat. As leaders in the HPC industry, we are concerned about how to cool these data centers.
Many are looking at innovative data center designs, including modular centers and colocation. Another big focus is liquid cooling.[2] Direct liquid cooling offers superior thermal management, with five times the cooling capacity of airflow. Immersion cooling, which submerges hardware in oil-based engineered fluids, offers a high-performance (if more complex) cooling solution. Deployed successfully around the globe, liquid cooling is becoming essential to future-proofing data centers.
Scaling out and developing large-scale systems: To meet demand, the HPC industry is developing and honing strategies to effectively scale and deploy large systems that are both efficient and reliable. It’s a tall order, and one that will hinge on a few factors:
- Accelerator deployment and management at scale
- Changes to power and cooling design decisions at very large scale
- Open-source deployment of high-performance clusters to run simulation, AI, and data analytics workloads
What’s New and Growing Among HPC Users?
In the HPC industry, we are experiencing and driving massive shifts in terms of what we do, and how and where we do it. Here are the shifts we noted:
Delivery Models: Moving from an almost strictly on-premises systems approach (with some remote access services), HPC is embracing remote delivery models. Customer interest in HPC delivery models like colocation, managed services, and cloud computing is driven by tremendous growth in service-based models (including IaaS/PaaS/SaaS) along with on-demand and subscription payment models. Data center challenges are also driving demand for these alternatives. New solutions, including Dell APEX HPC Services and HPC on Demand, address these customer needs.
Workflow Optimization from Edge to Core/Cloud and Back: Today, integration with edge devices and with other HPC systems is designed in-house or otherwise customized. Moving forward, we will see more capable and widely adopted workflows that facilitate edge-core-cloud needs such as generating meshes, performing 3D simulations, running post-simulation data analysis, and feeding data into machine learning models, which support, guide, and in some cases replace the need for simulation.
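To make the edge-core-cloud flow concrete, here is a deliberately toy pipeline skeleton. Every stage function below is a hypothetical placeholder of our own, standing in for the real mesh-generation, simulation, analysis, and training steps, not any vendor API.

```python
# Illustrative skeleton of an edge-to-core/cloud workflow. All stage
# functions are hypothetical placeholders for real tools.
def generate_mesh(sensor_data):      # edge: turn raw captures into a mesh
    return {"cells": len(sensor_data) * 100}

def run_simulation(mesh):            # core HPC: 3D simulation on the mesh
    return [c * 0.5 for c in range(mesh["cells"])]

def analyze(results):                # core/cloud: post-simulation analysis
    return sum(results) / max(len(results), 1)

def update_model(feature):           # cloud: feed features to an ML model
    print(f"training on feature value {feature:.3f}")

sensor_data = [0.1, 0.4, 0.7]        # stand-in for edge measurements
update_model(analyze(run_simulation(generate_mesh(sensor_data))))
```

The point of standardized workflows is that each stage above could run in a different location (edge, core, or cloud) without bespoke glue code.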
Advances in Artificial Intelligence and Machine Learning (AI/ML): AI/ML will continue growing as an important workload in HPC. With the rapid growth in data sizes, there is an increased need for HPC solutions that can handle large-scale model training. At the same time, these models can complement simulation, guiding targets or reducing the parameter space for some problems. In the HPC community, we recognize a need for tools that support machine learning operations and data science management; these tools must be able to scale and integrate with HPC software, compute, and storage environments.
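As one hedged illustration of what “scale with HPC compute” can look like in practice, the sketch below uses PyTorch’s DistributedDataParallel (our choice of example, not a tool named at the roundtable) to synchronize a single training step across ranks on a multi-GPU cluster.

```python
# Minimal sketch: one distributed training step with PyTorch DDP.
# Assumes launch via `torchrun --nproc_per_node=<gpus> train.py`
# on CUDA-equipped nodes; use the "gloo" backend on CPU-only nodes.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(1024, 1024).cuda())  # grads sync across ranks
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(32, 1024).cuda()  # stand-in for a real data shard
loss = model(x).pow(2).mean()
loss.backward()                   # triggers cross-rank gradient all-reduce
opt.step()
dist.destroy_process_group()
```

MLOps tooling for HPC has to wrap this kind of job in scheduling, data staging, and experiment tracking, which is where the integration need arises.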
Data Processing Units: We anticipate a jump in DPU usage but must support customers in figuring out which use cases offer quantifiable advantages in price/performance and performance/watt. It’s important to note that more research and benchmark comparisons are needed to help customers make the best decisions. Some examples of when DPUs can be advantageous for HPC workloads include:
- Collective operations (illustrated in the sketch after this list)
- Bare metal provisioning for maximum HPC performance by moving the hypervisor to the DPU, freeing up CPU cycles
- Improving communications through task offload; if codes are task-based, the user can potentially move tasks to less busy nodes
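As a concrete example of the first item, the sketch below shows the kind of MPI collective (an allreduce) that a DPU can offload from host CPUs. The code itself runs on the host with mpi4py and is purely illustrative: actual DPU offload happens in the interconnect layer, transparently to user code like this.

```python
# Illustration of a collective operation (allreduce) of the kind DPUs can
# offload from host CPUs. Run with e.g. `mpirun -np 4 python allreduce.py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
local = comm.Get_rank() + 1                # stand-in for a per-rank partial
total = comm.allreduce(local, op=MPI.SUM)  # candidate for in-network offload

if comm.Get_rank() == 0:
    print(f"sum across {comm.Get_size()} ranks: {total}")
```

Because the collective is expressed through a standard MPI call, offloading it to a DPU frees CPU cycles without changing application code, which is exactly where benchmark comparisons are needed.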
Composable Infrastructure: We note the resource utilization benefits offered by composable infrastructure but still see uncertainty about whether it is the future or a niche. As with DPUs, more research and quantifiable comparisons are needed to help customers determine whether composable infrastructure is right for their next system. Specific AI workflows do require special hardware configurations, and composable infrastructure may remove the restrictions of traditional architectures. Still, there is debate[3] about whether it can scale and whether increased flexibility and utilization would deliver the expected ROI.
Quantum Computing: As an HPC community, we share a growing consensus that quantum computing (QC) systems will and must be integrated with ‘classical’ HPC systems. QC systems are superior only at certain types of calculations and thus may best serve as accelerators. At Dell Technologies, we have developed a hybrid classical/quantum platform that leverages Dell PowerEdge servers with Qiskit Dell Runtime and IonQ Aria quantum processing units. With the platform, classical and quantum simulation workloads can execute on-premises, while quantum workloads, such as modeling larger and more complex molecules for pharmacological development, can be executed on IonQ QPUs.[4]
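For readers new to hybrid workflows, the sketch below shows the general shape of a quantum workload in Qiskit: a small circuit sampled on a local classical simulator. This is a generic Qiskit example of our own, not the Qiskit Dell Runtime API; in a real hybrid deployment, the same circuit would be dispatched to a QPU backend such as IonQ’s instead of the simulator.

```python
# Minimal, illustrative Qiskit workload: prepare a two-qubit Bell state
# and sample it on a local classical simulator (qiskit-aer).
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

qc = QuantumCircuit(2)
qc.h(0)           # put qubit 0 into superposition
qc.cx(0, 1)       # entangle qubit 1 with qubit 0
qc.measure_all()  # measure both qubits

result = AerSimulator().run(qc, shots=1024).result()
print(result.get_counts())  # expect roughly equal '00' and '11' counts
```

The accelerator model follows naturally: classical code prepares and post-processes data, while only the circuit execution moves to quantum hardware.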
The HPC Outlook for 2023
The impressively large HPC market continues to grow at a healthy rate, fueled by commercial demand for data processing and AI/ML training. HPC workloads and delivery models are more diverse than ever, leading to a more diverse customer community. And in HPC, community is important. Though we face some of our greatest challenges in application, system, and data center scaling, HPC technologies remain at the leading edge of computing.
As a community, we keep sharing information and stay on top of developments to realize the maximum benefits of HPC. A robust resource for sharing, learning, and networking is the Dell HPC Community, a global community of HPC customers, users, and companies that comes together for weekly online events open to all, as well as in-person meetings held three times a year.
Engage with the Dell HPC community by visiting dellhpc.org.
[1] https://www.hpcwire.com/2022/05/30/hyperion-hpc-market-is-stabilizing-and-headed-to-50b-by-2026/
[2] https://www.dell.com/en-us/dt/servers/power-and-cooling.htm#tab0=0
[3] https://sc22.supercomputing.org/presentation/?id=pan117&sess=sess183
[4] https://www.delltechnologies.com/asset/en-us/products/ready-solutions/briefs-summaries/hybrid-quantum-solution-brief.pdf.external