Chinese AI startup DeepSeek made a big splash last week when it unveiled an open-source version of its reasoning model, DeepSeek-R1, claiming performance superior to OpenAI’s o1 reasoning model.
The news caused NVIDIA, the leading maker of GPUs used to power AI in data centers, to shed nearly $600 billion of its market cap on Monday. According to Gartner, DeepSeek’s innovations appear to use significantly less advanced hardware and computing resources, while still offering performance comparable to other leading LLMs at a fraction of the cost.
CIOs are now reassessing their strategies to transform their organizations with gen AI, but it’s not exactly time to throw out the work that’s already been done.
“DeepSeek’s advancements could lead to more accessible and affordable AI solutions, but they also require careful consideration of strategic, competitive, quality, and security factors,” says Ritu Jyoti, group VP and GM, worldwide AI, automation, data, and analytics research with IDC’s software market research and advisory practice.
That echoes a statement issued by NVIDIA on Monday: “DeepSeek is a perfect example of test time scaling. DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking.”
Open to interpretation
Chirag Dekate, a VP analyst at Gartner who specializes in quantum technologies, AI, digital R&D, and emerging tech, believes the market is overreacting to both technical details of what was required to train DeepSeek, and the source of the innovation itself.
“It feeds into this perception of us versus some unknown them, and also into a narrative of jingoism or nationalism,” he says. “These narratives are taking hold because they capture the imagination faster than anybody actually double-clicking into the technical report, because when they see the details, they’re less glamorous than the headlines made them out to be.”
That’s not to disregard DeepSeek’s innovations, however. In a research note, Gartner said DeepSeek challenges the prevailing gen AI cost structures and methodologies, underscoring the inefficiencies in current leading vendor pricing models that can lead to negative ROI for high-value use cases deployed at scale.
“DeepSeek’s R1 model thus represents a pivotal shift, suggesting that the future of gen AI lies in innovative, cost-efficient approaches rather than the traditional paradigm of scaling through sheer computational force,” Gartner researchers, including Haritha Khandabattu, Jeremy D’Hoinne, Rita Sallam, Leinar Ramos, and Arun Chandrasekaran, wrote in a research note Wednesday.
Peter Rutten, research VP, performance intensive computing, and worldwide infrastructure research at IDC, says the key takeaway from the DeepSeek results is that the current approach to AI training, the assumption that AI can only improve with bigger, faster, and more plentiful infrastructure, is not justified.
“New approaches to algorithm, framework, and software for AI development deliver comparable or even better results than, for example, the latest version of ChatGPT, with the same accuracy and at a fraction of the infrastructure cost,” says Rutten. “What this means is that AI training doesn’t need to be the sole domain of hyperscalers who can afford to invest billions of dollars into large infrastructure buildouts.”
Instead, he adds, the approach DeepSeek developed shows that large AI development is within reach for enterprises from a cost and footprint perspective.
“Medium-sized or small AI initiatives also become significantly more affordable, including customizing or finetuning a model, as well as inferencing on a model,” he says. “I believe AI will become affordable — perhaps, over time, as affordable as any other workload, thanks to the type of technologies that DeepSeek developed.”
Deep interest for CIOs
Dekate believes the DeepSeek news is yet another reminder of the speed at which AI innovation is accelerating, and that CIOs need to engage with gen AI now, if they haven’t already, or risk becoming obsolete.
“CIOs have a choice to either jump in, start experimenting, start creating gen AI strategies, implementation, and deployment strategies today, or fall so far behind that catching up isn’t even an option,” he says.
Even if the market is overreacting to the degree to which DeepSeek disrupts the current gen AI landscape, Dekate says it’s a clear sign CIOs can’t afford to wait any longer.
“DeepSeek is showcasing that the cost vectors of gen AI will eventually become more effective and more approachable,” he says.
IDC’s Jyoti notes that Kai-Fu Lee, chairman and CEO of Sinovation Ventures, who was the founding director of Microsoft Research Asia and is former president of Google China, predicted last year that Chinese AI startups would focus on creating efficiencies.
“Digging through their secret sauce, it’s evident it’s all about RL [reinforcement learning] and how they used it,” Jyoti adds. “Most language models use a combination of pre-training, supervised fine-tuning, and then some RL to polish things up. DeepSeek’s approach has shown that LLMs are capable of reasoning with RL alone.”
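The RL-alone approach Jyoti describes relies on simple programmatic rewards rather than human-labeled reasoning traces. The sketch below illustrates the general idea with a rule-based reward that scores both output format and answer accuracy; the specific tags, weights, and reward shaping here are illustrative assumptions, not DeepSeek’s actual training code.

```python
# Illustrative sketch of rule-based rewards for pure-RL reasoning training.
# The <think>/<answer> tag convention and the equal weighting are assumptions
# for illustration; they are not DeepSeek's actual reward implementation.
import re

def format_reward(completion: str) -> float:
    """1.0 if reasoning is wrapped in <think> tags and the final answer
    in <answer> tags, else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold_answer: str) -> float:
    """1.0 if the extracted final answer matches the verifiable ground
    truth (easy to check for math problems), else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == gold_answer.strip() else 0.0

def total_reward(completion: str, gold_answer: str) -> float:
    # The RL policy is updated to maximize this scalar signal, with no
    # supervised labels for the reasoning chain itself.
    return format_reward(completion) + accuracy_reward(completion, gold_answer)

good = "<think>2 + 2 is 4</think> <answer>4</answer>"
bad = "The answer is 4."
print(total_reward(good, "4"))  # 2.0
print(total_reward(bad, "4"))   # 0.0
```

Because the reward is computed mechanically, the training loop needs no human graders, which is part of why this approach is so much cheaper than supervised fine-tuning at scale.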
Making the distinction
DeepSeek-R1 is a new open-weight LLM based on the DeepSeek-V3 base model. DeepSeek-R1-Zero is an interim model trained solely via RL. Gartner says it demonstrates that model providers can use pure RL to increase capabilities in certain domains, like math and coding, where answers are hard to generate but easy to verify.
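Gartner’s point about domains where answers are “hard to generate but easy to verify” is what makes pure RL workable: for coding, a candidate solution can be scored automatically by running it against test cases. The minimal sketch below shows such a verifier; the task and tests are invented for illustration.

```python
# Minimal sketch of automatic verification for a coding task. An RL policy
# would generate the candidate source; this checker turns pass/fail into a
# training signal. The task and test cases here are invented examples.

def verify_candidate(source: str, test_cases: list[tuple[int, int]]) -> bool:
    """Execute a candidate `def solve(n): ...` and check it against
    (input, expected_output) pairs. Any failure or crash scores zero."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # run the model's generated answer
        solve = namespace["solve"]
        return all(solve(x) == y for x, y in test_cases)
    except Exception:
        return False

# Task: return the sum 1 + 2 + ... + n.
tests = [(1, 1), (4, 10), (100, 5050)]

correct = "def solve(n):\n    return n * (n + 1) // 2"
wrong = "def solve(n):\n    return n * n"

print(verify_candidate(correct, tests))  # True
print(verify_candidate(wrong, tests))    # False
```

Math works the same way: producing a proof or derivation is hard, but checking a final numeric answer against ground truth is a one-line comparison.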
But Gartner researchers said the DeepSeek model doesn’t represent a new model paradigm. Rather, it builds on the existing LLM training architecture, layering on technical and architectural optimizations to make training and inference more efficient. Nor does DeepSeek set a new state of the art for model performance; the researchers added it often matches, but doesn’t surpass, existing state-of-the-art models. They also said DeepSeek isn’t proof that scaling models via additional compute and data doesn’t matter. Instead, it shows it pays off to scale a more efficient model.
“DeepSeek’s R1 launch and its dramatically lower inference pricing compared to OpenAI’s o1-preview model go hand in hand with the broader commoditization of the LLM model layer,” they wrote. “That means efficiency isn’t about cost per token anymore,” the researchers added. “It’s about which model can reason the cheapest, without impacting accuracy and latency. So the focus will soon turn to efficient scaling of AI versus how much compute you can assemble to build it.”
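The shift from cost per token to cost per answer is easy to see with back-of-the-envelope arithmetic: a reasoning model that charges less per token can still cost more per answer if its reasoning chains are longer. All prices and token counts below are hypothetical, not actual DeepSeek or OpenAI figures.

```python
# Hypothetical comparison: cheaper tokens do not guarantee cheaper answers
# when a reasoning model emits long chains of thought. All numbers below
# are invented for illustration.

def cost_per_answer(price_per_million_tokens: float, tokens_per_answer: int) -> float:
    return price_per_million_tokens * tokens_per_answer / 1_000_000

# Model A: cheap tokens, but verbose reasoning chains.
a = cost_per_answer(price_per_million_tokens=2.0, tokens_per_answer=12_000)
# Model B: pricier tokens, but reasons far more concisely.
b = cost_per_answer(price_per_million_tokens=4.0, tokens_per_answer=3_000)

print(f"Model A: ${a:.4f} per answer")  # $0.0240
print(f"Model B: ${b:.4f} per answer")  # $0.0120
```

Here the model with double the per-token price delivers answers at half the cost, which is exactly the “reason the cheapest” framing the researchers describe.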
The researchers, agreeing with their colleague Dekate, note that in the wake of the DeepSeek announcement, other model builders like Meta are in their war rooms devising plans to follow. CIOs, therefore, should expect a rapid short- to mid-term reduction in the cost and price of LLMs, but only to a degree.
“These software and algorithmic-driven innovations also allow model vendors to do more with more powerful hardware,” they wrote. “The most advanced new models will still have high R&D and compute costs that’ll be passed on to early adopters.”
IDC’s Jyoti offers five key takeaways for CIOs:
- Cost efficiency: DeepSeek’s AI models claim to achieve high performance at a fraction of the cost compared to traditional models. This could mean that companies might not need to invest as heavily in infrastructure and hardware, potentially lowering the barriers to entry for advanced AI capabilities.
- Competitive landscape: DeepSeek’s emergence as a strong competitor to established AI giants like OpenAI and Meta suggests the AI landscape is becoming more competitive. This could drive innovation and force existing players to improve their offerings and reduce costs.
- Open-weight models: DeepSeek’s decision to release its models as “open-weight” allows developers and researchers to access and build upon its technology. This openness could foster a more collaborative environment in the AI community, accelerating advancements and applications.
- Strategic re-evaluation: With DeepSeek demonstrating that high-performance AI can be achieved with less data and lower costs, CIOs might need to reassess their AI strategies. This includes evaluating current investments in AI infrastructure and considering more cost-effective alternatives.
- Data privacy and security: Given that DeepSeek is based in China, there may be concerns about data privacy and security. CIOs should carefully consider the implications of integrating technology from companies that operate under different regulatory environments.
Forrester principal analysts Carlos Casanova, Michele Pelino, and Michele Goetz further note that CIOs should expect DeepSeek to impact edge computing technologies, AIOps, and IT operations. In particular, DeepSeek has the ability to explain its answers by default, delivering transparency that’s crucial to building trust and understanding in AI-driven decisions in AIOps solutions.
“With LLMs running on edge devices, AIOps and observability can achieve new levels of real-time insight and automation,” they wrote. “The integration of smaller-footprint LLMs that can run at the edge — such as DeepSeek R1 — with AIOps can also lead to more proactive and predictive maintenance of devices and infrastructure, or injection of risk-mitigating actions with no human intervention.”