Nvidia stock plummeted 17% on Monday after Chinese AI developer DeepSeek unveiled its DeepSeek-R1 LLM. On Thursday, the chipmaker turned around and announced that the DeepSeek-R1 model is now available as a preview Nvidia inference microservice (NIM) on build.nvidia.com.
NIM is a set of containers and tools to help developers deploy and manage gen AI models across clouds, data centers, and workstations. For example, the company recently announced three new NIM microservices aimed at helping enterprises boost safety, security, and compliance for AI agents.
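To make that container-based workflow concrete, here is a minimal sketch of launching a NIM container programmatically with the Docker SDK for Python. The image name is a placeholder, and the port mapping and NGC_API_KEY variable follow the general pattern in Nvidia's NIM documentation; treat the details as assumptions rather than a verified recipe.

```python
# Hedged sketch: start a NIM container with the Docker SDK for Python.
# The image name is a placeholder; real NIM images are hosted on nvcr.io
# and require an NGC API key per Nvidia's NIM documentation.
import os
import docker

client = docker.from_env()
container = client.containers.run(
    "nvcr.io/nim/<publisher>/<model>:latest",  # placeholder image name
    detach=True,
    environment={"NGC_API_KEY": os.environ["NGC_API_KEY"]},
    ports={"8000/tcp": 8000},  # NIMs typically serve an OpenAI-compatible API here
    device_requests=[  # hand all local GPUs to the container
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
)
print(f"NIM container started: {container.id}")
```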
DeepSeek-R1 is a new open-weight LLM built on the DeepSeek-V3 base model. Investors rushed to shed Nvidia stock on Monday because DeepSeek-R1’s benchmark results rivaled those of OpenAI’s o1 model even though it was reportedly developed with far less powerful hardware and far less compute. Investors feared the news might curb demand for Nvidia’s highest-end GPUs and upend the pricing strategies of commercial AI vendors.
Nvidia stock has recovered somewhat since then — it rose nearly 9% on Tuesday — with industry watchers noting that while the Chinese LLM adds a healthy dose of competition to the gen AI landscape with its innovations, the market may have overreacted. Still, analysts believe DeepSeek’s entrance heralds the possibility of more affordable gen AI initiatives.
A shot to the system
Sidestepping the frenzy, Nvidia on Thursday said it was making the DeepSeek-R1 NIM available to help developers experiment with its logical inference, reasoning, mathematics, coding, and language understanding capabilities as they customize their own specialized AI agents. The NIM runs on eight H200 GPUs connected via Nvidia NVLink and NVLink Switch.
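Developers who want to try the preview can likely reach it the same way as other models hosted on build.nvidia.com. The sketch below assumes the DeepSeek-R1 preview is served through Nvidia's usual OpenAI-compatible API gateway and model-id convention; the endpoint URL, model id, and NVIDIA_API_KEY variable are assumptions, not details confirmed by the article.

```python
# Hedged sketch: query the hosted DeepSeek-R1 preview through what is assumed
# to be Nvidia's standard OpenAI-compatible gateway for build.nvidia.com.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed gateway URL
    api_key=os.environ["NVIDIA_API_KEY"],  # key generated on build.nvidia.com
)

completion = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",  # assumed catalog model id
    messages=[
        {"role": "user", "content": "Why does test-time scaling help reasoning models?"}
    ],
    temperature=0.6,
    max_tokens=1024,
)
print(completion.choices[0].message.content)
```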
“Instead of offering direct responses, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, conducting chain-of-thought, consensus, and search methods to generate the best answer,” Erik Pounds, director of product marketing at Nvidia, wrote in a blog post Thursday.
This use of sequential inference passes is known as test-time scaling.
“As models are allowed to iteratively ‘think’ through the problem, they create more output tokens and longer generation cycles, so model quality continues to scale,” he wrote. “Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments.”
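To illustrate the consensus method Pounds mentions, here is a hedged sketch of self-consistency-style test-time scaling: sample the model several times and take a majority vote over the final answers. It reuses the assumed client from the sketch above, and the answer-extraction helper is a naive placeholder, not DeepSeek's or Nvidia's actual technique.

```python
# Illustrative sketch of one test-time scaling pattern: consensus
# (self-consistency). More samples spend more inference compute per query,
# which is the compute-for-quality trade Pounds describes.
from collections import Counter

def extract_final_answer(text: str) -> str:
    # Naive placeholder: treat the last non-empty line as the final answer.
    return [line for line in text.splitlines() if line.strip()][-1]

def consensus_answer(client, question: str, n_samples: int = 5) -> str:
    answers = []
    for _ in range(n_samples):
        completion = client.chat.completions.create(
            model="deepseek-ai/deepseek-r1",  # assumed catalog model id
            messages=[{"role": "user", "content": question}],
            temperature=0.7,  # sampling diversity is what makes voting useful
            max_tokens=2048,
        )
        answers.append(extract_final_answer(completion.choices[0].message.content))
    # Majority vote across independent samples.
    return Counter(answers).most_common(1)[0][0]
```

Each additional sample adds output tokens and generation cycles, which is exactly why, per Pounds, reasoning models like DeepSeek-R1 call for larger inference deployments.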
Pounds noted that DeepSeek-R1 delivers high levels of accuracy and inference efficiency for tasks that demand logical inference, reasoning, math, coding, and language understanding.
The 671-billion-parameter DeepSeek-R1 model is now available as a preview NIM and can deliver up to 3,872 tokens per second on a single Nvidia HGX H200 system.
Pounds said Nvidia’s next-generation Blackwell architecture will give a “giant boost” to test-time scaling on reasoning models like DeepSeek-R1, with fifth-generation Tensor Cores that can deliver up to 20 petaflops of peak FP4 compute performance and a 72-GPU NVLink domain specifically optimized for performance.