At GTC 2025 in San Jose today, Nvidia launched a new family of open reasoning AI models for building agentic AI platforms.
The company has post-trained its new Llama Nemotron family of reasoning models to improve multistep math, coding, reasoning, and complex decision-making. The enhancements aim to provide developers and enterprises with a business-ready foundation for creating AI agents that can work independently or as part of connected teams.
Post-training is a set of processes and techniques for refining and optimizing a machine learning model after its initial training on a dataset. It is intended to improve a model’s performance and efficiency and sometimes includes fine-tuning a model on a smaller, more specific dataset.
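As a rough illustration of that last step, here is a minimal sketch of supervised fine-tuning with the Hugging Face Trainer API. The tiny model and two-example dataset are illustrative placeholders only; Nvidia has not published this as its post-training recipe.

```python
# Minimal sketch of post-training via supervised fine-tuning: a small causal
# LM is further trained on a tiny, domain-specific dataset. The model and
# data below are illustrative placeholders, not Nvidia's actual pipeline.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "sshleifer/tiny-gpt2"  # tiny placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A stand-in for the "smaller, more specific dataset" used in fine-tuning.
examples = [
    "Q: What is 12 * 7? Reason step by step. 12 * 7 = 84. A: 84",
    "Q: Is 91 prime? Reason step by step. 91 = 7 * 13. A: No",
]
ds = Dataset.from_dict({"text": examples}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=64),
    batched=True,
    remove_columns=["text"],
)

# mlm=False makes the collator copy input_ids into labels for causal-LM loss.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=ds,
    data_collator=collator,
)
trainer.train()  # the "post-training" pass over the new data
```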
“We did a really hard pivot in January and started training this family for reasoning and we’re really excited about the results,” said Kari Briski, vice president of AI product software management at Nvidia. “Llama is the most widely used open model across every enterprise, but it didn’t have reasoning.”
Reasoning models such as DeepSeek’s R1, which burst onto the AI scene in January, are foundation models that don’t just generate a statistically probable output the way standard large language models (LLMs) do. Instead, they use logical reasoning to break complex questions into smaller steps, then explore and validate various approaches using a “chain of thought” process to arrive at an accurate answer. This process, more human-like than the approaches other generative AI models take, also lets reasoning models show how they reached their conclusions.
Reasoning model announcements have been accelerating in 2025, especially in the wake of DeepSeek’s January unveiling. OpenAI released its o3-mini reasoning model in late January, after a December 2024 preview. Alibaba announced its QwQ-32B compact reasoning model earlier this month. Microsoft is reportedly developing its own reasoning capabilities, and Baidu unveiled Ernie X1 earlier this week.
Nvidia’s Briski said the company’s “hard pivot” to reasoning has boosted the accuracy of its Llama Nemotron models by up to 20% compared with the base model. Inference speed is also up to 5x faster than that of other leading open reasoning models, she claimed. These gains let the models handle more complex reasoning tasks, Briski said, which in turn reduces operational costs for enterprises.
The Llama Nemotron family of models is available as Nvidia NIM microservices in Nano, Super, and Ultra sizes, letting organizations deploy the models at the scale suited to their needs. Nano microservices are optimized for deployment on PCs and edge devices, Super microservices for high throughput on a single GPU, and Ultra microservices for multi-GPU servers and data-center-scale applications.
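Whatever the size, a deployed NIM microservice presents the same OpenAI-compatible HTTP interface, so application code does not change across tiers. The following is a minimal, unofficial sketch of querying a NIM container assumed to be already running locally on its default port 8000; the model identifier is a placeholder to check against the deployed service.

```python
# Minimal sketch of querying a locally deployed Llama Nemotron NIM
# microservice. NIM containers expose an OpenAI-compatible API, by default
# on port 8000; the model ID below is an assumed placeholder.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed default local NIM endpoint
    api_key="not-used-locally",           # local NIMs don't check this value
)

completion = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-nano-8b-v1",  # placeholder model ID
    messages=[{"role": "user", "content": "Name three tasks an AI agent "
                                          "could automate for IT support."}],
)
print(completion.choices[0].message.content)
```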
Partners extend reasoning to Llama ecosystem
Nvidia’s partners are also getting in on the action. Microsoft is expanding its Azure AI Foundry model catalog with Llama Nemotron reasoning models and NIM microservices to enhance services such as the Azure AI Agent Service for Microsoft 365. SAP is leveraging them for SAP Business AI solutions and its Joule copilot. It’s also using NeMo microservices to increase code completion accuracy for SAP ABAP programming language models. ServiceNow said Llama Nemotron models will provide its AI agents with greater performance and accuracy.
Service providers such as Accenture and Deloitte said they, too, are drawing on Llama Nemotron reasoning models for their offerings. Accenture has made the models available on its AI Refinery platform, and Deloitte is incorporating the models in its just-released Zora agentic AI platform.
The new models are part of the Nvidia AI Enterprise software platform, along with new elements including:
- Nvidia AI-Q Blueprint, which connects AI agents to enterprise knowledge using Nvidia NeMo Retriever for multimodal information retrieval and Nvidia AgentIQ toolkit for agent and data connections, optimization, and transparency
- Nvidia AI Data Platform, which provides a customizable reference design for enterprise infrastructure with AI query agents
The Llama Nemotron Nano and Super models and NIM microservices are available now as a hosted API from build.nvidia.com and Hugging Face.
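For developers, the hosted API follows the same OpenAI-compatible pattern. Below is a minimal, unofficial sketch of calling a Nemotron model from build.nvidia.com; the endpoint URL, model ID, and the “detailed thinking on” system prompt, which is how the Nemotron model cards describe toggling the chain-of-thought behavior discussed above, are assumptions to verify against Nvidia’s current documentation.

```python
# Minimal sketch of calling a hosted Llama Nemotron model via the
# build.nvidia.com API. The base URL, model ID, and the "detailed thinking
# on" reasoning toggle are assumptions; verify against current Nvidia docs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed hosted endpoint
    api_key=os.environ["NVIDIA_API_KEY"],  # key generated on build.nvidia.com
)

response = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1",  # assumed model ID
    messages=[
        # The system prompt reportedly switches reasoning mode on or off.
        {"role": "system", "content": "detailed thinking on"},
        {"role": "user", "content": "A train travels 120 km in 90 minutes. "
                                    "What is its average speed in km/h?"},
    ],
    temperature=0.6,
    max_tokens=1024,
)

# With reasoning on, the reply shows step-by-step work before the answer.
print(response.choices[0].message.content)
```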
Nvidia said members of the Nvidia Developer Program can access them for free for development, testing, and research. Enterprises can use Nvidia AI Enterprise on accelerated data center and cloud infrastructure to run Llama Nemotron NIM microservices in production.