Nvidia has launched Nemotron 3 Super, an open AI model with 120 billion parameters and a context window of one million tokens.
With Nemotron 3 Super, Nvidia aims to meet the growing demand for powerful AI models that can drive advanced agentic systems. The model targets multi-agent applications and is designed for large-scale automation and for higher efficiency and accuracy in complex workflows. Because it is open, organizations can freely deploy, adapt, and optimize it for their own applications.
Innovations in architecture and performance
Nemotron 3 Super uses a hybrid mixture-of-experts (MoE) architecture in which only twelve billion of its 120 billion parameters are active during inference. According to Nvidia, this yields up to five times the throughput and up to twice the accuracy of previous models. Mamba layers cut memory and compute requirements roughly fourfold, while transformer layers handle advanced reasoning.
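The sparse activation described above, where only a small fraction of the parameters run for any given token, is the defining property of an MoE layer. A minimal NumPy sketch of top-k expert routing illustrates the idea; all sizes and names here are hypothetical, and Nvidia's actual gating mechanism is not public:

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_moe(x, w_gate, experts, k=4):
    """Sparse mixture-of-experts: route each token to its top-k experts."""
    logits = x @ w_gate                        # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()               # softmax over the selected experts only
        for w, e in zip(weights, top[t]):
            out[t] += w * experts[e](x[t])     # only k of the experts run per token
    return out

d, n_experts, tokens = 16, 8, 4
w_gate = rng.standard_normal((d, n_experts))
# each "expert" here is a tiny linear map; in a real model they are full FFN blocks
expert_mats = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
experts = [lambda v, m=m: v @ m for m in expert_mats]

x = rng.standard_normal((tokens, d))
y = topk_moe(x, w_gate, experts, k=4)
print(y.shape)  # (4, 16): computed with only 4 of the 8 experts per token
```

The compute saving comes from the loop body: per token, only k expert forward passes are evaluated, even though the layer's total parameter count covers all experts.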
Nvidia also introduces a latent-MoE technique that activates four experts simultaneously at no extra cost. Multi-token prediction lets the model generate several tokens per step, making inference up to three times faster. On the Blackwell platform the model runs in NVFP4 precision, which reduces memory consumption and speeds up inference by up to four times compared with FP8, without a loss in accuracy.
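The memory saving from NVFP4 comes from storing weights as 4-bit floats with per-block scale factors. A simplified sketch of blockwise 4-bit quantization, using the E2M1 FP4 value grid; note the real NVFP4 scheme additionally uses FP8 block scales and a tensor-level scale, which this sketch omits:

```python
import numpy as np

# magnitudes representable in the FP4 E2M1 format
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_block(x, block=16):
    """Blockwise 4-bit quantization: scale each block so its largest magnitude
    maps to 6.0, then snap every value to the nearest FP4 grid point."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0                       # avoid dividing an all-zero block
    scaled = x / scale
    # nearest grid point by magnitude; sign is handled separately
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return q * scale                              # dequantized values, for comparison

rng = np.random.default_rng(1)
w = rng.standard_normal(64).astype(np.float32)    # hypothetical weight vector
w_q = quantize_fp4_block(w).ravel()
err = np.abs(w - w_q).mean()
print(f"mean abs quantization error: {err:.3f}")
```

Each stored value needs only 4 bits plus a shared scale per 16-element block, which is where the roughly fourfold memory reduction over 16-bit weights, and the bandwidth advantage over FP8, comes from.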
Applications and availability
Companies like Perplexity, CodeRabbit, and Greptile are integrating Nemotron 3 Super into their AI agents for tasks such as search, software development, and scientific analysis. Industrial players like Palantir and Siemens are applying the model for automation in sectors such as telecom, cybersecurity, and chip design.
The model is immediately available to companies and developers through Nvidia's own platform, Perplexity, OpenRouter, and Hugging Face; hardware partners such as Dell and HPE also offer it. Cloud providers including Google Cloud and Oracle Cloud support the model, with Amazon Web Services and Microsoft Azure to follow.
