Nemotron 3 Super

NVIDIA · March 2026

Active · Open Weight · Hybrid Mamba-Transformer · Text
Parameters: 120B (12B active)
Context Window: 1M tokens

Description

The mid-range model in NVIDIA's Nemotron 3 family, designed for multi-agent applications in which multiple AI models collaborate to solve complex tasks. It uses the same hybrid Mamba-Transformer MoE architecture as Nano, scaled up to 120B total parameters with 12B active, and supports a 1 million token context window, enough to process roughly 750,000 words at once.
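As a rough check on the figures above, the sketch below computes the share of parameters that are active and the word estimate for a full context window. The 0.75 words-per-token ratio is an assumption implied by the 750,000-word figure, not a measured property of the model's tokenizer, and the function names are illustrative.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the description.
TOTAL_PARAMS = 120e9        # 120B total parameters
ACTIVE_PARAMS = 12e9        # 12B active parameters
CONTEXT_TOKENS = 1_000_000  # 1M-token context window
WORDS_PER_TOKEN = 0.75      # assumption: implied by "roughly 750,000 words"


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that are active."""
    return active / total


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough word count that fits in a given number of tokens."""
    return int(tokens * words_per_token)


print(f"Active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
print(f"Approx. words in context: {approx_words(CONTEXT_TOKENS):,}")
```

Real token-to-word ratios vary with language and content, so the word count should be read as an order-of-magnitude estimate.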

Key Innovations

MoE: Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
Agentic: Models that can autonomously plan, execute multi-step tasks, use tools, and self-correct without human intervention.
Reasoning: Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.
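
To illustrate the MoE idea of activating only a fraction of the parameters per input, here is a minimal sketch of generic top-k expert routing. It is not Nemotron's actual router: the expert functions, the router scores, and the choice of k=2 are all hypothetical, chosen only to show that unselected experts are never evaluated.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that they sum to 1 over the selected experts.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Hypothetical toy experts: each one simply scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 2.0, 0.3, 2.0]  # hypothetical router scores for one input

selected = route_top_k(scores, k=2)
y = moe_forward(1.0, experts, scores, k=2)
print(selected, y)
```

In this toy setup only two of the four experts run for the input, which is the source of the compute savings the MoE entry describes.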

Related Research (2)

Mamba · Architecture
2023 · Carnegie Mellon University / Princeton

Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…
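
The linear-versus-quadratic contrast can be sketched with a toy example. The recurrence below is a simplified scalar stand-in with an input-dependent gate, not the Mamba implementation; the sigmoid gate and all function names are assumptions made for illustration. It visits each element once, while causal self-attention scores every query against all earlier keys.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def selective_scan(xs):
    """Toy recurrence h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    The gate a_t depends on the current input (a hypothetical choice),
    loosely mimicking a data-dependent selection mechanism.
    Returns the outputs and the number of sequential steps taken.
    """
    h, outputs, steps = 0.0, [], 0
    for x in xs:
        a = sigmoid(x)             # data-dependent gate
        h = a * h + (1.0 - a) * x  # one constant-cost state update
        outputs.append(h)
        steps += 1
    return outputs, steps


def causal_attention_pairs(n):
    """Number of query-key pairs scored by causal self-attention."""
    return n * (n + 1) // 2


for n in (1_000, 10_000, 100_000):
    _, steps = selective_scan([0.0] * n)
    print(n, steps, causal_attention_pairs(n))
```

Growing the sequence tenfold grows the scan's step count tenfold, but the attention pair count roughly a hundredfold.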

2019 · NVIDIA

Pioneered efficient model parallelism techniques enabling training of multi-billion parameter Transformers across GPUs.