Nemotron 3 Nano
NVIDIA · December 2025
● active · Open Weight · hybrid mamba transformer · text
Parameters: 30B (3B active)
Context Window: 1M tokens
Description
The first model in NVIDIA's Nemotron 3 family. It uses a hybrid architecture that combines Mamba (a state space sequence model that processes text in linear time, which makes it much faster than attention on long sequences) with traditional Transformer attention, arranged as a Mixture-of-Experts. It has 30B total parameters but activates only about 3B for any given token, which makes it efficient enough to run on edge devices.
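The sparse activation described above (30B parameters in total, roughly a tenth of them used at a time) can be illustrated with a toy top-k router. This is a minimal sketch of generic Mixture-of-Experts routing, not NVIDIA's implementation: the expert count, the value of k, the scalar "experts", and the names route_top_k and moe_forward are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the selected experts only; the rest never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy setup: 8 scalar "experts", 2 active per token.
# These numbers are illustrative and are not the model's real configuration.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in experts]

selected = route_top_k(logits, k=2)
y = moe_forward(3.0, experts, logits, k=2)
active_fraction = len(selected) / len(experts)
```

In a real model the router logits come from a learned layer applied to each token, so different tokens activate different experts while the per-token compute stays fixed.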
Key Innovations
MoE: Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
Agentic: Models that can autonomously plan, execute multi-step tasks, use tools, and self-correct without human intervention.
Distillation: Training a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
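To make the distillation idea concrete, here is a minimal sketch of the classic soft-target loss: the KL divergence between temperature-softened teacher and student distributions. It is a textbook formulation and an assumption on our part; the page does not say which distillation objective was used for this model. The function names and the temperature value are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of temperature-scaled logits, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Illustrative logits over a 4-token vocabulary (made-up values).
teacher = [4.0, 1.0, 0.5, -2.0]
student_far = [0.0, 0.0, 0.0, 0.0]
student_near = [3.8, 1.1, 0.4, -1.9]

loss_far = distillation_loss(student_far, teacher)
loss_near = distillation_loss(student_near, teacher)
loss_same = distillation_loss(teacher, teacher)
```

The loss shrinks as the student's output distribution approaches the teacher's and is zero when the two match exactly.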
Family Tree
Related Research (2)
Mamba · Architecture
2023 · Carnegie Mellon University / Princeton
Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…
Megatron-LM · Scaling
2019 · NVIDIA
Pioneered efficient model parallelism techniques that enable training multi-billion-parameter Transformers across many GPUs.