Nemotron 3 Nano

NVIDIA · December 2025

● Active · Open Weight · Hybrid Mamba Transformer · Text

Parameters: 30B (3B active)

Context Window: 1M tokens

Description

The first model in NVIDIA's Nemotron 3 family. It uses a hybrid architecture that combines Mamba (a state space sequence model that processes text in time linear in the sequence length, which makes it much faster on long sequences) with traditional Transformer attention, arranged as a Mixture-of-Experts. The model has 30B total parameters but activates only 3B of them for any given token, making it efficient enough to run on edge devices.
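To make the linear-versus-quadratic contrast concrete, here is a minimal sketch in Python. It is a toy scalar recurrence with an input-dependent gate, not Mamba's actual parameterization; the function names and the `w_gate` parameter are assumptions made for illustration only.

```python
import math

def selective_scan(xs, w_gate=1.0):
    """Toy scalar recurrence with an input-dependent gate (hypothetical, not Mamba's real layer)."""
    h = 0.0
    outputs = []
    for x in xs:                                   # a single pass: n steps for n tokens
        step = math.log1p(math.exp(w_gate * x))    # softplus, always positive
        decay = math.exp(-step)                    # in (0, 1): share of the old state that is kept
        h = decay * h + (1.0 - decay) * x          # blend old state with the new input
        outputs.append(h)
    return outputs

def attention_pair_count(n):
    """Causal self-attention scores each token against itself and every earlier token."""
    return n * (n + 1) // 2

for n in (1_000, 1_000_000):
    print(n, "tokens:", n, "scan steps vs", attention_pair_count(n), "attention pairs")
```

The recurrence carries a fixed-size state from token to token, so its cost grows with `n`, while the number of attention pairs grows roughly with `n` squared. That gap is what matters at long context lengths.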

Key Innovations

MoE

Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
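A minimal sketch of how sparse activation can work, assuming a common top-k routing scheme. This is not a description of Nemotron 3 Nano's actual router; the experts, logits, and function names are hypothetical.

```python
import math

def top_k_route(router_logits, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts are never called."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Four hypothetical experts, each a simple scalar function.
experts = [lambda x: x, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]   # experts 1 and 3 share the top score
y = moe_layer(3.0, experts, logits, k=2)
print(y)  # 7.5: half of expert 1 (6.0) plus half of expert 3 (9.0)
```

Only two of the four experts run for this input, which is the sense in which total parameter count and per-input compute are decoupled.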

Agentic

Models that can autonomously plan, execute multi-step tasks, use tools, and self-correct without human intervention.

Distillation

Training a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
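One common way to express "mimic the teacher" is a KL-divergence loss between temperature-softened output distributions. The sketch below assumes that classic formulation; the source does not say which distillation objective was used for this model, and the function names and default temperature are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)   # teacher's soft targets
    q = softmax(student_logits, temperature)   # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

teacher = [1.0, 2.0, 3.0]
print(distillation_loss([1.0, 2.0, 3.0], teacher))  # identical logits: no penalty
print(distillation_loss([3.0, 2.0, 1.0], teacher))  # reversed ranking: positive penalty
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative preferences among less likely outputs.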

Family Tree

Related Research (2)

Mamba · Architecture

2023 · Carnegie Mellon University / Princeton

Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…

Megatron-LM · Scaling

2019 · NVIDIA

Pioneered efficient model parallelism techniques enabling training of multi-billion parameter Transformers across GPUs.
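As a rough illustration of one model-parallel idea, the sketch below splits a weight matrix by columns across simulated devices and concatenates the partial results. It runs on plain Python lists on a single machine; the helper names are hypothetical and this is not Megatron-LM's API.

```python
def matmul(x, w):
    """x: vector of length d_in; w: d_in rows by d_out columns. Returns x @ w."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def split_columns(w, parts):
    """Split w into `parts` column blocks, one per simulated device."""
    d_out = len(w[0])
    size = d_out // parts          # assumes d_out is divisible by parts
    return [[row[p * size:(p + 1) * size] for row in w] for p in range(parts)]

def column_parallel_matmul(x, w, parts=2):
    shards = split_columns(w, parts)
    partial = [matmul(x, shard) for shard in shards]   # each shard could live on its own GPU
    return [v for out in partial for v in out]         # concatenate the per-device outputs

x = [1, 2]
w = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
print(column_parallel_matmul(x, w, parts=2) == matmul(x, w))  # True
```

Each device holds only its slice of the weights, so a layer too large for one GPU's memory can still be computed, at the cost of a communication step to gather the outputs.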