Nemotron-4 15B
NVIDIA · March 2024
Active · Open Weight · Decoder-only · Text
Parameters: 15B
Context Window: 8K tokens
Description
NVIDIA's multilingual language model, trained on 8 trillion tokens of text (roughly 6 trillion words) spanning multiple languages. Designed as a capable mid-size model for both research and enterprise deployment, it represents NVIDIA's push beyond hardware into building its own AI models.
Key Innovations
Autoregressive
Generates text one token at a time, each prediction based on all previous tokens. The foundation of modern language models.
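As a rough illustration of the token-by-token loop, here is a minimal sketch that uses a toy bigram counter in place of a real network. The names `train_bigram`, `next_token`, and `generate` are hypothetical, and the toy model looks only at the last token, whereas a Transformer such as Nemotron-4 conditions on the whole context.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which token follows which (a stand-in for a trained network)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token(counts, context):
    """Greedy prediction from the context.

    Assumption: this toy model uses only the last token of the context;
    a real autoregressive LM attends to every previous token.
    """
    followers = counts.get(context[-1])
    if not followers:
        return None  # nothing ever followed this token in training
    return followers.most_common(1)[0][0]


def generate(counts, prompt, max_new_tokens):
    """Append one predicted token at a time, feeding the output back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(counts, tokens)
        if tok is None:
            break
        tokens.append(tok)
    return tokens


corpus = "the cat sat on the mat then the cat sat on the rug".split()
model = train_bigram(corpus)
print(generate(model, ["the"], 4))
```

The key point is the loop in `generate`: each new token is appended to the sequence and becomes part of the input for the next prediction.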
Instruction Tuning
Fine-tuning a model on instruction-response pairs so it follows user commands more reliably.
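To make the idea of an instruction-response pair concrete, the sketch below turns one pair into a training sequence plus a loss mask. The prompt template, the whitespace "tokenizer", and the choice to compute loss only on response tokens are all assumptions for illustration, not details of how Nemotron-4 was tuned.

```python
def build_example(instruction, response):
    """Return (tokens, loss_mask) for one instruction-response pair.

    Assumptions: a hypothetical "### Instruction / ### Response" template
    and whitespace splitting instead of a real subword tokenizer.
    """
    prompt = f"### Instruction:\n{instruction}\n### Response:\n"
    prompt_tokens = prompt.split()
    response_tokens = response.split()
    tokens = prompt_tokens + response_tokens
    # 0 = ignore in the loss (prompt), 1 = train on this token (response).
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask


tokens, mask = build_example("Translate 'bonjour' to English.", "Hello")
for tok, m in zip(tokens, mask):
    print(m, tok)
```

Masking the prompt is a common design choice: the model is trained to produce the response given the instruction, not to reproduce the instruction itself.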
Related Research (1)
Megatron-LM · Scaling
2019 · NVIDIA
Pioneered efficient model parallelism techniques that enable training of multi-billion-parameter Transformers across many GPUs.
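One form of model parallelism can be sketched in a few lines: split a layer's weight matrix by columns, let each "device" compute its slice of the output, then concatenate the slices. The pure-Python version below is only an illustration under that assumption; the helper names are hypothetical, and a real system runs the shards on separate GPUs with collective communication.

```python
def matmul(x, w):
    """Multiply a vector x (length d_in) by a d_in x d_out matrix w."""
    d_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(d_out)]


def split_columns(w, parts):
    """Split w into `parts` column blocks, one per simulated device.

    Assumes the number of columns is divisible by `parts`.
    """
    step = len(w[0]) // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)                             # single-device result
shards = split_columns(w, 2)                    # one column block per device
partials = [matmul(x, shard) for shard in shards]
gathered = [v for part in partials for v in part]  # concatenate the slices

print(full)
print(gathered)
```

Each shard holds only half of the weights, which is what lets a layer too large for one device be spread across several.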