LLM Tree of Life

NVLM 1.0

NVIDIA · October 2024

Status: Active · Open weight · Decoder-only · Multimodal

Parameters: 72B

Context Window: 4K tokens

Description

A family of multimodal language models from NVIDIA that process both text and images. Notably, adding vision capabilities improved the model's text-only performance instead of degrading it. This is uncommon: most multimodal models lose some text quality when they are trained to handle images.

Key Innovations

Multimodal

Processing multiple types of input (text, images, audio, video) in a single model.
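In a decoder-only multimodal model, images are turned into tokens that the language model reads in the same sequence as text. The toy Python sketch below illustrates that general idea under stated assumptions: a vision encoder has already produced one feature vector per image patch, a linear projector maps those vectors to the text embedding width, and the projected vectors are spliced between text embeddings. It is not NVIDIA's implementation; the dimensions, the single linear projector, and every function name are hypothetical.

```python
import random

# Hypothetical toy sizes, far smaller than any real model.
VISION_DIM = 4  # width of each image-patch feature from the vision encoder
TEXT_DIM = 3    # width of the language model's token embeddings


def matvec(weights, vec):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def project_patches(patches, weights):
    """Map each vision feature into the text embedding space.

    A single linear layer keeps the sketch short; real systems often
    use a small MLP here instead.
    """
    return [matvec(weights, patch) for patch in patches]


def build_decoder_input(text_before, image_patches, text_after, weights):
    """Splice projected image tokens between text embeddings so the
    decoder sees one ordinary token sequence."""
    image_tokens = project_patches(image_patches, weights)
    return text_before + image_tokens + text_after


# Usage: random projector weights, 2 text tokens, 3 image patches, 1 text token.
rng = random.Random(0)
W = [[rng.uniform(-1.0, 1.0) for _ in range(VISION_DIM)] for _ in range(TEXT_DIM)]
text_a = [[0.1] * TEXT_DIM for _ in range(2)]
patches = [[1.0] * VISION_DIM for _ in range(3)]
text_b = [[0.2] * TEXT_DIM]

seq = build_decoder_input(text_a, patches, text_b, W)
print(len(seq), len(seq[0]))  # 6 3
```

Because the projected image tokens have the same width as text embeddings, the decoder needs no separate image pathway: it simply attends over a longer sequence.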

Family Tree

Built On

Nemotron-4 340B

Lineage

Megatron-Turing NLG → Nemotron-4 15B → Nemotron-4 340B → NVLM 1.0


More from NVIDIA Nemotron

Megatron-Turing NLG · 2021-10 · 530B

Nemotron-4 15B · 2024-03 · 15B

Nemotron-4 340B · 2024-06 · 340B

Llama-3.1-Nemotron-70B · 2024-10 · 70B

Nemotron 3 Nano · 2025-12 · 30B (3B active)

Nemotron 3 Super · 2026-03 · 120B (12B active)

Nemotron 3 Ultra · 2026-05 · 550B (55B active)

Cosmos 1.0 · 2025-01 · n/a
