Llama-3.1-Nemotron-70B

NVIDIA · October 2024

active · Open Weight · decoder only · text
Parameters: 70B
Context Window: 128K tokens

Why It Matters

Showed that NVIDIA's post-training techniques could make an open-weight model outperform GPT-4o on automatic alignment benchmarks such as Arena Hard, AlpacaEval 2 LC, and MT-Bench.

Description

NVIDIA's enhanced version of Meta's LLaMA 3.1 70B, fine-tuned using a novel REINFORCE-style reward training approach (a policy-gradient technique from reinforcement learning that raises the probability of responses that earn a high reward). It demonstrated that advanced post-training techniques could make an already-strong open model competitive with top proprietary models like GPT-4o.
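To make the REINFORCE idea concrete, here is a minimal sketch on a toy problem: a softmax policy chooses among three candidate responses, each with a fixed reward, and the update nudges the policy toward higher-reward choices. This is not NVIDIA's training pipeline. The function names, reward values, learning rate, and running-mean baseline are all illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, rewards, rng, lr=0.5, baseline=0.0):
    """Sample one response and apply a single REINFORCE update."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    advantage = rewards[action] - baseline
    # For a softmax policy, d log pi(action) / d logit_i
    # equals (1 if i == action else 0) - probs[i].
    new_logits = [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - probs[i])
        for i, logit in enumerate(logits)
    ]
    return new_logits, action


def train(rewards, steps=2000, seed=0):
    """Run REINFORCE with a running-mean baseline (hypothetical settings)."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(steps):
        logits, action = reinforce_step(logits, rewards, rng, baseline=baseline)
        # Track the average reward so the advantage is centered near zero.
        baseline += 0.05 * (rewards[action] - baseline)
    return softmax(logits)


if __name__ == "__main__":
    # Toy rewards standing in for reward-model scores of three responses.
    print(train([0.1, 0.9, 0.3]))
```

In a real language-model setting the "actions" are whole generated responses and the reward would typically come from a learned reward model, but the update has the same shape: scale the gradient of the log-probability of the sampled response by how much better than baseline it scored.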

Key Innovations

RLHF: Reinforcement Learning from Human Feedback. Trains models to align with human preferences by having humans rank outputs.
Instruction Tuning: Fine-tuning a model on instruction-response pairs so it follows user commands more reliably.
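The ranking step in the RLHF entry above is often turned into a training signal with a pairwise (Bradley-Terry style) loss on a reward model: the response humans preferred should score higher than the one they rejected. The sketch below assumes that common formulation. The source does not say which loss NVIDIA used, and the function name is hypothetical.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the preferred response gets the higher score
    and grows as the reward model ranks the pair the wrong way round.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() never receives a large positive argument.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

For example, `pairwise_preference_loss(2.0, 0.0)` is smaller than `pairwise_preference_loss(0.0, 2.0)`, so minimizing this loss over many ranked pairs pushes the reward model to agree with the human ordering.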

Family Tree

Built On

Lineage

LLaMA → LLaMA 2 → LLaMA 3 → LLaMA 3.1 → Llama-3.1-Nemotron-70B
