Llama 3.1
Meta · July 2024
Why It Matters
At launch, the 405B model was the largest openly available language model, and it showed that open-weight models could compete head-to-head with leading closed systems such as GPT-4o.
Description
The release introduced a 405B-parameter flagship, the largest openly available model at the time, alongside updated 8B and 70B versions. It extended the context window to 128K tokens (roughly 100,000 words), enough to process an entire book or a large codebase in a single prompt. It was the first open model to rival GPT-4o in overall capability.
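The "128K tokens, roughly 100,000 words" figure can be sanity-checked with a little arithmetic. The sketch below is only an estimate: the 0.75 words-per-token ratio is an assumed rule of thumb for English text, not a property of the Llama tokenizer, and the helper names are hypothetical.

```python
# Assumed rule of thumb for English text; real ratios vary by tokenizer and content.
WORDS_PER_TOKEN = 0.75
CONTEXT_TOKENS = 128_000  # "128K" read as 128,000 tokens (assumption)


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * words_per_token)


def fits_in_context(word_count: int, context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Check whether a text of `word_count` words plausibly fits in the window."""
    return word_count <= approx_words(context_tokens)


print(approx_words(CONTEXT_TOKENS))  # estimated word capacity of the window
print(fits_in_context(90_000))       # a 90,000-word manuscript
print(fits_in_context(120_000))      # a 120,000-word manuscript
```

Under this assumed ratio the window holds about 96,000 words, which is consistent with the rounded figure quoted above. Code and non-English text usually tokenize less efficiently, so the real capacity can be lower.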
Notable Milestones
- Largest open-weight model at time of release (405B)
- First open model to rival GPT-4o
- Adopted as a distillation teacher for smaller open models
Family Tree
Related Research
Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
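To make the rotation idea concrete, here is a minimal pure-Python sketch of rotary position embeddings. It assumes the interleaved-pair layout and a frequency base of 10000; real implementations often use a different pairing and base, and the function names here are illustrative only.

```python
import math
from typing import List


def rope(vec: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle position * base**(-2i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out: List[float] = []
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Apply a 2x2 rotation matrix to the pair (x, y).
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# The same offset (3) at different absolute positions gives the same score.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
print(math.isclose(score_near, score_far, rel_tol=1e-9, abs_tol=1e-9))
```

Because each pair is rotated by an angle proportional to its position, the query-key dot product depends only on the offset between the two positions. That relative-position property is the usual explanation for why the scheme is associated with better length generalization.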
Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
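The KV cache saving from grouped-query attention is easy to estimate. The sketch below uses illustrative configuration values (32 layers, 32 query heads, 8 key/value heads, head dimension 128, 16-bit values) that are assumptions chosen for the example, not figures taken from this page, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_value: int = 2,  # 16-bit keys and values (assumption)
    batch_size: int = 1,
) -> int:
    """Bytes needed to cache keys and values for one batch of sequences."""
    # The factor 2 covers one key tensor plus one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value * batch_size


GIB = 2**30
SEQ_LEN = 128 * 1024  # a 128K-token context, read here as 131,072 tokens

# Illustrative model shape; all values are assumptions for this sketch.
LAYERS, QUERY_HEADS, HEAD_DIM = 32, 32, 128

mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN)  # one KV head per query head
gqa = kv_cache_bytes(LAYERS, 8, HEAD_DIM, SEQ_LEN)            # 8 shared KV heads
mqa = kv_cache_bytes(LAYERS, 1, HEAD_DIM, SEQ_LEN)            # a single shared KV head

for name, size in [("multi-head", mha), ("grouped-query", gqa), ("multi-query", mqa)]:
    print(f"{name}: {size / GIB:.0f} GiB")
```

With these assumed values, a full 128K-token cache takes 64 GiB under multi-head attention, 16 GiB with 8 shared key/value heads, and 2 GiB under multi-query attention. The cache shrinks in proportion to the number of key/value heads, which is the middle ground the description refers to.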