LLaMA 3
Meta · April 2024
Why It Matters
Narrowed the quality gap between open and closed models, showing that openly available models could rival the best proprietary systems on many benchmarks.
Description
A major leap in open-model quality, released in 8B and 70B parameter sizes. It was trained on 15 trillion tokens of text, roughly 7 times as much data as LLaMA 2, which substantially improved its reasoning, code generation, and instruction following. It approached GPT-4-level performance on many tasks.
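To put the 15 trillion token figure in perspective, the arithmetic can be sketched as below. Only the 15T token count and the 8B and 70B sizes come from the text; the 2T token count for LLaMA 2 and the rule of thumb of about 20 training tokens per parameter (often associated with Chinchilla-style scaling) are assumptions added for illustration.

```python
# Figures stated in the text above.
LLAMA3_TOKENS = 15e12
MODEL_SIZES = {"8B": 8e9, "70B": 70e9}

# Assumptions for illustration, not stated in the text.
LLAMA2_TOKENS = 2e12               # assumed LLaMA 2 training set size
RULE_OF_THUMB_TOKENS_PER_PARAM = 20  # assumed compute-optimal heuristic


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


data_ratio = LLAMA3_TOKENS / LLAMA2_TOKENS
print(f"LLaMA 3 vs LLaMA 2 training data: {data_ratio:.1f}x")

for name, params in MODEL_SIZES.items():
    tpp = tokens_per_param(LLAMA3_TOKENS, params)
    multiple = tpp / RULE_OF_THUMB_TOKENS_PER_PARAM
    print(f"{name}: {tpp:.0f} tokens per parameter "
          f"({multiple:.1f}x the assumed rule of thumb)")
```

Under these assumptions, both sizes are trained far beyond the heuristic budget, which is consistent with the idea of spending extra training compute to get a smaller model that is cheaper to serve.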
Notable Milestones
- Approached GPT-4-class performance as an open model
- Trained on 15T tokens, about 7x the data used for LLaMA 2
- Widely deployed via Hugging Face and cloud providers
Key Innovations
Related Research (4)
Showed that smaller models trained on significantly more data (following Chinchilla scaling laws) could match or exceed the performance of much larger…
Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
Showed that SwiGLU activation (Swish + Gated Linear Unit) significantly improves Transformer FFN quality with minimal compute overhead.
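The three architectural ideas summarized above (rotary position embeddings, grouped-query attention, and SwiGLU) can be illustrated with a minimal pure-Python sketch. This is not any model's actual implementation: the function names are hypothetical, the rotary variant shown rotates consecutive pairs of dimensions (some libraries pair dimensions differently), the SwiGLU function covers only the gating step and omits the linear projections, and the layer and head counts in the usage lines are illustrative assumptions.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`."""
    d = len(vec)
    assert d % 2 == 0, "dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Keys plus values (factor 2) for every layer, KV head, and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def swiglu_gate(gate, up):
    """Elementwise Swish(gate) * up, where Swish(z) = z * sigmoid(z)."""
    return [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]


# Rotary embeddings: the attention score depends only on relative position.
q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
score_far = dot(rope_rotate(q, 12), rope_rotate(k, 10))  # same offset of 2

# Grouped-query attention: fewer KV heads shrink the cache proportionally.
# 32 layers, head_dim 128, 8192 tokens, 16-bit values are assumed values.
mha_bytes = kv_cache_bytes(32, 32, 128, 8192)  # one KV head per query head
gqa_bytes = kv_cache_bytes(32, 8, 128, 8192)   # 4 query heads share a KV head

print(abs(score_near - score_far) < 1e-9)
print(mha_bytes // gqa_bytes)
print(swiglu_gate([0.0, 1.0], [3.0, 2.0]))
```

In this sketch the two attention scores agree because rotating both vectors by position leaves only the position difference in their dot product, and the cache size scales linearly with the number of KV heads, which is the memory saving that grouped-query attention targets.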