LLaMA 2
Meta · July 2023
Why It Matters
First major open-weight model licensed for commercial use, enabling startups and enterprises to build products on top of a frontier-class model without API fees.
Description
Meta's second-generation open-weight model, released in 7B, 13B, and 70B parameter sizes. The chat variants were fine-tuned with RLHF (reinforcement learning from human feedback), a technique in which human reviewers rate or compare the model's outputs and those preferences are used to steer it toward more helpful, safer responses.
Notable Milestones
- First major open-weight model released under a license permitting commercial use
- Partnered with Microsoft for Azure distribution
- Widely adopted as a base model for enterprise fine-tuning
Related Research (7)
Found that model performance follows power laws in compute, parameters, and data. Provided the mathematical framework for scaling decisions.
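Challenged the Kaplan et al. scaling laws by showing that, for compute-optimal training, the amount of training data should scale in equal proportion with parameter count. The 70B-parameter Chinchilla outperformed the 280B-parameter Gopher.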
Showed that smaller models trained on significantly more data (following Chinchilla scaling laws) could match or exceed the performance of much larger…
Provided the most detailed public documentation of how to train, fine-tune, and safety-align a large language model, including their full RLHF methodo…
Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
Showed that SwiGLU activation (Swish + Gated Linear Unit) significantly improves Transformer FFN quality with minimal compute overhead.
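The last three entries above (rotary position embeddings, grouped-query attention, and SwiGLU) describe mechanisms that are easy to sketch in a few lines. The pure-Python example below is a minimal illustration, not any model's actual implementation: the function names, the base of 10000, the layer and head counts, and the tiny weight matrices are all assumptions chosen for readability.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by an angle proportional to position.

    Pair i uses frequency base**(-i/d), so different pairs rotate at
    different speeds. `base` is an illustrative default.
    """
    d = len(vec)
    assert d % 2 == 0, "rotary embeddings need an even dimension"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """KV cache size for one sequence: a key and a value vector per
    token, per layer, per KV head (hence the leading factor of 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

def silu(x):
    """Swish / SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_ffn(x, W_gate, W_up, W_down):
    """Feed-forward block with a SwiGLU gate: W_down @ (silu(W_gate @ x) * (W_up @ x))."""
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

# Rotary embeddings: the query-key score depends only on the position offset.
q, k = [1.0, 0.5, -0.3, 2.0], [0.2, -1.0, 0.7, 0.1]
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 3))    # offset 2
score_b = dot(rope_rotate(q, 12), rope_rotate(k, 10))  # offset 2 again
print(abs(score_a - score_b) < 1e-9)

# Grouped-query attention: 32 query heads sharing 8 KV heads (hypothetical
# sizes) cuts the cache from 2 GiB to 512 MiB at 4096 tokens in 16-bit.
full = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
grouped = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)
print(full // grouped)

# SwiGLU: a 2 -> 1 -> 1 toy feed-forward pass with made-up weights.
print(swiglu_ffn([1.0, 2.0], [[1.0, 0.0]], [[0.0, 1.0]], [[1.0]]))
```

The rotary check shows why the rotation form is attractive: attention scores become a function of relative distance rather than absolute position. The cache arithmetic shows that memory scales linearly with the number of key/value heads, which is the quantity grouped-query attention reduces.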