LLaMA 2

Meta · July 2023

legacy · Open Weight · decoder only · text · API Available
Parameters: 7B - 70B
Context Window: 4K tokens
Variants: 7B, 13B, 70B

Why It Matters

First major open-weight model licensed for commercial use, enabling startups and enterprises to build products on top of a frontier-class model without API fees.

Description

Meta's second-generation open model, released in 7B, 13B, and 70B parameter sizes. Fine-tuned using RLHF (reinforcement learning from human feedback), a technique in which human reviewers compare the model's outputs and their preferences are used to teach it to give more helpful, safer responses. First major open model licensed for commercial use.
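As a rough illustration of the RLHF idea described above, human preference comparisons are commonly turned into a training signal for a reward model with a pairwise ranking loss. The sketch below is a minimal, assumed formulation, -log(sigmoid(r_chosen - r_rejected)), and is not necessarily the exact loss Meta used; the function name and the reward scores are hypothetical.

```python
import math


def pairwise_ranking_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response higher than the rejected one, and large otherwise.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(diff)) equals softplus(-diff); branch to avoid overflow.
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical reward scores for two responses to the same prompt.
agrees = pairwise_ranking_loss(2.0, 0.5)     # model ranks like the human did
disagrees = pairwise_ranking_loss(0.5, 2.0)  # model ranks the other way
print(agrees < disagrees)
```

A reward model trained this way can then score new outputs during the reinforcement learning stage, which is the part this sketch leaves out.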

Notable Milestones

  • First open model with commercial license
  • Partnered with Microsoft for Azure distribution
  • Widely adopted as base model for enterprise fine-tuning

Key Innovations

RLHF
Reinforcement Learning from Human Feedback: training models to align with human preferences by having humans rank outputs.
Open Weight
Model weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.
Instruction Tuning
Fine-tuning a model on instruction-response pairs so it follows user commands more reliably.

Related Research (7)

2020 · OpenAI

Found that model performance follows power laws in compute, parameters, and data. Provided the mathematical framework for scaling decisions.

Chinchilla · Scaling
2022 · DeepMind

Challenged Kaplan's scaling laws by showing that training data should be scaled up in equal proportion to parameter count. The 70B-parameter Chinchilla outperformed the 280B-parameter Gopher.
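The arithmetic behind this result can be sketched with two widely quoted rules of thumb, both treated here as assumptions rather than figures from this page: roughly 20 training tokens per parameter for a compute-optimal run, and roughly 6 FLOPs per parameter per training token. The function names are hypothetical.

```python
# Assumed rule-of-thumb constants, not values stated on this page.
TOKENS_PER_PARAM = 20         # approximate compute-optimal tokens per parameter
FLOPS_PER_PARAM_TOKEN = 6.0   # approximate training FLOPs per parameter per token


def compute_optimal_tokens(n_params: float) -> float:
    """Estimate the compute-optimal number of training tokens."""
    return TOKENS_PER_PARAM * n_params


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Estimate total training compute as 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


for n_params in (7e9, 13e9, 70e9):
    n_tokens = compute_optimal_tokens(n_params)
    flops = approx_training_flops(n_params, n_tokens)
    print(f"{n_params / 1e9:.0f}B params: ~{n_tokens / 1e12:.2f}T tokens, ~{flops:.2e} FLOPs")
```

Under these assumptions a 70B model calls for about 1.4T tokens, which is why a well-fed 70B model can beat a 280B model trained on too little data.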

LLaMA · Scaling
2023 · Meta AI

Showed that smaller models trained on significantly more data (following Chinchilla scaling laws) could match or exceed the performance of much larger…

LLaMA 2 · Scaling
2023 · Meta AI

Provided the most detailed public documentation of how to train, fine-tune, and safety-align a large language model, including their full RLHF methodo…

RoPE · Architecture
2021 · Zhuiyi Technology

Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
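A minimal sketch of the rotation idea, in plain Python: each consecutive pair of vector components is rotated by an angle proportional to the token position. The interleaved pairing and the base of 10000 are assumptions chosen for illustration (production implementations often pair components differently), and the helper names are hypothetical.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by position * base**(-2i/d)."""
    d = len(vec)
    assert d % 2 == 0, "dimension must be even"
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (hypothetical values).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# The attention score depends only on the relative offset (3 in both cases).
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 105), rope(k, 102))
print(abs(s1 - s2) < 1e-9)
```

Because the score is a function of the offset between positions rather than their absolute values, the same learned weights apply anywhere in the sequence.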

2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
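The memory saving can be estimated with simple arithmetic: the KV cache stores one key and one value vector per layer, per KV head, per token. The configuration below (80 layers, 64 query heads, head dimension 128, 16-bit values, a 4K context) is an assumed, illustrative 70B-class setup and is not taken from this page; the function name is hypothetical.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2 covers keys plus values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed illustrative configuration.
cfg = dict(n_layers=80, head_dim=128, seq_len=4096)

mha = kv_cache_bytes(n_kv_heads=64, **cfg)  # multi-head: one KV head per query head
gqa = kv_cache_bytes(n_kv_heads=8, **cfg)   # grouped-query: 8 query heads share a KV head
mqa = kv_cache_bytes(n_kv_heads=1, **cfg)   # multi-query: all query heads share one KV head

for name, size in (("MHA", mha), ("GQA", gqa), ("MQA", mqa)):
    print(f"{name}: {size / 2**30:.3f} GiB per sequence")
```

With these numbers, grouping 64 query heads into 8 KV heads cuts the cache by a factor of 8 relative to multi-head attention, while keeping more KV capacity than a single shared head.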

SwiGLU · Architecture
2020 · Google

Showed that the SwiGLU activation (a Gated Linear Unit with a Swish gate) significantly improves Transformer FFN quality with minimal compute overhead.
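A minimal sketch of a SwiGLU feed-forward layer in plain Python, assuming the common bias-free form down(Swish(gate(x)) * up(x)). The weight names and toy values are hypothetical, and real implementations use tensor libraries rather than nested lists.

```python
import math


def swish(x):
    """Swish (also called SiLU): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Feed-forward block: w_down @ (swish(w_gate @ x) * (w_up @ x))."""
    gate = [swish(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return matvec(w_down, hidden)


# Toy weights: model dimension 2, hidden dimension 3 (hypothetical values).
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[0.5, -0.5], [1.0, 0.0], [0.0, 2.0]]
w_down = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]

y = swiglu_ffn([1.0, 2.0], w_gate, w_up, w_down)
print(len(y))
```

The gate projection decides how much of each hidden unit passes through, which costs one extra matrix compared with a plain two-matrix FFN.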