LLaMA 2

Meta · July 2023

Legacy · Open Weight · Decoder-only · Text · API Available

Parameters: 7B - 70B

Context Window: 4K tokens

Variants: 7B, 13B, 70B

Why It Matters

First major open-weight model licensed for commercial use, enabling startups and enterprises to build products on top of a frontier-class model without API fees.

Description

Meta's second-generation open model, available in sizes from 7B to 70B parameters. The chat variants were fine-tuned using RLHF (reinforcement learning from human feedback), a technique in which human reviewers rate the model's outputs to teach it to give more helpful, safer responses. It was the first major open model licensed for commercial use.

Notable Milestones

▸ First major open-weight model with a commercial license
▸ Partnered with Microsoft for Azure distribution
▸ Widely adopted as a base model for enterprise fine-tuning

Key Innovations

RLHF

Reinforcement Learning from Human Feedback: training models to align with human preferences by having humans rank outputs.
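One common way to turn human rankings into a training signal is to fit a reward model with a pairwise ranking loss: the response the human preferred should receive a higher score than the one they rejected. The sketch below assumes scalar reward scores; the function name and the example scores are hypothetical and are not taken from Meta's code.

```python
import math


def pairwise_ranking_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the preferred response scores higher than the
    rejected one, and grows when the ordering is wrong.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch on the sign of d
    # so that exp() never receives a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical reward-model scores for two responses to the same prompt.
loss_correct_order = pairwise_ranking_loss(2.0, 0.0)
loss_wrong_order = pairwise_ranking_loss(0.0, 2.0)
```

In a full RLHF pipeline the trained reward model would then score new generations during a reinforcement learning stage; that stage is not shown here.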

Open Weight

Model weights are publicly released, but the training data and code may not be. This enables fine-tuning but not full reproduction.

Instruction Tuning

Fine-tuning a model on instruction-response pairs so that it follows user commands more reliably.
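As an illustration of what an instruction-response training example can look like, the sketch below joins a pair into one token sequence and builds a loss mask so that only the response tokens are trained on. The prompt template, the whitespace "tokenizer", and the function name are all assumptions made for this example; masking the prompt is a common convention, not something this page confirms for LLaMA 2.

```python
def build_example(instruction: str, response: str):
    """Turn one instruction-response pair into (tokens, loss_mask).

    loss_mask[i] is 1 where the model should be trained to predict
    tokens[i] (the response) and 0 where the token is only context
    (the prompt).
    """
    # Hypothetical template and whitespace tokenization, for illustration only.
    prompt_tokens = f"Instruction: {instruction} Response:".split()
    response_tokens = response.split()

    tokens = prompt_tokens + response_tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask


tokens, loss_mask = build_example("Add 2 and 3", "The answer is 5")
```

A real pipeline would use the model's own tokenizer and chat template in place of the whitespace split.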

Family Tree

Built On

LLaMA

Lineage

LLaMA → LLaMA 2

Successors (5)

LLaMA 3 · Dolphin (Eric Hartford) · Hermes (Nous Research) · MythoMax-L2-13B · CodeLlama

Related Research (7)

Scaling Laws (Kaplan) · Scaling

2020 · OpenAI

Found that model performance follows power laws in compute, parameters, and data. Provided the mathematical framework for scaling decisions.
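To make the power-law claim concrete, the sketch below evaluates a loss curve of the form L(N) = (N_c / N)^alpha in the parameter count N. The default constants are approximate values of the kind reported for the parameter-only fit and should be treated as illustrative assumptions, not as figures from this page.

```python
def power_law_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Loss predicted by a pure power law in parameter count.

    n_c and alpha are illustrative constants (assumed, approximate).
    """
    return (n_c / n_params) ** alpha


# Under a power law, doubling the parameter count multiplies the loss by
# the same factor (2 ** -alpha) at every scale.
ratio_small = power_law_loss(2 * 7e9) / power_law_loss(7e9)
ratio_large = power_law_loss(2 * 70e9) / power_law_loss(70e9)
```

The constant ratio is what makes the curve a straight line on a log-log plot, which is how such fits are usually presented.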

Chinchilla · Scaling

2022 · DeepMind

Challenged Kaplan's scaling laws by showing that training data should scale in equal proportion to parameters. The 70B-parameter Chinchilla outperformed the 280B-parameter Gopher.
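A rough way to see what "scale equally" implies: if training compute is approximated as C ≈ 6 · N · D FLOPs (N parameters, D tokens) and the tokens-per-parameter ratio is held fixed, then N and D both grow with the square root of C. The 6·N·D approximation and the ratio of about 20 tokens per parameter are widely used rules of thumb and are assumptions here, not statements from this page.

```python
import math


def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a FLOP budget into (parameters, tokens).

    Assumes compute ~= 6 * N * D and a fixed ratio D = tokens_per_param * N.
    Both are rule-of-thumb assumptions.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Budget corresponding to a 70B-parameter model trained on 1.4T tokens.
budget = 6.0 * 70e9 * 1.4e12
n_params, n_tokens = compute_optimal_split(budget)

# Quadrupling the budget doubles both parameters and tokens.
n_params_4x, n_tokens_4x = compute_optimal_split(4 * budget)
```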

LLaMA · Scaling

2023 · Meta AI

Showed that smaller models trained on significantly more data (following Chinchilla scaling laws) could match or exceed the performance of much larger…

LLaMA 2 · Scaling

2023 · Meta AI

Provided the most detailed public documentation of how to train, fine-tune, and safety-align a large language model, including their full RLHF methodo…

RoPE · Architecture

2021 · Zhuiyi Technology

Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
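The rotation idea can be sketched in a few lines: each consecutive pair of vector components is rotated by an angle proportional to the token position, with a different frequency per pair. The pairing convention (adjacent components), the base of 10000, and the function names are assumptions for this sketch; production implementations differ in layout and operate on batched tensors.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply a rotary position embedding to one vector.

    Components (0,1), (2,3), ... are treated as 2-D points and rotated by
    position * base ** (-i / dim), where i is the index of the pair's
    first component.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Attention scores depend only on the relative offset between positions:
# both pairs below are 2 positions apart.
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 12), rope(k, 10))
```

Because the score depends on the offset rather than on absolute positions, the same learned attention pattern applies wherever a pair of tokens appears in the sequence.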

Grouped-Query Attention · Architecture

2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
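The memory saving can be estimated with simple arithmetic: the KV cache stores one key and one value vector per layer, per key/value head, per token. The sketch below compares full multi-head attention with a grouped-query setup. The shape used (80 layers, 64 query heads, 8 key/value heads, head dimension 128, 16-bit values) is assumed to resemble a 70B-class configuration and is not taken from this page.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Size of the key/value cache in bytes.

    The leading factor 2 accounts for storing both keys and values.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


# Assumed 70B-class shape at a 4096-token context, 16-bit cache entries.
mha_bytes = kv_cache_bytes(n_layers=80, n_kv_heads=64, head_dim=128, seq_len=4096)
gqa_bytes = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=4096)

print(f"multi-head:    {mha_bytes / 2**30:.2f} GiB")   # 10.00 GiB
print(f"grouped-query: {gqa_bytes / 2**30:.2f} GiB")   # 1.25 GiB
```

Under these assumptions, sharing each key/value head across eight query heads shrinks the cache by a factor of eight per sequence.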

SwiGLU · Architecture

2020 · Google

Showed that SwiGLU activation (Swish + Gated Linear Unit) significantly improves Transformer FFN quality with minimal compute overhead.
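A minimal sketch of a SwiGLU feed-forward layer, written with plain Python lists so the data flow is visible: one projection passes through Swish (also called SiLU) and gates a second projection elementwise, and a third projection maps the result back. The weight names and the bias-free form are assumptions for this illustration.

```python
import math


def silu(x: float) -> float:
    """Swish / SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Compute w_down @ (silu(w_gate @ x) * (w_up @ x)), without biases."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Tiny hypothetical example: 2-D input, 3-D hidden layer, 2-D output.
x = [1.0, -2.0]
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
w_down = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
y = swiglu_ffn(x, w_gate, w_up, w_down)
```

Compared with a standard two-matrix feed-forward block, the gate adds a third weight matrix, which is why implementations typically shrink the hidden dimension to keep the parameter count comparable.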

External Links

Research Paper · Announcement

More from Meta LLaMA

LLaMA · 2023-02 · 7B - 65B

LLaMA 3 · 2024-04 · 8B / 70B

LLaMA 3.1 · 2024-07 · 8B / 70B / 405B

LLaMA 3.2 · 2024-09 · 1B / 3B / 11B / 90B

LLaMA 3.3 · 2024-12 · 70B

LLaMA 4 · 2025-04 · 17B active (Scout and Maverick)

MusicGen · 2023-06 · 3.3B

CodeLlama · 2023-08 · 7B - 70B
