LLaMA

Meta · February 2023

legacyOpen Weightdecoder onlytext
Parameters7B - 65B
Context Window2K tokens
Variants7B, 13B, 33B, 65B

Why It Matters

Democratized AI research by releasing powerful models openly. Spawned an entire ecosystem of open-source AI development including Alpaca, Vicuna, and hundreds of community fine-tunes.

Description

Meta's first openly released large language model, available in sizes from 7 billion to 65 billion parameters. Despite being smaller than many competitors, it outperformed models like GPT-3 by training more efficiently on higher-quality data — proving that smarter training matters more than sheer size.

Notable Milestones

  • Sparked the open-source LLM movement
  • Basis for Stanford Alpaca and UC Berkeley Vicuna
  • Proved smaller well-trained models can beat larger ones

Key Innovations

Open Weight
Open WeightModel weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.
Scaling Laws
Scaling LawsMathematical relationships showing how model performance improves predictably with more data, compute, and parameters.

Family Tree

Related Research (6)

TransformerTransformer
2017 · Google Brain

Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…

2020 · OpenAI

Found that model performance follows power laws in compute, parameters, and data. Provided the mathematical framework for scaling decisions.

ChinchillaScaling
2022 · DeepMind

Challenged Kaplan's scaling laws by showing data should scale equally to parameters. 70B Chinchilla outperformed 280B Gopher.

LLaMAScaling
2023 · Meta AI

Showed that smaller models trained on significantly more data (following Chinchilla scaling laws) could match or exceed the performance of much larger…

RoPEArchitecture
2021 · Zhuiyi Technology

Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…

SwiGLUArchitecture
2020 · Google

Showed that SwiGLU activation (Swish + Gated Linear Unit) significantly improves Transformer FFN quality with minimal compute overhead.