Phi-2

Microsoft Research · December 2023

active · Open Source · decoder only · text
Parameters: 2.7B
Context Window: 2K tokens

Why It Matters

Microsoft's demonstration that a small model trained on high-quality data could outperform models up to 25x its size on some benchmarks. It challenged the assumption that bigger always means better.

Description

A 2.7 billion parameter model that matched or outperformed models 5-10x its size on reasoning and language benchmarks. It follows the same philosophy as Phi-1: carefully selected, high-quality training data instead of brute-force scale.

Notable Milestones

  • Outperformed Llama 2 70B on some benchmarks despite being 25x smaller
  • Helped establish the small language model category
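The "25x smaller" figure can be checked with quick arithmetic. The sketch below assumes the nominal parameter counts implied by the model names (2.7B for Phi-2, 70B for Llama 2 70B); exact counts may differ slightly.

```python
# Nominal parameter counts, taken from the model names (an assumption;
# exact counts may differ slightly from these round figures).
PHI_2_PARAMS = 2.7e9
LLAMA_2_70B_PARAMS = 70e9

# How many times larger Llama 2 70B is than Phi-2.
size_ratio = LLAMA_2_70B_PARAMS / PHI_2_PARAMS
print(f"Llama 2 70B is about {size_ratio:.1f}x the size of Phi-2")
```

The result is a little under 26, which the page rounds to 25x.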

Key Innovations

Distillation: Training a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
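The student/teacher idea can be illustrated with the classic soft-target loss. This is a minimal sketch of generic knowledge distillation, not Microsoft's training recipe for Phi-2, which this page does not describe. The function names, the temperature value, and the example logits are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Generic soft-target loss; the temperature of 2.0 is an illustrative
    assumption, not a value taken from any Phi-2 documentation.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2

# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different token

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

A student whose output distribution tracks the teacher's gets a loss near zero, while one that disagrees is penalized more heavily. In practice this term is usually combined with the ordinary next-token loss on ground-truth data.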

Family Tree

Built On

Lineage

Phi-1 → Phi-2

Successors (1)