Phi-1

Microsoft Research · June 2023

activeOpen Sourcedecoder onlycode
Parameters1.3B
Context Window2K tokens

Why It Matters

Launched Microsoft's 'small but mighty' research program, demonstrating that training data quality could be more important than model size — a finding that influenced the entire field.

Description

A tiny 1.3 billion parameter model focused on code generation, trained exclusively on 'textbook-quality' data — carefully curated examples designed to teach programming concepts clearly. Despite being over 100x smaller than GPT-4, it performed surprisingly well on coding benchmarks, proving that what a model learns from matters as much as its size.

Notable Milestones

  • Proved textbook-quality data approach for code generation
  • Inspired the 'small language model' movement

Key Innovations

Distillation
DistillationTraining a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.

Family Tree

Successors (1)

External Links