Phi-3

Microsoft Research · April 2024

activeOpen Sourcedecoder onlytext
Parameters3.8B - 14B
Context Window128K tokens
VariantsMini (3.8B), Small (7B), Medium (14B)

Why It Matters

Brought high-quality AI to edge devices and mobile phones, proving that models small enough to run locally could compete with cloud-based giants.

Description

A family of small models (3.8B to 14B parameters) designed to run on phones, laptops, and edge devices while delivering quality rivaling much larger cloud-based models. The Mini variant can process up to 128K tokens of context (roughly 96,000 words), making it one of the most capable small models available.

Notable Milestones

  • On-device AI for mobile applications
  • Edge computing in IoT devices
  • Privacy-preserving local inference

Key Innovations

Distillation
DistillationTraining a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
Long Context
Long ContextAbility to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.

Family Tree

Built On

Lineage

Phi-1Phi-2Phi-3

Successors (2)