Phi-3
Microsoft Research · April 2024
● activeOpen Sourcedecoder onlytext
Parameters3.8B - 14B
Context Window128K tokens
VariantsMini (3.8B), Small (7B), Medium (14B)
Why It Matters
Brought high-quality AI to edge devices and mobile phones, proving that models small enough to run locally could compete with cloud-based giants.
Description
A family of small models (3.8B to 14B parameters) designed to run on phones, laptops, and edge devices while delivering quality rivaling much larger cloud-based models. The Mini variant can process up to 128K tokens of context (roughly 96,000 words), making it one of the most capable small models available.
Notable Milestones
- ▸On-device AI for mobile applications
- ▸Edge computing in IoT devices
- ▸Privacy-preserving local inference
Key Innovations
Distillation
DistillationTraining a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
Long Context
Long ContextAbility to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.