Phi-4 Multimodal

Microsoft · February 2025

activeOpen Sourcedecoder onlymultimodal
Parameters14B
Context Window128K tokens

Description

Multimodal variant of Phi-4 that can understand images, charts, and documents alongside text. One of the smallest models capable of genuine multimodal reasoning — processing both visual and textual information to answer complex questions about what it sees.

Key Innovations

Multimodal
MultimodalProcessing multiple types of input (text, images, audio, video) in a single model.
Reasoning
ReasoningStructured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.

Family Tree

Built On

Lineage

Phi-1Phi-2Phi-3Phi-4Phi-4 Multimodal

External Links