Phi-4 Multimodal
Microsoft · February 2025
Active · Open Source · Decoder-only · Multimodal
Parameters: 5.6B
Context Window: 128K tokens
Description
Multimodal variant of Phi-4 that understands images, charts, and documents alongside text. It is one of the smallest models capable of multimodal reasoning: it combines visual and textual information to answer complex questions about what it sees.
Key Innovations
Multimodal: Processing multiple types of input (text, images, audio, video) in a single model.
Reasoning: Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.