LLM Tree of Life

Phi-4 Multimodal

Microsoft · February 2025

● Active · Open Source · Decoder-only · Multimodal

Parameters: 14B

Context Window: 128K tokens

Description

Multimodal variant of Phi-4 that can understand images, charts, and documents alongside text. It is one of the smallest models capable of multimodal reasoning: it combines visual and textual information to answer complex questions about the images it is given.

Key Innovations

Multimodal

Processing multiple types of input (text, images, audio, video) in a single model.

Reasoning

Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.

Family Tree

Lineage

Phi-1 → Phi-2 → Phi-3 → Phi-4 → Phi-4 Multimodal

More from Microsoft Phi

Phi-1 · 2023-06 · 1.3B

Phi-2 · 2023-12 · 2.7B

Phi-3 · 2024-04 · 3.8B–14B

Phi-4 · 2025-02 · 14B

MAI-1 · 2024-05 · ~500B

Phi-4 Mini · 2025-02 · 3.8B
