Molmo

Allen Institute for AI · September 2024

● active · Open Source · decoder only · multimodal

Parameters: 7B - 72B

Context Window: 4K tokens

Variants: 7B, 72B

Why It Matters

Showed that open multimodal models could be competitive with GPT-4V when trained with carefully curated, human-annotated visual data.

Description

Molmo is Allen AI's open multimodal model, which can understand both text and images. It is available in sizes ranging from 7B to 72B parameters. It was trained with carefully curated, human-annotated visual data rather than relying solely on synthetic data from other models, which helped it achieve quality competitive with proprietary models such as GPT-4V (OpenAI's vision model).

Key Innovations

Multimodal

Processing multiple types of input (text, images, audio, video) in a single model.

Open Weight

Model weights are publicly released, but training data and code may not be. This enables fine-tuning but not full reproduction.

Family Tree

Built On

OLMo

Lineage

OLMo → Molmo

More from Allen AI

OLMo · 2024-02 · 7B

OLMo 2 · 2024-11 · 7B - 13B

Tülu 3 · 2024-11 · 8B - 70B
