Molmo

Allen Institute for AI · September 2024

active · Open Source · decoder-only · multimodal
Parameters: 7B-72B
Context Window: 4K tokens
Variants: 7B, 72B

Why It Matters

Showed that an open multimodal model can match GPT-4V quality when it is trained on carefully curated, human-annotated visual data.

Description

Molmo is Allen AI's open multimodal model family, which understands both text and images. It is available in sizes from 7B to 72B parameters. It was trained on carefully curated, human-annotated visual data instead of relying solely on synthetic data generated by other models. This helped it reach quality competitive with proprietary models such as GPT-4V (OpenAI's vision model).

Key Innovations

Multimodal
Processing multiple types of input (text, images, audio, video) in a single model.
Open Weight
Model weights are publicly released, but the training data and code may not be. This enables fine-tuning but not full reproduction.

Family Tree

Built On

Lineage

OLMo → Molmo