Molmo
Allen Institute for AI · September 2024
Active · Open Source · Decoder-only · Multimodal
Parameters: 7B - 72B
Context Window: 4K tokens
Variants: 7B, 72B
Why It Matters
Showed that open multimodal models could reach quality competitive with GPT-4V when trained on carefully curated, human-annotated visual data.
Description
Allen AI's open multimodal model, which can understand both text and images. It is available in sizes ranging from 7B to 72B parameters. It was trained on carefully curated, human-annotated visual data rather than relying solely on synthetic data from other models, which helped it achieve quality competitive with proprietary models like GPT-4V (OpenAI's vision model).
Key Innovations
Multimodal
Processing multiple types of input (text, images, audio, video) in a single model.
Open Weight
Model weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.