Gemma 3

Google DeepMind · March 2025

Active · Open Source · Decoder-only · Multimodal
Parameters: 1B / 4B / 12B / 27B
Context Window: 128K tokens
Variants: 1B, 4B, 12B, 27B

Why It Matters

Brought multimodal capabilities to open-weight models at accessible sizes. Showed that small, downloadable models can understand images and handle very long documents.

Description

The first multimodal Gemma release, able to understand images as well as text. The 4B, 12B, and 27B models accept image input and offer a 128K-token context window (roughly 96,000 words); the 1B model is text-only with a 32K-token window. Supports over 140 languages. All sizes are designed to run efficiently on a single GPU or TPU. Shipped alongside ShieldGemma 2 for image content safety filtering.
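The word estimate above follows from a common rule of thumb of about 0.75 English words per token. A minimal sketch of that arithmetic is shown here; the ratio is an assumption that varies with tokenizer and language, and the helper names are illustrative only, not part of any Gemma tooling.

```python
# Rough context-window arithmetic.
# WORDS_PER_TOKEN is an assumed rule of thumb for English text,
# not a measured property of the Gemma tokenizer.
CONTEXT_TOKENS = 128_000
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many words fit in a given token budget."""
    return round(tokens * words_per_token)


def fits_in_context(word_count: int, context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Check whether a document of word_count words likely fits in the window."""
    return word_count <= approx_words(context_tokens)


print(approx_words(CONTEXT_TOKENS))  # 96000
```

For a real prompt, counting tokens with the model's own tokenizer is more reliable than any word-based estimate.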

Notable Milestones

  • Supports 140+ languages for multilingual applications
  • ShieldGemma 2 safety classifier included for responsible deployment

Benchmark Scores

MATH (competition-level problems): 69.0%
GPQA (graduate-level science QA): 42.4%

Key Innovations

Multimodal: Processing multiple types of input (text, images, audio, video) in a single model.
Open Weight: Model weights are publicly released, but the training data and code may not be. Enables fine-tuning but not full reproduction.
Long Context: Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.

Family Tree

Built On

Lineage

Gemma → Gemma 2 → Gemma 3

Successors (1)

Related Research (1)

2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
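The KV cache saving described above can be illustrated with back-of-the-envelope arithmetic. This sketch assumes a hypothetical transformer configuration (32 layers, 32 query heads, head dimension 128, 16-bit cache values); these are not Gemma 3's actual dimensions, and the function name is illustrative.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The factor of 2 covers the separate key and value tensors in each layer.
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_value


# Hypothetical configuration, chosen only to make the ratios easy to read.
cfg = dict(seq_len=8192, n_layers=32, head_dim=128)
n_query_heads = 32

mha = kv_cache_bytes(n_kv_heads=n_query_heads, **cfg)  # one KV head per query head
gqa = kv_cache_bytes(n_kv_heads=8, **cfg)              # 4 query heads share each KV head
mqa = kv_cache_bytes(n_kv_heads=1, **cfg)              # all query heads share one KV head

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**30:.3f} GiB")
```

Under these assumptions the cache shrinks in proportion to the number of key/value heads, so grouping 32 query heads into 8 KV groups cuts cache memory to a quarter of the multi-head figure.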

External Links