Gemma 3

Google DeepMind · March 2025

● activeOpen Sourcedecoder onlymultimodal

Parameters1B / 4B / 12B / 27B

Context Window128K tokens

Variants1B, 4B, 12B, 27B

Why It Matters

Brought multimodal capabilities to the open-weight world at accessible sizes. Proved that small, downloadable models could understand images and handle very long documents.

Description

First multimodal Gemma — capable of understanding both text and images, not just text. Features a 128K-token context window (roughly 96,000 words) and supports over 140 languages. Available in 1B, 4B, 12B, and 27B sizes, all designed to run efficiently on a single GPU or TPU. Shipped alongside ShieldGemma 2 for content safety filtering.

Notable Milestones

▸Supports 140+ languages for multilingual applications
▸ShieldGemma 2 safety classifier included for responsible deployment

Benchmark Scores

MATHMATH benchmark — competition-level problems

69.0%

GPQAGraduate-level science QA

42.4%

Key Innovations

Multimodal

MultimodalProcessing multiple types of input (text, images, audio, video) in a single model.

Open Weight

Open WeightModel weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.

Long Context

Long ContextAbility to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.

Family Tree

Built On

Gemma 2

Lineage

Gemma→Gemma 2→Gemma 3

Successors (1)

Gemma 4

Related Research (1)

Grouped-Query AttentionArchitecture

2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…

External Links

Announcement

More from Google Gemma

Gemma2024-02 · 2B / 7B

Gemma 22024-06 · 2B / 9B / 27B

Gemma 42026-04 · 31B (A4B / E4B / E2B)

CodeGemma2024-04 · 2B / 7B

PreviousGemma 2

NextGemma 4