Gemini 1.5 Pro

Google DeepMind · February 2024

Active · Closed · Mixture of experts · Multimodal · API available
Context Window: 2M tokens
Variants: Pro, Flash

Why It Matters

Set a new bar for context length at launch with a 1-million-token window (later extended to 2 million), making it practical to analyze full codebases or hour-long videos in a single prompt. Showed that a mixture-of-experts architecture could be combined with practical ultra-long-context processing.

Description

Launched with a 1-million-token context window, enough to process entire codebases, hour-long videos, or several novels in a single prompt. Uses a mixture-of-experts architecture, in which only a fraction of the model's parameters activates for each input, to keep compute per token manageable at this scale. The window was later expanded to 2 million tokens.
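To make the "fits in a single prompt" claim concrete, here is a minimal sketch of a context-budget check. It assumes a rough heuristic of about 4 characters per token, which is not the model's actual tokenizer and can be off by a wide margin; the function names and the output reserve are hypothetical, chosen only for illustration.

```python
# Assumption: ~4 characters per token is a rough rule of thumb for English
# text and code. Real tokenizer counts can differ substantially.
CHARS_PER_TOKEN = 4

# Context window sizes quoted on this page.
CONTEXT_LIMITS = {"1M": 1_000_000, "2M": 2_000_000}


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Return a rough token estimate using ceiling division."""
    return -(-len(text) // chars_per_token)


def fits_in_context(texts, limit_tokens, reserve_for_output=8_192):
    """Check whether the combined texts fit within a token budget.

    reserve_for_output is a hypothetical allowance for the model's reply.
    Returns (estimated_input_tokens, fits).
    """
    total = sum(estimate_tokens(t) for t in texts)
    return total, total + reserve_for_output <= limit_tokens


# Twelve synthetic "files" of 400,000 characters each.
files = ["x" * 400_000] * 12
total_1m, ok_1m = fits_in_context(files, CONTEXT_LIMITS["1M"])
total_2m, ok_2m = fits_in_context(files, CONTEXT_LIMITS["2M"])
```

Under this heuristic the twelve files come to about 1.2 million tokens, so they would overflow the 1M window but fit in the 2M one. For real workloads, an exact count from the provider's tokenizer should replace the estimate.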

Notable Milestones

  • First major production model to offer a 1-million-token context window
  • Can analyze hour-long videos and full codebases in one prompt
  • Flash variant became one of the most cost-effective frontier models

Benchmark Scores

MMLU (Massive Multitask Language Understanding, 57 subjects): 85.9%
MATH (competition-level math problems): 86.5%
GPQA (graduate-level science QA): 59.1%

Key Innovations

Long Context: Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.
Multimodal: Processing multiple types of input (text, images, audio, video) in a single model.
MoE: Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
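The MoE idea above can be illustrated with a generic top-k routing sketch. This is not Gemini 1.5 Pro's actual router: the number of experts, the value of k, the router scores, and the toy scalar "experts" are all assumptions made for illustration, since this page does not give those details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, router_scores, k=2):
    """Weighted sum of the outputs of only the k selected experts."""
    routes = top_k_route(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    active_fraction = k / len(experts)  # share of experts actually evaluated
    return output, active_fraction


# Toy experts: each scalar function stands in for a feed-forward block.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)]

# Hypothetical router scores for one input. In a real model these come from
# a learned gating layer applied to the token representation.
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

routes = top_k_route(router_scores, k=2)
output, active_fraction = moe_forward(2.0, experts, router_scores, k=2)
```

With 8 experts and k=2, only a quarter of the experts run for this input, which is the sense in which total parameter count can grow without a matching growth in compute per input.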

Family Tree

Built On

Lineage

PaLM → PaLM 2 → Gemini 1.0 → Gemini 1.5 Pro

Successors (1)

Related Research (1)

Gemini · Scaling
2023 · Google DeepMind

Introduced the Gemini family, trained as natively multimodal from the ground up, and reported state-of-the-art results on 30 of the 32 benchmarks evaluated.

Enabled By

TPU v5e / v5p
Google · August 2023
v5e: 197 TFLOPS bfloat16 / v5p: 459 TFLOPS bfloat16