Gemini 1.0

Google DeepMind · December 2023

Active · Closed · Mixture of Experts · Multimodal · API Available
Context Window: 32K tokens
Variants: Ultra, Pro, Nano

Why It Matters

Google's answer to GPT-4 and the first major model designed to be multimodal from inception rather than having vision added as an afterthought. Gemini Ultra briefly claimed the top spot on key benchmarks.

Description

Google's first natively multimodal model family — built from the ground up to understand text, images, audio, and video together rather than bolting on capabilities after the fact. Available in three tiers: Ultra (most capable), Pro (balanced), and Nano (designed to run on mobile phones). Replaced PaLM as Google's flagship AI.

Notable Milestones

  • Replaced PaLM as the engine behind Google's AI products
  • Nano variant designed for on-device use on Pixel phones
  • Ultra was the first model to achieve human-expert performance on the MMLU benchmark

Key Innovations

Multimodal
Processing multiple types of input (text, images, audio, video) in a single model.
MoE
Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
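
The MoE idea described above can be illustrated with a minimal top-k routing sketch. This is a generic toy example, not Gemini's actual architecture: the names `moe_forward`, `make_expert`, and `gate_weights`, the scaling "experts", and the choice of k=2 are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale, calls, name):
    """Hypothetical expert: scales its input and records that it ran."""
    def expert(x):
        calls.append(name)
        return [scale * xi for xi in x]
    return expert

def moe_forward(x, gate_weights, experts, k=2):
    """Route input vector x to the top-k experts and mix their outputs."""
    # Router: one logit per expert (dot product of x with that expert's gate vector).
    logits = [sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise the gate scores over the selected experts only.
    probs = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, top

calls = []
experts = [make_expert(s, calls, i) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
gate_weights = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
output, chosen = moe_forward([1.0, 0.0], gate_weights, experts, k=2)
print(chosen, calls)  # only the selected experts appear in `calls`
```

With four experts and k=2, only two expert functions run per input, which is the sense in which a fraction of the parameters is active; the other experts add capacity without adding per-input compute.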

Family Tree

Built On

Lineage

PaLM → PaLM 2 → Gemini 1.0

Related Research (2)

ReAct · Reasoning
2022 · Princeton / Google

Combined chain-of-thought reasoning with external tool use (APIs, search), improving QA and decision-making through interleaved reasoning and action.
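
The interleaving of reasoning and action can be sketched as a simple loop. This is a toy illustration under stated assumptions: `scripted_policy` is a hard-coded stand-in for a language model, `lookup` is a hypothetical in-memory tool, and the transcript format is invented here rather than taken from the ReAct paper.

```python
def lookup(term):
    """Hypothetical tool: a tiny in-memory 'search' over a fixed table."""
    facts = {"PaLM 2 successor": "Gemini 1.0"}
    return facts.get(term, "no result")

def scripted_policy(transcript):
    """Stand-in for a language model: returns (thought, next_step)."""
    observations = [s[1] for s in transcript if s[0] == "observation"]
    if not observations:
        # No evidence yet: reason briefly, then call the tool.
        return ("thought", "I should look this up."), ("action", "PaLM 2 successor")
    # Evidence is available: reason over it and finish.
    return ("thought", "The observation answers the question."), ("finish", observations[-1])

def react_loop(question, policy, tool, max_steps=5):
    """Alternate thought -> action -> observation until the policy finishes."""
    transcript = [("question", question)]
    for _ in range(max_steps):
        thought, step = policy(transcript)
        transcript.append(thought)
        transcript.append(step)
        if step[0] == "finish":
            return step[1], transcript
        # Execute the action and feed the result back as an observation.
        transcript.append(("observation", tool(step[1])))
    return None, transcript

answer, trace = react_loop("Which model followed PaLM 2?", scripted_policy, lookup)
for kind, text in trace:
    print(f"{kind}: {text}")
```

Each observation is appended to the transcript before the next thought is produced, so later reasoning steps can condition on what the tool returned.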

Gemini · Scaling
2023 · Google DeepMind

Introduced the Gemini family with native multimodal training from the ground up, reporting state-of-the-art results on 30 of the 32 benchmarks evaluated.

Enabled By

TPU v5e / v5p · Google · August 2023
v5e: 197 TFLOPS bfloat16 / v5p: 459 TFLOPS bfloat16 (peak, per chip)
