Gemini 3.5 Flash

Google DeepMind · May 2026

active · Closed · mixture of experts · multimodal · API Available
Context Window: 1M tokens

Description

Google's speed-optimized model designed for high-volume, low-latency applications. Serves as the default model in the Gemini app, balancing strong capabilities with rapid response times and low cost per query.

Key Innovations

Multimodal: Processing multiple types of input (text, images, audio, video) in a single model.
Distillation: Training a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
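
The distillation idea described above can be illustrated with a minimal sketch of a generic soft-target loss: the student is penalized by the KL divergence between the teacher's and the student's temperature-softened output distributions. This is a textbook formulation, not a description of how this model was actually trained; the function names, the example logits, and the temperature value are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature (hypothetical helper)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Assumption: a generic soft-target loss, not any vendor's training recipe.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )


# Illustrative logits over three classes (made-up values).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

A student whose outputs track the teacher's yields a loss near zero, while one that ranks the classes differently yields a larger loss. In practice this term is often combined with an ordinary loss on ground-truth labels.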

Family Tree