DeepSeek-Coder-V2

DeepSeek · June 2024

● active · Open Source · mixture of experts · code

Parameters: 236B (21B active)

Context Window: 128K tokens

Why It Matters

First open MoE model purpose-built for code that matched GPT-4 Turbo on coding benchmarks.

Description

A Mixture-of-Experts coding model with 236B total parameters, of which only 21B are active for any given token: a router sends each token to a small set of expert sub-networks instead of running the full model. It was the first open MoE model purpose-built for code to match the coding performance of GPT-4 Turbo, and it supports 338 programming languages.
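To make the "236B total, 21B active" figure concrete, the short calculation below works out what share of the weights takes part in processing a single token. It uses only the two parameter counts quoted above; the variable names are illustrative.

```python
# Parameter counts from the model card, in billions.
TOTAL_PARAMS_B = 236
ACTIVE_PARAMS_B = 21

# Share of the weights that are active for any one token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"Active share per token: {active_fraction:.1%}")  # prints "Active share per token: 8.9%"
```

Roughly one eleventh of the weights is used per token, which is why per-token compute resembles that of a much smaller dense model, while all 236B parameters still have to be stored.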

Key Innovations

Code Gen: Ability to write, debug, and understand programming code across multiple languages.

MoE: Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
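The idea of activating only a fraction of the parameters can be sketched with a toy top-k router. This is a generic illustration, not DeepSeek's actual routing scheme: the expert count, the `top_k` value, the scalar "experts", and the renormalisation of the selected weights are all assumptions chosen to keep the example small.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalise their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, experts, router_scores, top_k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    routes = route_token(router_scores, top_k)
    output = sum(weight * experts[idx](token) for idx, weight in routes)
    return output, routes


# Hypothetical toy setup: 8 "experts" that each scale a scalar input.
NUM_EXPERTS = 8
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router

output, routes = moe_layer(1.0, experts, scores, top_k=2)
print("experts used:", [idx for idx, _ in routes], "of", NUM_EXPERTS)
```

Only two of the eight expert functions are called for this token; the other six contribute no compute, which is the property the description refers to.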

Family Tree

Built On

DeepSeek V2 · DeepSeek-Coder

Lineage

DeepSeek V1 → DeepSeek V2 → DeepSeek-Coder-V2

More from DeepSeek

DeepSeek V1 · 2024-01 · 67B

DeepSeek V2 · 2024-05 · 236B (21B active)

DeepSeek V3 · 2024-12 · 671B (37B active)

DeepSeek R1 · 2025-01 · 671B (37B active)

DeepSeek V4 Pro · 2026-04 · 1.6T

DeepSeek-Coder · 2023-11 · 1.3B - 33B
