DeepSeek-Coder-V2

DeepSeek · June 2024

active · Open Source · mixture of experts · code
Parameters: 236B (21B active)
Context Window: 128K tokens

Why It Matters

First open MoE model purpose-built for code that matched GPT-4 Turbo on coding benchmarks.

Description

A Mixture-of-Experts coding model with 236B total parameters, of which only 21B are activated for each token. A learned router sends every token to a small subset of expert sub-networks instead of running the whole network. It was the first open MoE model purpose-built for code to match GPT-4 Turbo's coding performance, and it supports 338 programming languages.
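To make per-token routing concrete, here is a minimal sketch of plain top-k gating in pure Python. It is not DeepSeek's implementation: the helper names (`softmax`, `top_k_route`, `moe_layer`, `make_scaling_expert`), the toy scaling "experts", the tiny dimensions, and the choice to renormalise the selected gate weights are illustrative assumptions, and DeepSeek's own MoE layers differ in detail.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-probability experts and renormalise their gate weights (a choice of this sketch)."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token: Vector, experts: Sequence[Expert],
              router_weights: Sequence[Sequence[float]], k: int) -> Vector:
    """Route one token to k experts and return the gate-weighted sum of their outputs."""
    # One router score per expert: dot product of that expert's router row with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    output = [0.0] * len(token)
    for idx, gate in top_k_route(logits, k):
        expert_out = experts[idx](token)  # only the selected experts are ever called
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output


def make_scaling_expert(scale: float) -> Expert:
    """Hypothetical stand-in for an expert feed-forward network: multiplies the input by a constant."""
    return lambda v: [scale * x for x in v]


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts, k = 4, 8, 2  # toy sizes, not DeepSeek-Coder-V2's configuration
    experts = [make_scaling_expert(i + 1) for i in range(num_experts)]
    router_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
    token = [0.5, -0.2, 0.1, 0.9]
    print(moe_layer(token, experts, router_weights, k))
```

Only the k selected experts are evaluated for a given token; the other experts' parameters sit idle for that token, which is where the compute savings come from.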

Key Innovations

Code Gen
Ability to write, debug, and understand code across many programming languages.
MoE
Architecture in which only a fraction of the model's parameters is active for each token, so total capacity can grow far beyond the compute spent per token.
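The card's own figures allow a quick back-of-the-envelope check of the compute saving. This sketch assumes the common rule of thumb of roughly 2 FLOPs per parameter per token for a forward pass, which ignores attention costs that grow with context length; the function names are hypothetical.

```python
TOTAL_PARAMS = 236e9  # total parameters, as listed on this card
ACTIVE_PARAMS = 21e9  # parameters activated per token, as listed on this card


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def forward_flops_per_token(params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs (multiply + add) per parameter."""
    return 2 * params


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    dense = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense model of the same size
    sparse = forward_flops_per_token(ACTIVE_PARAMS)
    print(f"active share: {frac:.1%}")  # active share: 8.9%
    print(f"dense / MoE compute: {dense / sparse:.1f}x")  # dense / MoE compute: 11.2x
```

The saving applies to compute, not memory: all 236B parameters generally still need to be held in memory, because any token may be routed to any expert.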

Family Tree

Lineage

DeepSeek V1 → DeepSeek V2 → DeepSeek-Coder-V2