DeepSeek-Coder-V2
DeepSeek · June 2024
● active · Open Source · mixture of experts · code
Parameters: 236B (21B active)
Context Window: 128K tokens
Why It Matters
First open MoE model purpose-built for code that matched GPT-4 Turbo on coding benchmarks.
Description
A Mixture-of-Experts coding model with 236B total parameters, of which only about 21B are active for any given token: a learned router sends each token (not each whole request) to a small subset of expert sub-networks. It was the first open MoE model purpose-built for code to report coding-benchmark results on par with GPT-4 Turbo, and it supports 338 programming languages.
Key Innovations
Code Gen: Ability to write, debug, and understand programming code across multiple languages.
MoE: Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
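
To make the "only a fraction of the parameters are active" idea concrete, here is a minimal sketch of generic top-k expert routing for a single token. It is an illustration under stated assumptions, not DeepSeek-Coder-V2's actual implementation: the names `route_token` and `moe_layer`, the four toy scalar experts, the choice of k=2, and the renormalization of the selected weights are all hypothetical choices made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns a list of (expert_index, weight) pairs. Weights are
    renormalized to sum to 1 (an assumption of this sketch).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_logits, k):
    """Run only the selected experts and combine their outputs."""
    out = 0.0
    for i, w in route_token(router_logits, k):
        out += w * experts[i](x)  # unselected experts are never evaluated
    return out

# Toy experts (hypothetical): each scales its scalar input by a fixed factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]  # router scores for one token

selected = route_token(logits, k=2)
y = moe_layer(1.5, experts, logits, k=2)

# Share of parameters active per token, from the figures quoted above.
active_fraction = 21 / 236  # just under 9%
```

Only two of the four toy experts run for this token, which is the source of the compute savings. By the same logic, the quoted 21B of 236B parameters means just under 9% of the model's weights take part in processing any single token.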