DeepSeek R1

DeepSeek · January 2025

Active · Open Weight · Mixture of Experts · Text
Parameters: 671B (37B active)
Context Window: 128K tokens
Variants: R1, R1-Distill-Qwen-1.5B, R1-Distill-Qwen-7B, R1-Distill-Qwen-14B, R1-Distill-Qwen-32B, R1-Distill-Llama-8B, R1-Distill-Llama-70B
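
The "671B (37B active)" figure reflects the mixture-of-experts design: only a subset of the weights runs for any one token. A minimal sketch of that arithmetic, using only the two numbers listed above (the function name is illustrative):

```python
# Figures taken from the parameter line above.
TOTAL_PARAMS_B = 671   # total parameters, in billions
ACTIVE_PARAMS_B = 37   # parameters used per token, in billions


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that take part in one token's forward pass."""
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active per token: {fraction:.1%}")  # prints "Active per token: 5.5%"
```

So per-token compute scales with roughly 37B parameters, while memory still has to hold all 671B.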

Why It Matters

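The first open-weight reasoning model to rival proprietary systems such as OpenAI's o1. It demonstrated that chain-of-thought reasoning can be trained into openly released models, not just proprietary ones, and its release made advanced reasoning capabilities widely available for inspection and fine-tuning.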

Description

An open-weight reasoning model that rivals OpenAI's o1. It uses chain-of-thought reasoning (the model 'thinks out loud' step by step before answering), learned primarily through reinforcement learning (reward-based trial and error). Its precursor, R1-Zero, was trained with reinforcement learning alone, without human-written reasoning examples; R1 adds a small supervised "cold-start" stage before reinforcement learning. DeepSeek also released distilled versions (smaller models trained to mimic R1's reasoning) as small as 1.5B parameters.
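
Reward-based training of this kind needs an automatic scorer in place of human-written solutions. The following is a minimal sketch of a rule-based reward that checks both the output format and the final answer. The `<think>` tag layout, the exact-match comparison, and the `format_weight` value are assumptions for illustration, not DeepSeek's actual reward code.

```python
import re

# Assumed output layout: reasoning inside <think> tags, then the final answer.
THINK_PATTERN = re.compile(r"^<think>(.+?)</think>\s*(.+)$", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the completion is a reasoning block followed by an answer, else 0.0."""
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the text after </think> equals the reference answer exactly, else 0.0."""
    match = THINK_PATTERN.match(completion.strip())
    if match is None:
        return 0.0
    answer = match.group(2).strip()
    return 1.0 if answer == reference.strip() else 0.0


def total_reward(completion: str, reference: str, format_weight: float = 0.5) -> float:
    """Accuracy plus a weighted format bonus; the weight is an illustrative choice."""
    return accuracy_reward(completion, reference) + format_weight * format_reward(completion)


print(total_reward("<think>2 + 2 equals 4.</think> 4", "4"))  # correct and well-formed
print(total_reward("<think>A guess.</think> 5", "4"))         # well-formed but wrong
print(total_reward("4", "4"))                                 # no reasoning block
```

Because the scorer is purely rule-based, it can grade any number of sampled completions without a human in the loop, which is what makes trial-and-error training practical on math-style tasks with checkable answers.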

Notable Milestones

  • First open-weight model to match OpenAI o1 on reasoning benchmarks
  • Distilled versions brought reasoning to models as small as 1.5B parameters
  • Sparked a wave of open-source reasoning model development

Benchmark Scores

MMLU (Massive Multitask Language Understanding, 57 subjects): 90.8%
MATH (MATH benchmark, competition-level problems): 97.3%
AIME (AMC/AIME math competition): 79.8%

Key Innovations

Reasoning: Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.
Chain-of-Thought: Prompting technique where the model 'thinks out loud' step by step before giving a final answer.
Open Weight: Model weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.
Test-Time Compute: Using extra computation during inference (not training) to improve answer quality, for example by thinking longer on harder problems.
Distillation: Training a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
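
One simple way to spend extra test-time compute is to sample several answers and keep the most common one (often called self-consistency or majority voting). The sketch below illustrates that general idea only; it is not a description of how R1 itself allocates inference compute, and `make_stub_sampler` is a hypothetical stand-in for a real model call.

```python
import itertools
from collections import Counter
from typing import Callable, List


def majority_vote(sample_answer: Callable[[str], str], question: str, n_samples: int) -> str:
    """Sample n_samples answers to the question and return the most frequent one."""
    answers: List[str] = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def make_stub_sampler(canned: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: cycles through canned answers."""
    answers = itertools.cycle(canned)
    return lambda question: next(answers)


sampler = make_stub_sampler(["42", "41", "42", "42", "40"])
print(majority_vote(sampler, "What is 6 * 7?", n_samples=5))  # prints "42"
```

More samples cost more inference compute but make a stray wrong answer less likely to win the vote.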

Related Research (2)

DeepSeek-V2 / MLA · Architecture
2024 · DeepSeek

Introduced Multi-head Latent Attention (MLA), which compresses the key-value cache into a low-rank latent space, dramatically reducing the memory need…
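
The effect of caching a compressed latent vector in place of full per-head keys and values can be sized with back-of-envelope arithmetic. This is a simplified sketch: every dimension below is a hypothetical value chosen for illustration, and it ignores any extra per-token state a real MLA implementation keeps.

```python
def kv_cache_bytes(values_per_token: int, n_layers: int, seq_len: int,
                   bytes_per_value: int = 2) -> int:
    """Cache size for one sequence, assuming 2-byte (16-bit) values by default."""
    return values_per_token * n_layers * seq_len * bytes_per_value


# Hypothetical model dimensions, for illustration only.
N_LAYERS = 60
N_HEADS = 128
HEAD_DIM = 128
LATENT_DIM = 512      # size of the compressed per-token latent
SEQ_LEN = 128_000     # a 128K-token context

# Standard attention caches one key and one value vector per head per token.
full = kv_cache_bytes(2 * N_HEADS * HEAD_DIM, N_LAYERS, SEQ_LEN)
# A latent cache stores a single low-rank vector per token instead.
latent = kv_cache_bytes(LATENT_DIM, N_LAYERS, SEQ_LEN)

print(f"Full KV cache:   {full / 1e9:.1f} GB")
print(f"Latent cache:    {latent / 1e9:.1f} GB")
print(f"Reduction ratio: {full // latent}x")
```

With these assumed sizes the cache shrinks by the ratio of `2 * N_HEADS * HEAD_DIM` to `LATENT_DIM`, independent of layer count and sequence length.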

DeepSeek-R1 · Reasoning
2025 · DeepSeek AI

Demonstrated that pure RL training (without supervised fine-tuning on reasoning traces) can produce chain-of-thought reasoning, achieving performance …