DeepSeek R1

DeepSeek · January 2025

Active · Open Weight · Mixture of Experts · Text

Parameters: 671B (37B active)

Context Window: 128K tokens

Variants: R1, R1-Distill-Qwen-1.5B, R1-Distill-Qwen-7B, R1-Distill-Qwen-14B, R1-Distill-Qwen-32B, R1-Distill-Llama-8B, R1-Distill-Llama-70B

Why It Matters

The first open-weight reasoning model to rival OpenAI's o1. It demonstrated that chain-of-thought reasoning can be trained into openly released models, not just proprietary ones. Its release made advanced reasoning capabilities available for anyone to run, inspect, and fine-tune.

Description

An open-weight reasoning model that rivals OpenAI's o1. It uses chain-of-thought reasoning: the model 'thinks out loud' step by step before answering. This behavior is trained primarily through reinforcement learning (reward-based trial and error) rather than by imitating human-written reasoning examples. The precursor R1-Zero used reinforcement learning alone, while R1 adds a small supervised 'cold-start' stage before it. DeepSeek also released distilled versions (smaller models trained to mimic R1's reasoning) as small as 1.5B parameters.
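The reward-based training mentioned above can be illustrated with a rule-based reward function. This is a minimal sketch, not DeepSeek's implementation: the `<think>`/`<answer>` tag layout, the name `rule_based_reward`, and the weights 0.2 and 1.0 are assumptions chosen for illustration.

```python
import re

# Assumed output format: reasoning in <think>...</think>, final answer in <answer>...</answer>.
_PATTERN = re.compile(r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$", re.DOTALL)


def rule_based_reward(completion: str, reference: str) -> float:
    """Score one completion with simple rules (illustrative weights)."""
    match = _PATTERN.match(completion.strip())
    if match is None:
        return 0.0  # malformed output earns no reward at all

    format_reward = 0.2  # the output followed the expected tag layout
    answer = match.group(2).strip()
    accuracy_reward = 1.0 if answer == reference.strip() else 0.0
    return format_reward + accuracy_reward
```

A reinforcement learning loop would sample many completions per prompt, score each with a function like this, and update the model toward higher-scoring completions. Because the score depends only on the format and the final answer, no human-written reasoning trace is needed.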

Notable Milestones

▸ First open-weight model to match OpenAI o1 on reasoning benchmarks
▸ Distilled versions brought reasoning to models as small as 1.5B parameters
▸ Sparked a wave of open-source reasoning model development

Benchmark Scores

MMLU (Massive Multitask Language Understanding, 57 subjects): 90.8%

MATH (MATH-500, competition-level problems): 97.3%

AIME (AIME 2024, AMC/AIME math competition): 79.8%

Key Innovations

Reasoning

Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.

Chain-of-Thought

Prompting technique where the model 'thinks out loud' step by step before giving a final answer.

Open Weight

Model weights are publicly released, but training data and code may not be. Enables fine-tuning but not full reproduction.

Test-Time Compute

Using extra computation during inference (not training) to improve answer quality, for example by thinking longer on harder problems.
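One simple way to spend extra inference compute is to sample several answers and keep the most common one, often called self-consistency. The sketch below illustrates that general idea only; it is not a description of how R1 works internally, and `majority_vote` and `sample_answer` are hypothetical names standing in for a real model call.

```python
from collections import Counter
from typing import Callable


def majority_vote(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Draw n_samples answers and return the most frequent one."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [sample_answer() for _ in range(n_samples)]
    # most_common(1) returns a list holding one (answer, count) pair
    return Counter(answers).most_common(1)[0][0]


# Usage with a deterministic stub in place of a model call
_stub = iter(["42", "41", "42", "42", "7"])
best = majority_vote(lambda: next(_stub), n_samples=5)
```

Raising `n_samples` costs more compute per question in exchange for a more stable answer, which is the trade-off the definition describes.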

Distillation

Training a smaller 'student' model to mimic a larger 'teacher' model, preserving capability at lower cost.
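As a rough illustration of "mimic", the classic soft-label formulation penalizes the student when its output distribution diverges from the teacher's. This page does not say which form of distillation the R1 variants use, so the sketch below shows the general technique under assumed names (`softmax`, `distillation_loss`) and an assumed temperature of 2.0.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Convert logits to probabilities; higher temperature flattens the result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL divergence from the teacher's distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it pulls the student toward the teacher's behavior.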

Related Research (2)

DeepSeek-V2 / MLA · Architecture

2024 · DeepSeek

Introduced Multi-head Latent Attention (MLA), which compresses the key-value cache into a low-rank latent space, dramatically reducing the memory need…

DeepSeek-R1 · Reasoning

2025 · DeepSeek AI

Demonstrated that pure RL training (without supervised fine-tuning on reasoning traces) can produce chain-of-thought reasoning, achieving performance …