o3
OpenAI · April 2025
Active · Closed · Decoder-only · Text · API available
Context Window: 200K tokens
Variants: o3, o3-mini, o3-pro
Why It Matters
At release, o3 set new records on virtually every major reasoning benchmark. The o3-pro variant demonstrated that scaling test-time compute could achieve near-human expert performance on the most challenging scientific and mathematical problems.
Description
OpenAI's most powerful reasoning model at the time of release, significantly surpassing o1 on math, coding, and science benchmarks. The o3-pro variant uses even more compute per query for the hardest problems. At release, the o3 family represented the state of the art in AI reasoning.
Notable Milestones
- Achieved a new high score on ARC-AGI, a benchmark designed to test general reasoning
- Outperformed PhD-level experts on graduate science exams
- Set state-of-the-art on competitive math olympiad problems
Benchmark Scores
GPQA (Graduate-level science QA): 87.7%
AIME (AMC/AIME math competition): 96.7%
SWE-bench (Real-world software engineering): 71.7%
Key Innovations
Reasoning
Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.
Test-Time Compute
Using extra computation during inference (not training) to improve answer quality, for example by thinking longer on harder problems.
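As a rough illustration of the test-time compute idea, the sketch below samples several answers and takes a majority vote (often called self-consistency). This is a generic technique, not a description of how o3 works internally, which the source does not specify. The `noisy_solver` function is a hypothetical stand-in for a single sampled model answer, and the accuracy figure and answer values are invented for the demo.

```python
import random
from collections import Counter


def noisy_solver(true_answer: int, accuracy: float, rng: random.Random) -> int:
    """Hypothetical stand-in for one sampled answer: correct with probability `accuracy`."""
    if rng.random() < accuracy:
        return true_answer
    # Wrong answers are scattered over several nearby values.
    return true_answer + rng.choice([-3, -2, -1, 1, 2, 3])


def majority_vote(true_answer: int, accuracy: float, n_samples: int, rng: random.Random) -> int:
    """Spend more inference compute: draw n_samples answers and return the most common one."""
    votes = Counter(noisy_solver(true_answer, accuracy, rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def success_rate(n_samples: int, trials: int = 2000, accuracy: float = 0.4, seed: int = 0) -> float:
    """Fraction of trials in which the voted answer matches the true answer (42 here)."""
    rng = random.Random(seed)
    hits = sum(majority_vote(42, accuracy, n_samples, rng) == 42 for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(f"samples per question: {n:2d}  success rate: {success_rate(n):.3f}")
```

In this toy setup a single sample is right less than half the time, while voting over many samples is right far more often, because the wrong answers split their votes. Real systems may instead lengthen a single reasoning trace or use a learned verifier. Voting is only one way to trade compute for quality.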
Related Research (1)
Chain-of-Thought · Reasoning
2022 · Google
Showed that prompting models to "think step-by-step" unlocks arithmetic, logic, and commonsense reasoning in large models like PaLM.
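A minimal sketch of what a chain-of-thought prompt can look like: each worked example spells out its intermediate reasoning before the answer, so the model is nudged to do the same for the new question. The helper name `build_cot_prompt`, the `Q:`/`A:` layout, and the example question are assumptions for illustration. No model API is called here.

```python
from typing import List, Tuple

# Each exemplar is (question, written-out reasoning, final answer).
Exemplar = Tuple[str, str, str]


def build_cot_prompt(exemplars: List[Exemplar], question: str) -> str:
    """Assemble a few-shot prompt whose examples include step-by-step reasoning."""
    parts = []
    for q, reasoning, answer in exemplars:
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # Leave the final answer open so the model continues with its own reasoning.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


EXEMPLARS: List[Exemplar] = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11.",
        "11",
    ),
]

prompt = build_cot_prompt(
    EXEMPLARS,
    "A shelf holds 4 boxes with 6 books each. 5 books are removed. How many books remain?",
)

if __name__ == "__main__":
    print(prompt)
```

The resulting string would be sent to a model as ordinary input. The only difference from a standard few-shot prompt is that the example answers show their work.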