Qwen 3

Alibaba Cloud · April 2025

Tags: Active · Open Source · Mixture of Experts · Text · API Available
Parameters: 0.6B to 235B (dense and MoE)
Context Window: up to 128K tokens
Variants: Qwen3-235B-A22B, Qwen3-30B-A3B, Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, Qwen3-0.6B

Why It Matters

One of the first major open-source model families to offer hybrid thinking modes, letting users choose between fast responses and deeper step-by-step reasoning within a single model.

Description

Introduced 'hybrid thinking': the ability to switch between a fast response mode and a deep reasoning mode, in which the model thinks step by step before answering. The flagship 235B model uses a mixture-of-experts architecture in which only 22B parameters (roughly 9% of the total) are active per token, which keeps inference cost far below that of a dense model of the same size. The family ranges from a tiny 0.6B model to the 235B flagship, all released under the Apache 2.0 open-source license.
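In thinking mode, Qwen 3 writes its reasoning inside a `<think>...</think>` block before the final answer. Choosing the mode happens on the input side (Qwen's model cards describe an `enable_thinking` flag for the chat template and `/think` / `/no_think` soft switches in user messages); the minimal sketch below only handles the output side, separating reasoning from answer in a raw completion string. It assumes a single think block at the start of the text, and the helper name `split_thinking` is hypothetical, not part of any Qwen or Hugging Face API.

```python
import re

# Matches one leading <think>...</think> block; DOTALL lets reasoning span lines.
_THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_thinking(completion: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Assumes reasoning, if present, sits in one <think>...</think> block at
    the start of the text. If no block is found (e.g. the model ran in
    non-thinking mode and emitted none), reasoning is returned as "".
    """
    match = _THINK_RE.match(completion)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    thinking = "<think>\n17 * 3 = 51, plus 4 is 55.\n</think>\n\nThe answer is 55."
    print(split_thinking(thinking))
    # ('17 * 3 = 51, plus 4 is 55.', 'The answer is 55.')
    print(split_thinking("<think>\n\n</think>\n\nHello!"))
    # ('', 'Hello!')
```

An empty think block and a missing one both yield empty reasoning, so downstream code can treat the two modes uniformly.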

Notable Milestones

  • State-of-the-art open-source reasoning performance at release
  • Hybrid think/non-think mode for flexible deployment
  • Widely adopted as base for community fine-tunes

Key Innovations

Reasoning: Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.
MoE (Mixture of Experts): An architecture in which only a fraction of the model's parameters are active for each input token, allowing a very large total parameter count at a lower per-token compute cost.
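The routing idea behind MoE can be sketched in a few lines: a gate scores every expert for an input, only the top-k experts actually run, and their outputs are mixed using softmax weights renormalized over the selected experts. This is a toy illustration with made-up scalar "experts" and gate scores; it is not Qwen 3's actual router, expert count, or load-balancing scheme, and the function name `moe_forward` is hypothetical.

```python
import math
from typing import Callable, Sequence

Expert = Callable[[float], float]


def moe_forward(x: float, gate_logits: Sequence[float],
                experts: Sequence[Expert], k: int) -> tuple[float, list[int]]:
    """Toy top-k mixture-of-experts layer for a single scalar input.

    Only the k experts with the highest gate logits are evaluated; the
    others are skipped entirely, which is where MoE saves compute.
    Returns (output, indices of the experts that ran).
    """
    if not 0 < k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")
    # Pick the k highest-scoring experts (stable sort keeps ties in index order).
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (max subtracted for stability).
    m = max(gate_logits[i] for i in top)
    weights = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(weights)
    # Weighted sum of the active experts' outputs.
    out = sum(w / total * experts[i](x) for w, i in zip(weights, top))
    return out, top


if __name__ == "__main__":
    # Four hypothetical "experts", each a simple function of x.
    experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
    logits = [0.1, 2.0, 2.0, -1.0]
    y, used = moe_forward(3.0, logits, experts, k=2)
    print(used, y)  # [1, 2] 7.5
```

Here only two of the four experts are evaluated; in a real MoE model the same idea applies per token and per layer, over many more experts.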
Open Weight: Model weights are publicly released, but training data and code may not be. This enables fine-tuning but not full reproduction.
Test-Time Compute: Using extra computation during inference (not training) to improve answer quality, for example by thinking longer on harder problems.
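One widely used way to spend extra test-time compute is self-consistency: sample several independent answers and keep the most common one. This page does not say Qwen 3 uses this particular technique; it is shown only as a concrete instance of the definition. `sample_answer` is a hypothetical stand-in for any stochastic model call.

```python
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[int], str], n_samples: int) -> str:
    """Return the majority answer across n_samples independent samples.

    More samples means more inference-time compute and, typically, a more
    reliable answer. Ties go to the answer encountered first.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [sample_answer(seed) for seed in range(n_samples)]
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Stand-in for a noisy model: wrong on every third sample.
    fake_model = lambda seed: "41" if seed % 3 == 0 else "42"
    print(self_consistency(fake_model, 1))  # 41
    print(self_consistency(fake_model, 9))  # 42
```

With one sample the model's occasional error goes through; spending nine samples lets the majority vote recover the more frequent answer.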

Family Tree

Built On

Lineage

Qwen → Qwen 1.5 → Qwen 2 → Qwen 2.5 → Qwen 3

Successors (1)

Related Research (1)

RoPE · Architecture
2021 · Zhuiyi Technology

Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
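The rotation idea behind RoPE can be sketched directly: each consecutive pair of dimensions in a query or key vector is rotated by an angle proportional to the token's position, with a different frequency per pair. Because the rotations compose, the dot product of a rotated query and key depends only on their relative offset. This sketch assumes the base of 10000 from the original RoPE formulation and the adjacent-pair convention; implementations differ in how they pair dimensions and in the base they use, and this is not Qwen 3's code.

```python
import math


def rope(vec: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Apply rotary position embedding to one vector at position `pos`.

    Dimensions (2i, 2i+1) are rotated by the angle pos * base**(-2i/d).
    """
    d = len(vec)
    if d % 2:
        raise ValueError("vector length must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        # Standard 2-D rotation of the pair by theta.
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q, k = [1.0, 0.5, -0.3, 2.0], [0.2, -1.0, 0.7, 0.4]
    # Same offset (3) at different absolute positions gives the same score.
    s1 = dot(rope(q, 5), rope(k, 2))
    s2 = dot(rope(q, 103), rope(k, 100))
    print(math.isclose(s1, s2, abs_tol=1e-9))  # True
```

Since rotations preserve length, RoPE changes only the angle between query and key, which is how position information enters the attention score.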
