Qwen 3
Alibaba Cloud · April 2025
Active · Open Source · Mixture of Experts · Text · API Available
Parameters: 0.6B - 235B (MoE)
Context Window: 128K tokens
Variants: Qwen3-235B-A22B, Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, Qwen3-0.6B
Why It Matters
First major open-source model family to offer hybrid thinking modes, letting users choose between fast responses and deep reasoning within a single model.
Description
Introduced 'hybrid thinking': the ability to switch between a fast response mode and a deep reasoning mode, in which the model thinks step by step before answering. The flagship 235B model uses a mixture-of-experts design in which only about 22B parameters are active per token, which keeps inference compute well below that of a dense model of the same size. The family ranges from a tiny 0.6B model to the 235B flagship, all released under the Apache 2.0 open-source license.
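To make the two modes concrete, here is a minimal sketch of how a client might separate the reasoning trace from the final answer. It assumes that a thinking-mode completion wraps its reasoning in `<think>...</think>` ahead of the answer. The helper name `split_thinking` is hypothetical, and how the mode is switched on the serving side is not shown.

```python
import re
from typing import Tuple

# Assumption: thinking-mode output looks like "<think>reasoning</think>answer".
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(completion: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string."""
    match = THINK_BLOCK.search(completion)
    if match is None:
        # No think block: treat the whole completion as a fast-mode answer.
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

# Hypothetical completions, one per mode.
deep = "<think>\n2 + 2 is 4\n</think>\n\nThe answer is 4."
fast = "The answer is 4."
for raw in (deep, fast):
    reasoning, answer = split_thinking(raw)
    print(repr(reasoning), "->", repr(answer))
```

Keeping the split in one small function lets the same client code handle both modes, since a fast-mode reply simply yields an empty reasoning string.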
Notable Milestones
- State-of-the-art open-source reasoning performance
- Hybrid think/non-think mode for flexible deployment
- Widely adopted as base for community fine-tunes
Key Innovations
Reasoning
Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.
MoE
Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
Open Weight
Model weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.
Test-Time Compute
Using extra computation during inference (not training) to improve answer quality, for example by thinking longer on harder problems.
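The MoE entry above can be illustrated with a toy top-k router. This is only a sketch of the general technique: the four scalar "experts", the router logits, and the names `route_top_k` and `moe_forward` are all hypothetical and do not reflect Qwen 3's actual router configuration.

```python
import math
from typing import Callable, List, Tuple

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x: float, experts: List[Callable[[float], float]],
                router_logits: List[float], k: int) -> float:
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Toy setup (hypothetical): four experts that each scale the input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]  # the router favors experts 1 and 3
print(moe_forward(1.0, experts, logits, k=2))
```

Only two of the four experts execute for this input, which is the sense in which an MoE model can hold many parameters while spending compute on a fraction of them.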
Related Research (1)
RoPE · Architecture
2021 · Zhuiyi Technology
Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
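As a rough illustration of "position via rotation", the sketch below rotates consecutive pairs of vector components by an angle proportional to the token position. It assumes the interleaved pairing and base of 10000 from the original RoPE formulation. Production implementations often pair dimensions differently, and the function names here are hypothetical.

```python
import math
from typing import List

def rope(vec: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (vec[i], vec[i+1]) by position * base**(-i/d), i even."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector dimension must be even")
    out: List[float] = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query/key vectors: the score depends only on the position offset.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]
print(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 12), rope(k, 10)))
```

Because both vectors are rotated, their dot product depends on the difference between the two positions rather than on the absolute positions, which is the property attention scores rely on.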