Qwen 3

Alibaba Cloud · April 2025

● active · Open Source · mixture of experts · text · API Available

Parameters: 0.6B - 235B (MoE)

Context Window: 128K tokens

Variants: Qwen3-235B-A22B, Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, Qwen3-0.6B

Why It Matters

First major open-source model family to offer hybrid thinking modes, letting users choose between fast responses and deep reasoning within a single model.

Description

Introduced 'hybrid thinking': the ability to switch between a fast response mode and a deep reasoning mode, in which the model thinks step by step before answering. The flagship 235B model uses a mixture-of-experts architecture (only about 22B parameters are active per token) for efficiency. The family ranges from a tiny 0.6B model to the 235B flagship, all released under the Apache 2.0 open-source license.
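To make the two modes concrete, here is a minimal sketch of how a client might request a mode and separate the reasoning trace from the final answer. It assumes the `/think` and `/no_think` prompt tags and the `<think>...</think>` output delimiters described in Qwen3's usage notes; this page does not specify them, so treat both as assumptions. The helper names `with_mode` and `split_thinking` are hypothetical, and the model output below is made up for illustration.

```python
import re

# Assumed delimiter for the reasoning trace in raw model output.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def with_mode(prompt: str, think: bool) -> str:
    """Append an (assumed) soft-switch tag requesting a thinking mode."""
    return f"{prompt} {'/think' if think else '/no_think'}"

def split_thinking(raw: str) -> tuple[str, str]:
    """Return (reasoning_trace, final_answer) from raw model output."""
    match = THINK_RE.search(raw)
    if match is None:
        # Fast mode: no reasoning trace, the whole text is the answer.
        return "", raw.strip()
    thinking = match.group(1).strip()
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return thinking, answer

# Hypothetical output, not produced by a real model call.
raw = "<think>2 + 2 is 4.</think>\n\nThe answer is 4."
thinking, answer = split_thinking(raw)
print(with_mode("What is 2 + 2?", think=True))
print(repr(thinking), repr(answer))
```

Keeping the trace separate from the answer lets an application log or hide the reasoning without changing what the user sees.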

Notable Milestones

▸ State-of-the-art open-source reasoning performance
▸ Hybrid think/non-think mode for flexible deployment
▸ Widely adopted as base for community fine-tunes

Key Innovations

Reasoning

Structured step-by-step problem solving, often using chain-of-thought or tree-of-thought approaches.

MoE

Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
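The routing idea can be illustrated with a toy top-k gate. This is a sketch under simplifying assumptions (scalar inputs, four hand-written "experts", fixed router scores), not Qwen 3's actual router; `moe_forward` and the expert functions are hypothetical names.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_logits, experts, top_k=2):
    """Run only the top_k highest-scoring experts and mix their outputs."""
    # Indices of the k largest router scores (ties keep original order).
    top = sorted(range(len(experts)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Renormalise gate weights over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    # Unselected experts are never called, which is where compute is saved.
    output = sum(g * experts[i](x) for g, i in zip(gates, top))
    return output, top

# Toy experts: each stands in for a feed-forward sub-network.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
y, used = moe_forward(3.0, [0.1, 2.0, -1.0, 2.0], experts, top_k=2)
print(y, used)
```

Here two of four experts run per input. At the scale quoted on this page, 22B active out of 235B total is roughly 9.4% of the parameters per token.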

Open Weight

Model weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.

Test-Time Compute

Using extra computation during inference (not training) to improve answer quality, for example by thinking longer on harder problems.
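One simple way to spend more inference compute is to sample several answers and keep the most common one (often called self-consistency voting). The sketch below shows that technique, not Qwen 3's thinking mode, which instead produces a longer reasoning trace within a single response. The sampler is a canned stand-in for a model call, and all names are hypothetical.

```python
from collections import Counter
from typing import Callable

def majority_vote(samples: list[str]) -> str:
    """Return the most frequent answer among the samples."""
    return Counter(samples).most_common(1)[0][0]

def answer_with_budget(sample_fn: Callable[[], str], n_samples: int) -> str:
    """Spend more compute (more samples) to get a more reliable answer."""
    samples = [sample_fn() for _ in range(n_samples)]
    return majority_vote(samples)

# Canned answers standing in for repeated calls to a noisy model.
canned = ["41", "42", "42", "40", "42"]

def make_sampler() -> Callable[[], str]:
    it = iter(canned)
    return lambda: next(it)

cheap = answer_with_budget(make_sampler(), n_samples=1)    # small budget
careful = answer_with_budget(make_sampler(), n_samples=5)  # larger budget
print(cheap, careful)
```

With a budget of one sample the first (wrong) answer is returned, while five samples let the majority answer win, at five times the inference cost.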

Family Tree

Related Research (1)

RoPE · Architecture

2021 · Zhuiyi Technology

Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…