Kimi K2

Moonshot AI · June 2025

● Active · Open Weight · Sparse MoE · Text

Parameters: 1T

Context Window: 128K tokens

Why It Matters

One of the largest open-weight models ever released, demonstrating that Chinese AI labs can compete at the frontier of model scale and capability while keeping weights accessible to researchers.

Description

Kimi K2 is Moonshot AI's open-weight sparse mixture-of-experts model, with 1 trillion total parameters and a 128K-token context window. Trained using the Muon optimizer and designed for agentic workflows, it is one of the largest open-weight models available and a milestone in Chinese AI development.

Key Innovations

MoE

Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
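The routing idea can be illustrated with a minimal sketch. This is a toy top-k router in plain Python, not Kimi K2's actual implementation: the expert functions, the fixed router scores, and the names `top_k_route` and `moe_layer` are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(x, experts, router, k):
    """Run only the k selected experts on x and mix their outputs."""
    routes = top_k_route(router(x), k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        expert_out = experts[idx](x)  # the other experts are never evaluated
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out, [idx for idx, _ in routes]

# Toy setup (hypothetical): 4 experts that simply scale their input.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
fixed_router = lambda x: [0.1, 2.0, -1.0, 1.5]  # stand-in for a learned gate

output, used = moe_layer([1.0, 2.0], experts, fixed_router, k=2)
active_fraction = len(used) / len(experts)
print(used, active_fraction)  # prints: [1, 3] 0.5
```

Here only two of the four experts run for the input, so half of the expert parameters are touched. In a real sparse MoE the router is learned and the active fraction is typically much smaller.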

Long Context

Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.

Agentic

Models that can autonomously plan, execute multi-step tasks, use tools, and self-correct without human intervention.
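As a rough illustration of tool use and self-correction, the sketch below runs a toy agent loop. Everything in it is hypothetical: the `calculator` tool, the `run_agent` function, and the hard-coded plan with fallback arguments stand in for steps that a real agentic model would generate itself.

```python
def calculator(expression):
    """Hypothetical tool: evaluate 'a op b' for integer a, b and op in {+, *}."""
    a, op, b = expression.split()
    a, b = int(a), int(b)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unsupported operator: {op}")

TOOLS = {"calculator": calculator}

def run_agent(plan):
    """Execute each (tool, argument, fallback) step in order.

    If a tool call fails, the error is recorded and the step is retried
    once with the fallback argument, mimicking self-correction.
    """
    transcript = []
    for tool_name, argument, fallback in plan:
        tool = TOOLS[tool_name]
        attempts = [argument] if fallback is None else [argument, fallback]
        for attempt in attempts:
            try:
                result = tool(attempt)
            except ValueError as err:
                transcript.append((tool_name, attempt, f"error: {err}"))
                continue  # try the corrected call, if there is one
            transcript.append((tool_name, attempt, result))
            break
    return transcript

# Hard-coded plan: the first call uses a bad operator and is then corrected.
plan = [
    ("calculator", "2 x 3", "2 * 3"),
    ("calculator", "6 + 4", None),
]
transcript = run_agent(plan)
for entry in transcript:
    print(entry)
```

The transcript records the failed call, the corrected call, and the final step, which is the kind of trace an agentic model produces when it recovers from a tool error without human input.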

Muon Optimizer

More from Chinese LLMs

GLM-4 · 2024-01 · n/a

ChatGLM-4 · 2024-06 · n/a

Ernie 4.0 · 2023-10 · n/a

Baichuan-2 · 2023-09 · 13B

MiniMax-01 · 2025-01 · 456B (45.9B active)

Doubao · 2024-05 · n/a
