Kimi K2
Moonshot AI · July 2025
Status: active · Open Weight · Sparse MoE · Text
Parameters: 1T
Context Window: 128K tokens
Why It Matters
One of the largest open-weight models ever released, demonstrating that Chinese AI labs can compete at the frontier of model scale and capability while keeping weights accessible to researchers.
Description
Kimi K2 is Moonshot AI's open-weight sparse mixture-of-experts model, with 1 trillion total parameters and a 128K-token context window. Trained with the Muon optimizer and designed for agentic workflows, it is one of the largest open-weight models available and a milestone in Chinese AI development.
Key Innovations
MoE
Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
Long Context
Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.
Agentic
Models that can autonomously plan, execute multi-step tasks, use tools, and self-correct without human intervention.
Muon Optimizer
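To make the MoE idea above concrete, here is a minimal sketch of top-k expert routing in plain Python. It is illustrative only: the expert count, the top-k value, the router scores, and the toy scalar "experts" are all assumptions chosen for readability, not Kimi K2's actual configuration or code.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (shifted by the max for stability).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by routing weight."""
    routes = top_k_route(router_logits, k)
    output = sum(weight * experts[idx](x) for idx, weight in routes)
    return output, routes

# Hypothetical toy setup: 8 experts, 2 active per input.
NUM_EXPERTS = 8
TOP_K = 2
experts = [lambda x, scale=s: scale * x for s in range(1, NUM_EXPERTS + 1)]

# In a real model these scores come from a learned router; here they are fixed.
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

output, routes = moe_forward(1.0, experts, router_logits, TOP_K)
used_experts = [idx for idx, _ in routes]
active_fraction = TOP_K / NUM_EXPERTS

print("experts used:", used_experts)
print("fraction of experts active:", active_fraction)
```

In this toy setup only 2 of the 8 experts run for the input, so the other 6 cost no compute for that token. This is the mechanism that lets total parameter count grow much faster than per-input compute.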