MiniMax-01

MiniMax · January 2025

● active · Open Weights · sparse MoE · text

Parameters: 456B (45.9B active)

Context Window: 4M tokens

Description

MiniMax's open-weight sparse mixture-of-experts model, with 456 billion total parameters of which only 45.9 billion are active for each token. It features an unprecedented 4-million-token context window, enabled by lightning attention: a linear attention mechanism whose cost grows linearly rather than quadratically with sequence length, keeping quality intact at extreme sequence lengths.
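The headline figures can be checked with a little arithmetic. This sketch computes the share of weights used for each token and the number of query-key pairs that full causal softmax attention would have to score at the maximum context length, which is the quadratic cost a linear attention mechanism avoids. Reading "4M" as exactly 4,000,000 tokens is an assumption.

```python
# Back-of-the-envelope numbers from the figures quoted above.
TOTAL_PARAMS = 456e9      # total parameters
ACTIVE_PARAMS = 45.9e9    # parameters activated per token
CONTEXT = 4_000_000       # assumption: "4M tokens" read as 4,000,000


def active_fraction(active: float, total: float) -> float:
    """Share of the weights used in a single token's forward pass."""
    return active / total


def attention_pairs(n: int) -> int:
    """Query-key pairs scored by full causal softmax attention: n(n+1)/2."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"causal query-key pairs at {CONTEXT:,} tokens: {attention_pairs(CONTEXT):.3e}")
```

Roughly a tenth of the weights run per token, while the pair count grows with the square of the sequence length; that quadratic term is what makes standard attention impractical at millions of tokens.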

Key Innovations

MoE

An architecture in which only a fraction of the model's parameters is active for each input token, allowing massive scale at lower compute cost.
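A common way sparse MoE layers decide which parameters run is top-k routing: a small router scores every expert, only the k best are evaluated, and their outputs are mixed by renormalised gate weights. The sketch below is a generic, pure-Python illustration; the router weights, the toy experts, and k=2 are hypothetical values, not MiniMax's configuration or code.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
Expert = Callable[[Vec], Vec]


def softmax(xs: Sequence[float]) -> Vec:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vec, router: List[Vec], experts: List[Expert],
                k: int = 2) -> Tuple[Vec, List[int]]:
    """Route one token through its top-k experts and mix their outputs."""
    # One router logit per expert: dot product with that expert's router row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    # Indices of the k highest-scoring experts; every other expert stays idle.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalised over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)  # assumes each expert preserves the vector width
    for g, i in zip(gates, chosen):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + g * yj for o, yj in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    # Hypothetical toy experts: expert i scales its input by (i + 1).
    experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
    router = [[0.1, 0.0], [2.0, 0.0], [0.5, 0.0], [3.0, 0.0]]
    out, chosen = moe_forward([1.0, 0.0], router, experts, k=2)
    print(chosen, out)  # experts 3 and 1 run; experts 0 and 2 are skipped
```

Compute per token scales with k, not with the total number of experts, which is how a model can hold many more parameters than it uses for any single token.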

Long Context

The ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.

Lightning Attention
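Lightning attention is a linear attention mechanism. The core idea of linear attention can be sketched as follows: with a positive feature map in place of softmax, causal attention can be computed either by re-scoring all earlier keys for every query (quadratic) or by carrying a fixed-size running state (linear), and both give the same outputs. This is a generic illustration under stated assumptions (the elu(x)+1 feature map and the normalisation are choices made here); it does not reproduce lightning attention's block-wise tiling or MiniMax's exact formulation.

```python
import math
from typing import List

Vec = List[float]


def feature_map(v: Vec) -> Vec:
    """elu(x) + 1: keeps features positive so the normaliser stays > 0."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in v]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_attention_quadratic(qs: List[Vec], ks: List[Vec], vs: List[Vec]) -> List[Vec]:
    """Reference: each query re-scores every earlier key, O(n^2) time."""
    out = []
    for t, q in enumerate(qs):
        fq = feature_map(q)
        weights = [dot(fq, feature_map(ks[s])) for s in range(t + 1)]
        total = sum(weights)
        out.append([sum(w * vs[s][j] for s, w in enumerate(weights)) / total
                    for j in range(len(vs[0]))])
    return out


def causal_attention_linear(qs: List[Vec], ks: List[Vec], vs: List[Vec]) -> List[Vec]:
    """Same outputs from a running state: O(n) time, O(d * d_v) memory."""
    d, dv = len(ks[0]), len(vs[0])
    state = [[0.0] * dv for _ in range(d)]  # S = sum_s phi(k_s) v_s^T
    norm = [0.0] * d                        # z = sum_s phi(k_s)
    out = []
    for q, k, v in zip(qs, ks, vs):
        fk = feature_map(k)
        for i in range(d):
            norm[i] += fk[i]
            for j in range(dv):
                state[i][j] += fk[i] * v[j]
        fq = feature_map(q)
        denom = dot(fq, norm)
        out.append([sum(fq[i] * state[i][j] for i in range(d)) / denom
                    for j in range(dv)])
    return out


if __name__ == "__main__":
    qs = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8]]
    ks = [[0.1, 0.4], [-0.7, 1.2], [0.9, -0.5]]
    vs = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [3.0, 1.0, -1.0]]
    a = causal_attention_quadratic(qs, ks, vs)
    b = causal_attention_linear(qs, ks, vs)
    print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

Because the state has a fixed size regardless of how many tokens came before, memory and per-token work do not grow with context length, which is what makes multi-million-token contexts feasible.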

More from Chinese LLMs

GLM-4 · 2024-01 · —

ChatGLM-4 · 2024-06 · —

Ernie 4.0 · 2023-10 · —

Baichuan-2 · 2023-09 · 13B