MiniMax-01

MiniMax · January 2025

active · Open Weight · sparse MoE · text
Parameters: 456B (45.9B active)
Context Window: 4M tokens

Description

MiniMax's open-weight sparse mixture-of-experts model with 456 billion total parameters, of which only 45.9 billion are active per token. It features a 4-million-token context window enabled by lightning attention, a linear attention mechanism designed to maintain quality over extreme sequence lengths.
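To make the "total versus active parameters" distinction concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a generic illustration of sparse mixture-of-experts, not MiniMax's implementation: the expert count (8), the value of k (2), the router scores, and the helper names `top_k_route` and `moe_forward` are all hypothetical choices made for this example. Only the 456B and 45.9B figures come from the description above.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    # Hypothetical helper: pick the k highest-scoring experts and
    # renormalize their scores into mixing weights.
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_logits, k):
    # Only the selected experts are evaluated; the others do no work.
    out = [0.0] * len(x)
    for idx, w in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

# Toy experts (assumption): expert s simply scales its input by s.
experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2]

routes = top_k_route(router_logits, k=2)
y = moe_forward([1.0, 2.0], experts, router_logits, k=2)

# Share of parameters active per token, using the figures quoted above.
active_share = 45.9 / 456
print(routes, y, round(active_share, 4))
```

In this toy setup only 2 of 8 experts run for the input. By the figures quoted above, the active share for MiniMax-01 works out to about 10% of total parameters.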

Key Innovations

MoE: Architecture in which only a fraction of the model's parameters is active for each input, allowing massive scale at lower compute cost.
Long Context: Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.
lightning-attention
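Lightning attention is described above as a linear attention mechanism. As a rough illustration of why linear attention scales to long sequences, the sketch below compares a quadratic causal form with an equivalent running-state form. This is generic, unnormalized linear attention with an identity feature map, which is an assumption made for simplicity. It is not MiniMax's kernel, and the function names and toy inputs are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quadratic_causal(Q, K, V):
    # Reference form: o_t = sum over j <= t of (q_t . k_j) * v_j.
    # Work grows with the square of the sequence length.
    out = []
    for t, q in enumerate(Q):
        o = [0.0] * len(V[0])
        for j in range(t + 1):
            s = dot(q, K[j])
            o = [oi + s * vj for oi, vj in zip(o, V[j])]
        out.append(o)
    return out

def linear_causal(Q, K, V):
    # Same outputs using a running state S = sum of outer(k_j, v_j).
    # Each step costs the same, so work grows linearly with length.
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        for a in range(d_k):
            for b in range(d_v):
                S[a][b] += k[a] * v[b]
        out.append([sum(q[a] * S[a][b] for a in range(d_k))
                    for b in range(d_v)])
    return out

# Toy inputs (assumption): 3 tokens, 2-dimensional keys and values.
Q = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
K = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

ref = quadratic_causal(Q, K, V)
fast = linear_causal(Q, K, V)
print(ref == fast or all(
    abs(a - b) < 1e-9 for r, f in zip(ref, fast) for a, b in zip(r, f)))
```

The running state has a fixed size regardless of how many tokens have been seen, which is the property that makes very long context windows tractable in this family of methods.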