MiniMax-01
MiniMax · January 2025
● active · Open Weight · sparse moe · text
Parameters: 456B (45.9B active)
Context Window: 4M tokens
Description
MiniMax's open-weight sparse mixture-of-experts model with 456 billion total parameters, of which only 45.9 billion (roughly 10%) are active for any given token. It offers a 4-million-token context window enabled by lightning attention, a linear attention mechanism whose cost grows linearly rather than quadratically with sequence length and which is designed to maintain quality over extreme sequence lengths.
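To make the "linear attention" idea concrete, here is a minimal sketch of generic causal linear attention in plain Python. It is not MiniMax's lightning attention implementation: it omits feature maps, normalization, decay, and any tiling or hardware-aware tricks, and the function names are illustrative assumptions. It only shows why the cost can be linear in sequence length: a fixed-size running state replaces the full token-by-token score matrix.

```python
def causal_linear_attention(queries, keys, values):
    """Generic causal linear attention: o_t = q_t^T * sum_{j<=t} (k_j v_j^T).

    Illustrative only; not the lightning attention kernel itself.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    # Running d_k x d_v state; its size does not depend on sequence length.
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Fold the current token into the state: state += k v^T
        for a in range(d_k):
            for b in range(d_v):
                state[a][b] += k[a] * v[b]
        # Read out with the query: o = q^T state
        outputs.append(
            [sum(q[a] * state[a][b] for a in range(d_k)) for b in range(d_v)]
        )
    return outputs


def quadratic_reference(queries, keys, values):
    """Same result computed pairwise, with O(n^2) work, for comparison."""
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for j in range(t + 1):
            score = sum(qa * ka for qa, ka in zip(q, keys[j]))
            out = [o + score * vb for o, vb in zip(out, values[j])]
        outputs.append(out)
    return outputs


# Tiny hypothetical inputs: 2 tokens, key dimension 2, value dimension 1.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 2.0], [3.0, 4.0]]
v = [[1.0], [2.0]]
assert causal_linear_attention(q, k, v) == quadratic_reference(q, k, v)
```

Both functions return the same outputs, but the first does a constant amount of work per token, which is the property that makes very long contexts tractable.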
Key Innovations
MoE
Architecture in which only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
Long Context
Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.
Lightning Attention
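The MoE entry above says that only a fraction of the parameters run for each input. A rough sketch of how that can work, assuming a simple top-k router, is shown below. The expert count, the value of k, the router scores, and the toy "experts" are all hypothetical; this page does not describe MiniMax-01's actual routing configuration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    top = order[:k]
    # Renormalize the weights over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; others are never run."""
    out = [0.0] * len(x)
    for idx, w in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy experts: each one just scales the input by a constant.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores for one token; experts 1 and 3 tie for the top.
logits = [0.1, 2.0, -1.0, 2.0]
result = moe_layer([1.0, 2.0], experts, logits, k=2)
```

Here only two of the four experts are evaluated for the token, so compute scales with k rather than with the total number of experts. The same effect, at much larger scale, is what separates the active parameter count from the total.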