Jamba 1.5
AI21 Labs · August 2024
active · Open Weight · hybrid Mamba-Transformer · text
Parameters: 398B (94B active)
Context Window: 256K tokens
Variants: Large, Mini
Description
A significantly scaled-up version of Jamba, growing to 398 billion total parameters in its Large variant, of which 94 billion are active for any given token thanks to its Mixture-of-Experts architecture. It keeps the same 256K-token context window as the original Jamba, enough to process roughly 190,000 words, or several full-length novels, at once. It is offered in Large and Mini variants for different performance and cost trade-offs.
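The figures in the description can be sanity-checked with a little arithmetic. The sketch below is only illustrative: the ratio of 0.75 English words per token is a common rule of thumb assumed here, not a number published for Jamba's tokenizer, and "256K" is read as 256,000 tokens.

```python
# Figures taken from the card above.
TOTAL_PARAMS_B = 398       # total parameters, in billions
ACTIVE_PARAMS_B = 94       # parameters active per token, in billions
CONTEXT_TOKENS = 256_000   # "256K" read as 256,000 tokens (assumption)

# Assumption, not from the card: roughly 0.75 English words per token.
WORDS_PER_TOKEN = 0.75


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough word count that fits in a context window of `tokens` tokens."""
    return round(tokens * words_per_token)


if __name__ == "__main__":
    print(f"active share: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")
    print(f"approx words: {approx_words(CONTEXT_TOKENS):,}")
```

Under these assumptions, a little under a quarter of the parameters are active per token, and the context window holds on the order of 190,000 words, in line with the description.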
Key Innovations
MoE
Architecture in which only a fraction of the model's parameters are active for each input, allowing massive scale at lower compute cost.
Long Context
Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.
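To make the MoE idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is not Jamba's actual router: the function names, the number of experts, and the choice of `top_k = 2` are all hypothetical, and in a real model the router scores come from a learned layer rather than being passed in by hand.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    router_scores: Sequence[float],
    experts: Sequence[Expert],
    top_k: int = 2,
) -> Tuple[float, List[int]]:
    """Evaluate only the top_k highest-scoring experts and mix their outputs.

    Returns the mixed output and the indices of the experts that ran.
    Experts outside the top_k are never called, which is where the
    compute saving comes from.
    """
    ranked = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_scores[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


if __name__ == "__main__":
    # Four toy "experts"; only two of them run for this input.
    toy_experts = [lambda v: v + 1, lambda v: 2 * v, lambda v: -v, lambda v: v * v]
    out, used = moe_forward(3.0, [0.1, 2.0, -1.0, 2.0], toy_experts)
    print(out, used)
```

All four experts count toward the total parameter budget, but only the two selected ones contribute compute for this input, which mirrors the total-versus-active parameter split quoted for the model.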
Related Research (1)
Mamba · Architecture
2023 · Carnegie Mellon University / Princeton
Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…
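The linear-time claim can be illustrated with a toy recurrence. The sketch below is not Mamba's actual parameterization: it is a scalar state update with a hypothetical input-dependent gate (`w_gate` is an invented parameter), meant only to show that each token costs one constant-size step, so a sequence of n tokens takes O(n) work instead of the O(n²) pairwise comparisons of full attention.

```python
import math
from typing import List, Sequence


def selective_scan(xs: Sequence[float], w_gate: float = 1.0) -> List[float]:
    """Toy scalar recurrence h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    The gate a_t is computed from the current input x_t, so how much of
    the previous state is kept depends on the data. This is a simplified
    stand-in for a selection mechanism, not the real Mamba layer.
    """
    h = 0.0
    outputs = []
    for x in xs:  # one constant-cost update per token
        a = 1.0 / (1.0 + math.exp(-w_gate * x))  # gate in (0, 1)
        h = a * h + (1.0 - a) * x
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    print(selective_scan([0.5, -1.0, 2.0]))
```

The state `h` has a fixed size regardless of sequence length, which is also why this style of model does not need a cache that grows with the context.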