Jamba 1.5

AI21 Labs · August 2024

Active · Open Weight · Hybrid Mamba-Transformer · Text
Parameters: 398B (94B active)
Context Window: 256K tokens
Variants: Large, Mini

Description

A significantly scaled-up version of Jamba, with 398 billion total parameters, of which 94 billion are active for any given token thanks to its Mixture-of-Experts architecture. It keeps the same 256K-token context window as Jamba, enough to process roughly 190,000 words, or several full-length novels, at once. It is offered in 'Large' and 'Mini' variants for different performance and cost trade-offs; the parameter counts quoted here are those of the Large variant.
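The headline figures can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the ratio of 0.75 words per token is a common rule of thumb for English text and is an assumption here, not a figure from AI21, and "256K" is read as 256,000 tokens.

```python
# Back-of-the-envelope check of the headline figures.
TOTAL_PARAMS_B = 398      # total parameters, in billions (from this page)
ACTIVE_PARAMS_B = 94      # parameters active per token, in billions (from this page)
CONTEXT_TOKENS = 256_000  # assumption: "256K" read as 256,000 tokens
WORDS_PER_TOKEN = 0.75    # assumption: rule-of-thumb ratio for English text

# Share of the model's weights that take part in processing any one token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Rough number of English words that fit in one context window.
approx_words = CONTEXT_TOKENS * WORDS_PER_TOKEN

print(f"Active fraction: {active_fraction:.1%}")
print(f"Approximate words in context: {approx_words:,.0f}")
```

Under these assumptions, a little under a quarter of the parameters are used per token, and the word estimate lands close to the "roughly 190,000 words" quoted above.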

Key Innovations

MoE
Architecture in which only a fraction of the model's parameters are active for each input, allowing massive scale at lower compute cost.
Long Context
Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.
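To make the MoE idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration of the general technique, not AI21's implementation: the expert count, the value of k, the scalar "experts", and the names `route_top_k` and `moe_layer` are all hypothetical choices made for this example.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Renormalise the scores of the selected experts so their weights sum to 1.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of only the k selected experts."""
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)  # the other experts are never evaluated
    return out

# Hypothetical toy setup: 16 "experts", each a simple scalar function.
NUM_EXPERTS = 16
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in router scores

selected = route_top_k(logits, k=2)
y = moe_layer(1.0, experts, logits, k=2)
print("Selected experts:", [idx for idx, _ in selected])
```

Only the selected experts run for a given input, which is why a model's active parameter count can be much smaller than its total parameter count.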

Family Tree

Lineage

Jurassic-2 → Jamba → Jamba 1.5

Related Research (1)

Mamba · Architecture
2023 · Carnegie Mellon University / Princeton

Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…