Jamba 1.5

AI21 Labs · August 2024

Active · Open Weight · Hybrid Mamba-Transformer · Text

Parameters: 398B (94B active)

Context Window: 256K tokens

Variants: Large, Mini

Description

Jamba 1.5 is a significantly scaled-up version of Jamba, growing to 398 billion total parameters, of which 94 billion are active for any given token thanks to its Mixture-of-Experts architecture. These headline figures describe the Large variant. The model keeps Jamba's 256K-token context window, enough to process roughly 190,000 words, or several full-length novels, at once. It is offered in 'Large' and 'Mini' variants for different performance and cost trade-offs.
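The parameter figures can be put in proportion with a little arithmetic. The sketch below uses only numbers quoted on this page (398B total and 94B active for Jamba 1.5, 52B total and 12B active for Jamba); the variable names are illustrative.

```python
# Parameter counts in billions, as quoted on this page.
JAMBA_15 = {"total": 398, "active": 94}
JAMBA = {"total": 52, "active": 12}

def active_fraction(model: dict) -> float:
    """Share of parameters used per token in a Mixture-of-Experts model."""
    return model["active"] / model["total"]

# Both models activate a little under a quarter of their weights per token.
print(f"Jamba 1.5 active share: {active_fraction(JAMBA_15):.1%}")  # 23.6%
print(f"Jamba active share:     {active_fraction(JAMBA):.1%}")     # 23.1%

# How much larger Jamba 1.5 is than Jamba in total parameters.
scale_up = JAMBA_15["total"] / JAMBA["total"]
print(f"Total-parameter scale-up: {scale_up:.2f}x")
```

The active share stays close to constant across the two generations even though the total parameter count grows several times over.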

Key Innovations

MoE

Architecture where only a fraction of the model's parameters are active for each input, allowing massive scale with lower compute.
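The idea of activating only a fraction of the parameters can be illustrated with a toy top-k router. This is a minimal sketch, not Jamba 1.5's implementation: the expert count, the value of k, the scalar "experts", and the function names are all hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical setup: four tiny "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 2.0]   # assumed router scores for one token

selected = route_top_k(router_logits, k=2)
output = moe_layer(1.0, experts, router_logits, k=2)
```

Here only two of the four experts are evaluated for the token, which is why the total parameter count can grow without a matching rise in per-token compute.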

Long Context

Ability to process very long inputs (100K+ tokens), enabling analysis of entire codebases or books.
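A rough way to judge whether a document fits in the window is to scale its word count by the ratio this page gives (256K tokens for roughly 190,000 words). The sketch below is an estimate only: it assumes "256K" means 256 × 1024 tokens, which the page does not state, and real token counts depend on the tokenizer and the text. The function names are hypothetical.

```python
# Assumption: "256K" is read as 256 * 1024 tokens; the page does not specify.
CONTEXT_TOKENS = 256 * 1024
# Word count the page quotes for a full context window.
APPROX_WORDS_AT_FULL_CONTEXT = 190_000

def estimate_tokens(word_count: int) -> int:
    """Scale words to tokens with the page's ratio, rounding up."""
    # Integer ceiling division avoids floating-point edge cases.
    return -((-word_count * CONTEXT_TOKENS) // APPROX_WORDS_AT_FULL_CONTEXT)

def fits_in_context(word_count: int, reserved_for_output: int = 0) -> bool:
    """True if the estimated input plus reserved output tokens fit the window."""
    return estimate_tokens(word_count) + reserved_for_output <= CONTEXT_TOKENS

print(fits_in_context(90_000))                              # a long novel
print(fits_in_context(190_000, reserved_for_output=4_096))  # no room left for a reply
```

Reserving part of the window for the model's reply is worth doing in practice, since the input and the generated output share the same budget.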

Family Tree

Built On

Jamba

Lineage

Jurassic-2 → Jamba → Jamba 1.5

Related Research (1)

Mamba · Architecture

2023 · Carnegie Mellon University / Princeton

Introduced selective state space models that process sequences in linear time (vs. quadratic for Transformers), with a data-dependent selection mechan…
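The linear-versus-quadratic contrast can be made concrete with a toy recurrence. This is a heavily simplified scalar sketch, not the Mamba algorithm: the gating function, the `w_gate` parameter, and the update rule are assumptions chosen only to show a state update whose behavior depends on the input and whose cost grows linearly with sequence length.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def selective_scan(xs, w_gate: float = 1.0):
    """Toy recurrence: h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    a_t is computed from the current input, so how much past state is
    kept depends on the data. One pass over xs means O(n) work.
    """
    h = 0.0
    outputs = []
    for x in xs:
        a = sigmoid(w_gate * x)      # input-dependent retention factor
        h = a * h + (1.0 - a) * x    # update the single running state
        outputs.append(h)
    return outputs

def attention_pair_count(n: int) -> int:
    """Number of token pairs full self-attention scores: O(n^2)."""
    return n * n

sequence = [0.5, -1.0, 2.0, 0.0]
states = selective_scan(sequence)
print(len(states), "scan steps vs", attention_pair_count(len(sequence)), "attention pairs")
```

At 4 tokens the gap is small (4 steps against 16 pairs), but it widens quickly as the sequence approaches the hundreds of thousands of tokens that long-context models target.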

More from AI21 Labs

Jurassic-2 · 2023-03 · 178B

Jamba · 2024-03 · 52B (12B active)
