Mistral Large 2

Mistral AI · July 2024

● activeOpen Weightdecoder onlytextAPI Available

Parameters123B

Context Window128K tokens

Why It Matters

Established Mistral AI as a credible competitor to frontier labs by delivering near-GPT-4-level performance in an open-weight package, particularly excelling in multilingual and code tasks.

Description

Mistral AI's 123 billion parameter flagship model, approaching the capabilities of the best closed models while remaining open-weight. Supports a 128K token context window (roughly 100,000 words) and excels at coding, reasoning, and multilingual tasks across dozens of languages.

Notable Milestones

▸Near-GPT-4 performance as an open-weight model
▸Strong multilingual support across dozens of languages
▸Powers Mistral's Le Chat conversational assistant

Benchmark Scores

MMLUMassive Multitask Language Understanding — 57 subjects

84.0%

HumanEvalCode generation pass@1 — Python problems

92.0%

Key Innovations

Open Weight

Open WeightModel weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.

Family Tree

Related Research (2)

Grouped-Query AttentionArchitecture

2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…

Mistral 7BScaling

2023 · Mistral AI

Introduced sliding window attention and demonstrated that a 7B model could outperform LLaMA 2 13B on all benchmarks.