Mistral 7B

Mistral AI · September 2023

● active · Open Source · decoder only · text

Parameters: 7B

Context Window: 32K tokens

Why It Matters

Showed that a small, well-engineered model from a European startup could outperform larger competitors such as LLaMA 2 13B, establishing Mistral AI as a major force in open-source AI.

Description

A remarkably efficient 7 billion parameter model that outperformed the much larger LLaMA 2 13B. Uses sliding window attention, a technique that limits each token to attending only to a fixed window of preceding tokens rather than the entire text, reducing memory usage on long inputs while maintaining quality. Released under the permissive Apache 2.0 license.
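The attention pattern described above can be sketched as a boolean mask. This is a minimal illustration, not Mistral AI's implementation: the function names are hypothetical, and the convention that the window counts the current token is an assumption made here for simplicity.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Assumed convention: causal (j <= i), and the window includes the
    current token, so each row has at most `window` True entries.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(seq_len: int, window: int) -> int:
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in sliding_window_mask(seq_len, window))


mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

With a window equal to the sequence length the mask reduces to ordinary causal attention, so the number of allowed pairs grows quadratically with length; with a fixed window it grows linearly.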

Notable Milestones

▸ Outperformed LLaMA 2 13B despite being roughly half the size
▸ One of the most frequently fine-tuned base models in the open-source community
▸ Apache 2.0 license enabled unrestricted commercial use

Key Innovations

Open Weight: Model weights are publicly released, but training data and code may not be. This enables fine-tuning but not full reproduction.

Family Tree

Successors (3)

Mixtral 8x7B · Mistral 7B Uncensored · Codestral

Related Research (5)

Chinchilla · Scaling

2022 · DeepMind

Challenged Kaplan's scaling laws by showing that training data should scale in roughly equal proportion with parameters. The 70B Chinchilla outperformed the 280B Gopher.
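A rough arithmetic sketch of that trade-off follows. The 20 tokens-per-parameter ratio and the 6 × parameters × tokens estimate of training FLOPs are widely quoted rules of thumb, and the token counts (1.4T for Chinchilla, 300B for Gopher) are the commonly reported figures; none of these numbers come from this page, and the function names are hypothetical.

```python
TOKENS_PER_PARAM = 20  # assumed rule of thumb, not a figure from this page


def compute_optimal_tokens(n_params: int) -> int:
    """Approximate compute-optimal training tokens for a given model size."""
    return n_params * TOKENS_PER_PARAM


def training_flops(n_params: int, n_tokens: int) -> int:
    """Common approximation: about 6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens


chinchilla = training_flops(70_000_000_000, 1_400_000_000_000)
gopher = training_flops(280_000_000_000, 300_000_000_000)

print(compute_optimal_tokens(70_000_000_000))  # 1400000000000
print(f"Chinchilla / Gopher training FLOPs: {chinchilla / gopher:.2f}")
```

Under these assumptions the two models sit at a similar compute budget, with Chinchilla spending it on four times fewer parameters and several times more data.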

RoPE · Architecture

2021 · Zhuiyi Technology

Introduced rotary position embeddings that encode position via rotation matrices, enabling better length generalization. Used by virtually every moder…
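The rotation idea can be sketched in a few lines. This is a simplified illustration that rotates consecutive pairs of a single vector; the function names are hypothetical, and the base of 10000 is the commonly used default, taken here as an assumption.

```python
import math


def rope(vec: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotate each consecutive pair (vec[i], vec[i+1]) by a position-dependent angle."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Lower pair indices rotate quickly, higher ones slowly.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.75, -0.5, 1.0]

# The attention score depends only on the relative offset (2 in both cases).
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 105), rope(k, 103))
print(abs(score_near - score_far) < 1e-9)  # True
```

Because each pair is rotated rather than shifted, vector norms are unchanged and the query-key dot product is a function of the distance between positions, not of the absolute positions themselves.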

Grouped-Query Attention · Architecture

2023 · Google Research

Introduced grouped-query attention as a middle ground between multi-head and multi-query attention, reducing KV cache memory while maintaining quality…
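A small sketch of the head-sharing scheme and its effect on KV cache size. The function names are hypothetical, and the configuration used in the example (32 layers, 32 query heads, 8 key/value heads, head dimension 128, 16-bit values) is an assumed setup chosen for illustration.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Map a query head to the key/value head its group shares."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Cache size for one sequence; the leading 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed configuration for illustration.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)

print(f"multi-head:    {mha // 2**20} MiB")  # multi-head:    2048 MiB
print(f"grouped-query: {gqa // 2**20} MiB")  # grouped-query: 512 MiB
print([kv_head_for_query_head(h, 32, 8) for h in (0, 3, 4, 31)])  # [0, 0, 1, 7]
```

Setting the number of key/value heads equal to the number of query heads recovers multi-head attention, and setting it to 1 recovers multi-query attention.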

SwiGLU · Architecture

2020 · Google

Showed that SwiGLU activation (Swish + Gated Linear Unit) significantly improves Transformer FFN quality with minimal compute overhead.
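A minimal sketch of a SwiGLU feed-forward layer on plain Python lists. The names `w_gate`, `w_up`, and `w_down` are hypothetical, bias terms are omitted, and the tiny weights are made up purely to keep the example readable.

```python
import math


def swish(z: float) -> float:
    """Swish (also called SiLU): z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Gated feed-forward: w_down @ (swish(w_gate @ x) * (w_up @ x))."""
    gate = [swish(z) for z in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return matvec(w_down, hidden)


# Toy weights: model dimension 2, hidden dimension 3.
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_down = [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]

out = swiglu_ffn([1.0, 2.0], w_gate, w_up, w_down)
print(out)
```

Compared with a standard two-matrix feed-forward layer, the gated form uses three weight matrices, which is why implementations typically shrink the hidden dimension to keep the parameter count comparable.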

Mistral 7B · Scaling

2023 · Mistral AI

Applied sliding window attention and demonstrated that a 7B model could outperform LLaMA 2 13B on all evaluated benchmarks.

External Links

Research Paper · Announcement

More from Mistral AI

Mixtral 8x7B · 2023-12 · 46.7B total (12.9B active)

Mistral Large 2 · 2024-07 · 123B

Mistral Small 4 · 2026-03 · size not listed

Mistral Medium 3.5 · 2026-03 · size not listed

Codestral · 2024-05 · 22B

Pixtral Large · 2024-11 · 124B
