Stable Diffusion 3

Stability AI · February 2024

Status: active · Open Weight · diffusion · image

Variants: Medium, Large, Large Turbo

Description

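Introduced a new architecture, the Multimodal Diffusion Transformer (MMDiT). It keeps separate sets of weights for text tokens and image tokens but joins both sequences in a shared attention operation, so each modality can attend to the other. Earlier Stable Diffusion versions instead injected text into an image UNet through cross-attention. SD3 is trained with flow matching in its rectified-flow form, which connects data and noise along straight-line paths. Together these changes produce cleaner, more coherent results, especially when rendering text within images.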
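To make "straight-line paths" concrete, here is a minimal NumPy sketch of the rectified-flow objective and an Euler sampler. It is an illustration under stated assumptions, not SD3's training code. The function names are hypothetical, `t` is sampled uniformly (SD3 uses a different noise-level schedule), and the "model" in the usage example is an oracle velocity field for a toy dataset consisting of a single point `x0`, not a trained transformer.

```python
import numpy as np

def interpolate(x0, noise, t):
    """Forward path: a straight line from data (t = 0) to Gaussian noise (t = 1)."""
    return (1.0 - t) * x0 + t * noise

def velocity_target(x0, noise):
    """Training target: the path's derivative d x_t / d t, constant along the line."""
    return noise - x0

def flow_matching_loss(model, x0, rng):
    """Regress a (hypothetical) velocity model onto the target at a uniformly sampled t."""
    t = rng.uniform()
    noise = rng.standard_normal(x0.shape)
    x_t = interpolate(x0, noise, t)
    return np.mean((model(x_t, t) - velocity_target(x0, noise)) ** 2)

def euler_sample(velocity_fn, noise, num_steps=10):
    """Integrate dx/dt = v(x, t) from t = 1 (noise) down to t = 0 (data) with Euler steps."""
    x = noise.copy()
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur)
    return x

# Usage with a toy dataset that is a single point x0.
rng = np.random.default_rng(0)
x0 = np.array([1.0, -2.0, 0.5])
# Hypothetical oracle: the exact velocity field when all data sits at x0.
oracle = lambda x, t: (x - x0) / t
sample = euler_sample(oracle, rng.standard_normal(3), num_steps=4)
print(np.allclose(sample, x0))  # True: straight paths are integrated exactly
```

In this one-point toy case, four Euler steps land exactly on `x0` because the path is a straight line. With real data, the learned field is only approximately straight, so samplers still take several steps.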

Key Innovations

Diffusion

Generates outputs by gradually denoising random noise into coherent images or audio. It is the backbone of Stable Diffusion and of DALL·E 2 and 3.
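The "gradually denoising" loop can be sketched as DDPM-style ancestral sampling. This is a generic illustration, not SD3's sampler (SD3 uses rectified flow). The assumptions are a linear beta schedule shortened to 50 steps and a hypothetical `oracle_eps` that stands in for a trained noise-prediction network by knowing the single training point `x0`.

```python
import numpy as np

def linear_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; alpha_bars is the cumulative product of (1 - beta)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def sample(eps_fn, shape, betas, alphas, alpha_bars, rng):
    """Ancestral sampling: start from pure noise and denoise one step at a time."""
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        eps_hat = eps_fn(x, t)  # predicted noise in x at step t
        # Mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # sigma_t^2 = beta_t is one common choice of reverse-process variance.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x

# Usage with a toy dataset that is a single point x0.
rng = np.random.default_rng(0)
betas, alphas, alpha_bars = linear_schedule()
x0 = np.array([0.5, -1.0])
# Hypothetical oracle: the exact noise in x_t when the data is the single point x0.
oracle_eps = lambda x, t: (x - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])
out = sample(oracle_eps, x0.shape, betas, alphas, alpha_bars, rng)
print(np.allclose(out, x0))  # True
```

A real model replaces `oracle_eps` with a learned network, and the samples then come from the learned data distribution instead of collapsing to one point.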

Text-to-Image

Generating images from text descriptions: the technology behind DALL·E, Midjourney, and Stable Diffusion.

Transformer

Neural network architecture that uses self-attention to process entire sequences in parallel. It largely replaced RNNs for sequence modeling and enabled massive scaling.
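Self-attention fits in a few lines of NumPy. The hedged sketch below also illustrates how SD3's MMDiT uses it: text and image tokens get their own query/key/value projection weights, but a single attention runs over the concatenated sequence, so every token can attend to both modalities. The dimensions, random weights, and function names are hypothetical. Multiple heads, normalization, timestep conditioning, and MLP layers are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: every query attends to every key in parallel."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def joint_attention(text, image, w_text, w_image):
    """MMDiT-style sketch: per-modality Q/K/V weights, one attention over both sequences.

    w_text and w_image are hypothetical (wq, wk, wv) tuples of (d_model, d_model) arrays.
    """
    qt, kt, vt = (text @ w for w in w_text)
    qi, ki, vi = (image @ w for w in w_image)
    q = np.concatenate([qt, qi])  # shape (n_text + n_image, d_model)
    k = np.concatenate([kt, ki])
    v = np.concatenate([vt, vi])
    out = attention(q, k, v)
    n_text = text.shape[0]
    return out[:n_text], out[n_text:]  # split back into the text and image streams

# Usage with random stand-in tokens and weights.
rng = np.random.default_rng(0)
d = 8
text = rng.standard_normal((4, d))    # 4 hypothetical text tokens
image = rng.standard_normal((16, d))  # 16 hypothetical image-patch tokens
w_text = tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
w_image = tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
text_out, image_out = joint_attention(text, image, w_text, w_image)
print(text_out.shape, image_out.shape)  # (4, 8) (16, 8)
```

Keeping separate weights lets each modality use its own representation. The shared attention is what allows information to flow in both directions, rather than only from text into the image as with one-way cross-attention.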

Open Weight

Model weights are publicly released, but training data and code may not be. This enables fine-tuning but not full reproduction.


Related Research (1)

DDPM / Diffusion · tag: Diffusion

2020 · UC Berkeley

Showed that gradually adding noise to data and then learning to reverse the process could generate images rivaling GANs, with more stable training and…
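DDPM typically trains the "learn to reverse" half by noise prediction. A clean sample is corrupted to a random step t using the closed-form forward process, and the network's output is regressed onto the noise that was added. The minimal sketch below assumes the linear beta schedule from the DDPM paper (1e-4 to 0.02 over 1000 steps). `model` is a hypothetical callable standing in for the denoising network.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    return np.cumprod(1.0 - np.linspace(beta_start, beta_end, num_steps))

def q_sample(x0, t, noise, alpha_bars):
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def ddpm_loss(model, x0, alpha_bars, rng):
    """Simplified DDPM objective: mean squared error between true and predicted noise."""
    t = rng.integers(len(alpha_bars))  # uniformly sampled step
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, noise, alpha_bars)
    return np.mean((noise - model(x_t, t)) ** 2)

# Usage with a tiny hypothetical "image" and an untrained baseline model.
rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
x0 = rng.standard_normal((3, 8, 8))
zero_model = lambda x_t, t: np.zeros_like(x_t)  # predicts no noise at all
loss = ddpm_loss(zero_model, x0, alpha_bars, rng)
print(loss)
```

Training drives this loss down by gradient descent on the network's parameters. The trained noise predictor is then used step by step to reverse the forward process.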

External Links

Research Paper · Announcement

More from Stability AI

Stable Diffusion 1.5 · 2022-08 · ~860M

Stable Diffusion XL · 2023-07 · 6.6B
