Stable Diffusion 3
Stability AI · February 2024
● active · Open Weight · diffusion · image
Variants: Medium, Large, Large Turbo
Description
Introduced a new architecture, the Multimodal Diffusion Transformer (MMDiT), which processes text and image tokens together in one sequence with joint attention while keeping separate weights for each modality, rather than treating the two separately. Uses flow matching, which trains the model to follow straight-line paths between noise and data in place of the conventional diffusion objective. Together these changes produce cleaner, more coherent results, especially when rendering text within images.
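As a rough illustration of the flow matching idea, the sketch below builds the straight path between a data sample and a noise sample, computes the velocity a network would be trained to predict, and integrates that velocity backwards with Euler steps. It is a toy under stated assumptions, not Stable Diffusion 3's actual training or sampling code: the function names (`interpolate`, `target_velocity`, `euler_sample`), the three-value "image", and the oracle standing in for a trained network are all hypothetical.

```python
import random

def interpolate(x0, noise, t):
    """Point on the straight path between data (t=0) and noise (t=1)."""
    return [(1.0 - t) * a + t * e for a, e in zip(x0, noise)]

def target_velocity(x0, noise):
    """Velocity along that path, d z_t / d t = noise - x0 (constant in t)."""
    return [e - a for a, e in zip(x0, noise)]

def euler_sample(noise, velocity_fn, steps=4):
    """Integrate from t=1 (pure noise) back to t=0 (data) with Euler steps."""
    z = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(z, t)
        z = [zi - dt * vi for zi, vi in zip(z, v)]
    return z

random.seed(0)
x0 = [0.5, -1.0, 2.0]                      # toy "image" (assumption)
noise = [random.gauss(0.0, 1.0) for _ in x0]

# Hypothetical stand-in for a trained network: it returns the true velocity.
def oracle(z, t):
    return target_velocity(x0, noise)

recovered = euler_sample(noise, oracle)
```

Because the path is a straight line, a perfect velocity prediction lets even a handful of Euler steps land back on the data sample. A real model only approximates the velocity, so more steps are typically used.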
Key Innovations
Diffusion: Generates outputs by gradually denoising random noise into coherent images or audio. The backbone of Stable Diffusion and DALL·E.
Text-to-Image: Generates images from text descriptions. The technology behind DALL·E, Midjourney, and Stable Diffusion.
Transformer: Neural network architecture that uses self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.
Open Weight: Model weights are publicly released, but training data and code may not be. Enables fine-tuning but not full reproduction.
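The self-attention mentioned in the Transformer entry can be sketched in a few lines. This is a minimal, single-head version in plain Python that assumes identity query, key, and value projections (a real Transformer learns these); the function names and the three toy tokens are illustrative only.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with identity Q/K/V (assumption)."""
    d = len(x[0])
    # Every token is scored against every other token in one pass.
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in x]
              for qi in x]
    weights = [softmax(r) for r in scores]
    # Each output is a weighted average of all token vectors.
    out = [[sum(w * v[c] for w, v in zip(wr, x)) for c in range(d)]
           for wr in weights]
    return weights, out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy sequence of 3 tokens
weights, out = self_attention(tokens)
```

Each row of `weights` does not depend on any other row, which is why the whole sequence can be processed in parallel instead of step by step as in an RNN.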
Family Tree
Built On
Lineage
Successors (1)
Related Research (1)
DDPM / Diffusion
2020 · UC Berkeley
Showed that gradually adding noise to data and then learning to reverse the process could generate images rivaling GANs, with more stable training and…
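The "gradually adding noise" half of that process has a closed form, sketched below in plain Python. The sketch assumes the linear noise schedule from the DDPM paper (1000 steps, beta from 1e-4 to 0.02); the function names are hypothetical, and the learned reverse model is replaced by an inversion that is handed the true noise.

```python
import math
import random

def alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) under a linear beta schedule."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def add_noise(x0, t, abar, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps in one shot."""
    s, n = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return [s * a + n * e for a, e in zip(x0, eps)], eps

def recover_x0(xt, eps, t, abar):
    """Invert the noising step given the noise (a trained model estimates eps)."""
    s, n = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [(x - n * e) / s for x, e in zip(xt, eps)]

rng = random.Random(0)
abar = alpha_bar()
x0 = [0.5, -1.0, 2.0]                 # toy data sample (assumption)
xt, eps = add_noise(x0, 100, abar, rng)
x0_hat = recover_x0(xt, eps, 100, abar)
```

Training then amounts to teaching a network to predict `eps` from `xt` and `t`, so that the inversion can be applied without access to the true noise.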