Sora

OpenAI · December 2024

active · Closed · diffusion · multimodal

Why It Matters

Its preview videos showed a level of realism and physical plausibility not seen in earlier AI-generated video, setting a new bar for what such video could look like.

Description

OpenAI's text-to-video model, which combines a diffusion approach (gradually refining noise into clear visuals) with a transformer architecture. The February 2024 preview showed videos up to one minute long with realistic-looking physics, complex scene composition, and consistent characters. Publicly launched in December 2024.
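The "gradually refining noise" idea in the description can be illustrated with a toy reverse-diffusion loop. This is a minimal sketch and not Sora's actual sampler, which is not public: the linear noise schedule, the step count, and the `oracle_noise` function (a hypothetical stand-in for a trained noise-prediction network that already knows the target) are all assumptions made for illustration.

```python
import math
import random


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.2):
    """Linear noise schedule (an assumed choice): betas and cumulative alpha products."""
    betas = [
        beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        for i in range(num_steps)
    ]
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def oracle_noise(x_t, target, alpha_bar_t):
    """Hypothetical stand-in for a trained network.

    When the "data set" is a single known point, the noise mixed into x_t
    can be computed exactly; a real model has to learn this prediction.
    """
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [(x - scale * y) / sigma for x, y in zip(x_t, target)]


def denoise(target, num_steps=50, seed=0):
    """Start from Gaussian noise and apply DDPM-style reverse steps."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    for t in reversed(range(num_steps)):
        eps = oracle_noise(x, target, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv = 1.0 / math.sqrt(1.0 - betas[t])
        # Remove the predicted noise for this step.
        x = [inv * (xi - coef * ei) for xi, ei in zip(x, eps)]
        if t > 0:
            # Re-inject a smaller amount of noise on every step but the last.
            sigma = math.sqrt(betas[t])
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


if __name__ == "__main__":
    print(denoise([0.5, -1.0, 2.0]))
```

Because the oracle knows the target, the loop ends very close to it. In a real system the target is unknown and the network's learned prediction, conditioned on the text prompt, steers the noise toward a plausible sample.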

Notable Milestones

  • Generated photorealistic videos that went viral on social media
  • Demonstrated apparent 3D consistency and object permanence in generated videos

Key Innovations

Diffusion
Generates outputs by gradually denoising random noise into coherent images or audio. The backbone of Stable Diffusion and DALL·E 2.
Text-to-Video
Generating video clips from text descriptions, one of the newest and most compute-intensive AI capabilities.
Transformer
Neural network architecture using self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.
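The self-attention mentioned in the Transformer entry can be sketched in a few lines. This is a simplified illustration, assuming identity projections (each token serves as its own query, key, and value); real transformer layers use learned projection matrices and multiple attention heads, and nothing here reflects Sora's actual implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    tokens: list of equal-length float vectors.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    dim = len(tokens[0])
    outputs, all_weights = [], []
    # Each query is independent of the others, so real implementations
    # compute all of them at once as a matrix product.
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of all token vectors.
        mixed = [
            sum(w * value[d] for w, value in zip(weights, tokens))
            for d in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    outs, attn = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    for row in attn:
        print([round(w, 3) for w in row])
```

Every output position draws on every input position in a single step, which is what lets the architecture handle a whole sequence without the step-by-step recurrence of an RNN.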