Sora
OpenAI · December 2024
● active · Closed · diffusion · multimodal
Why It Matters
Sora's preview videos drew worldwide attention for their realism and apparent physical understanding, setting a new bar for what AI-generated video could look like.
Description
OpenAI's text-to-video model combines a diffusion approach (gradually refining noise into clear visuals) with a transformer architecture. It can generate videos up to one minute long with realistic physics, complex scene composition, and consistent characters. It was previewed in February 2024 and publicly launched in December 2024.
Notable Milestones
- Generated photorealistic videos that went viral on social media
- Demonstrated understanding of 3D physics and object permanence in generated videos
Key Innovations
Diffusion
Generates outputs by gradually denoising random noise into coherent images or audio. The backbone of Stable Diffusion and DALL·E 2.
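The denoising loop described above can be sketched in a few lines of plain Python. This is a toy illustration, not Sora's or any production sampler: the names `make_alpha_bars`, `oracle_noise`, and `denoise` are hypothetical, the linear noise schedule is an assumed choice, and the trained neural network is replaced by an "oracle" that already knows the target, so only the shape of the iterative refinement is shown. The update used is a deterministic DDIM-style step.

```python
import math
import random


def make_alpha_bars(num_steps=50, beta_start=1e-4, beta_end=0.2):
    """Cumulative signal-retention factors for an assumed linear noise schedule."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def oracle_noise(x_t, alpha_bar, target):
    """Hypothetical stand-in for a trained noise-prediction network.

    It is exact only because the 'dataset' here is the single known target.
    """
    scale = math.sqrt(1.0 - alpha_bar)
    return [(x - math.sqrt(alpha_bar) * y) / scale for x, y in zip(x_t, target)]


def denoise(target, num_steps=50, seed=0):
    """Start from Gaussian noise and refine it step by step toward the target."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    for t in range(num_steps - 1, -1, -1):
        a_bar = alpha_bars[t]
        prev_bar = alpha_bars[t - 1] if t > 0 else 1.0  # no noise left at the end
        eps = oracle_noise(x, a_bar, target)
        # Estimate the clean sample implied by the predicted noise.
        x0_est = [(xi - math.sqrt(1.0 - a_bar) * ei) / math.sqrt(a_bar)
                  for xi, ei in zip(x, eps)]
        # Deterministic DDIM-style move to the next, less noisy level.
        x = [math.sqrt(prev_bar) * x0 + math.sqrt(1.0 - prev_bar) * ei
             for x0, ei in zip(x0_est, eps)]
    return x


if __name__ == "__main__":
    print(denoise([1.0, -2.0, 0.5]))
```

In a real model the oracle is replaced by a learned network conditioned on the text prompt, and the vector `x` is an image or video latent rather than three numbers.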
Text-to-Video
Generating video clips from text descriptions, one of the newest and most compute-intensive AI capabilities.
Transformer
Neural network architecture using self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.
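As a rough illustration of the self-attention step mentioned above, the following sketch computes single-head scaled dot-product attention over a whole sequence at once, using only the Python standard library. The helper names (`softmax`, `matmul`, `self_attention`) and the tiny weight matrices are assumptions for the example. It omits multiple heads, masking, and positional information, and says nothing about how Sora itself is implemented.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x is a sequence of token vectors (one row per token). Every token
    attends to every other token, so all positions are handled together.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Similarity of each query with each key, scaled by sqrt(d_k).
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in k] for q_row in q]
    weights = [softmax(r) for r in scores]  # each row sums to 1
    return matmul(weights, v), weights


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    output, attn = self_attention(tokens, identity, identity, identity)
    print(output)
    print(attn)
```

Because the score matrix is computed for all token pairs in one pass, nothing forces tokens to be processed one after another, which is the property that lets transformers parallelize where RNNs could not.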