Sora

OpenAI · December 2024

● active · Closed · diffusion · multimodal

Why It Matters

Its February 2024 preview videos showed a level of realism and physical plausibility not previously seen in AI-generated video, setting a new bar for what the technology could produce.

Description

OpenAI's text-to-video model, which combines a diffusion approach (gradually refining noise into clear visuals) with a transformer architecture. It was previewed in February 2024, when OpenAI showed videos up to one minute long with realistic physics, complex scene composition, and consistent characters, and it launched publicly in December 2024.

Notable Milestones

▸ Generated photorealistic videos that went viral on social media
▸ Demonstrated 3D consistency and object permanence in generated videos, although physically implausible motion still appears in some outputs

Key Innovations

Diffusion

Generates outputs by gradually denoising random noise into coherent images or audio. The backbone of Stable Diffusion and of DALL·E 2 and later.
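
The denoising loop described above can be sketched in a few lines. This is a toy illustration only, not Sora's method: a real diffusion model uses a trained neural network to predict the noise, whereas here a hypothetical `oracle_noise_prediction` stands in for that network by assuming the clean data is a single known vector. The linear noise schedule, the step count, and all function names are assumptions chosen for the example. The update rule is the deterministic (DDIM-style) reverse step.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule (assumed values)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def oracle_noise_prediction(x_t, alpha_bar, target):
    """Hypothetical stand-in for a trained network.

    When the data distribution is one known vector, the noise contained in
    x_t can be computed exactly instead of being learned.
    """
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [(x - scale * c) / sigma for x, c in zip(x_t, target)]


def denoise(target, num_steps=1000, seed=0):
    """Start from Gaussian noise and refine it step by step toward the data."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    for t in reversed(range(num_steps)):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = oracle_noise_prediction(x, a_t, target)
        # Estimate the clean sample implied by the predicted noise.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
                  for xi, e in zip(x, eps)]
        # Deterministic reverse step: re-noise the estimate to the lower noise level.
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e
             for x0, e in zip(x0_hat, eps)]
    return x


if __name__ == "__main__":
    print(denoise([0.5, -1.0, 2.0]))
```

Because the oracle is exact, the loop lands on the target vector up to floating-point error. With a learned noise predictor the same loop yields a new sample from the training distribution rather than a fixed point.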

Text-to-Video

Generating video clips from text descriptions, one of the newest and most compute-intensive AI capabilities.

Transformer

Neural network architecture using self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.
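
As a rough illustration of the self-attention step mentioned above, the sketch below computes single-head scaled dot-product attention over a short sequence. It is a simplification under stated assumptions: the learned query, key, and value projections of a real transformer are replaced by identity maps, and the function names are hypothetical. It says nothing about how Sora itself is implemented.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections, so each token vector serves as its own
    query, key, and value. A real layer applies learned weight matrices first.
    """
    dim = len(tokens[0])
    outputs = []
    # Each query is handled independently of the others, which is why the
    # computation can run in parallel across the whole sequence.
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # The output is a weighted average of all value vectors.
        outputs.append([sum(w * value[j] for w, value in zip(weights, tokens))
                        for j in range(dim)])
    return outputs


if __name__ == "__main__":
    sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in self_attention(sequence):
        print(row)
```

Every output row mixes information from all positions in the sequence at once, in contrast to an RNN, which has to pass information along one step at a time.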

External Links

Research Paper · Announcement

More from OpenAI GPT

InstructGPT / text-davinci-002 · 2022-01 · 175B

GPT-3.5 / ChatGPT · 2022-11 · 175B

GPT-4 · 2023-03 · ~1.7T (est. MoE)

GPT-4 Turbo · 2023-11 · ~1.7T (est. MoE)

GPT-4o · 2024-05 · —

GPT-4o Mini · 2024-07 · —
