LLM Tree of Life

GPT-Image-1

OpenAI · March 2025

● Active · Closed · Decoder-only · Image · API Available

Why It Matters

Marked a shift back to autoregressive image generation, showing that the same architecture used for text could also produce high-quality images when trained at sufficient scale.

Description

OpenAI's latest image generation model, built directly into the GPT-4o architecture rather than run as a separate system. Unlike earlier diffusion-based image generators, it uses an autoregressive approach: it generates an image piece by piece, much as GPT generates text token by token. It produces images with strong text rendering, world knowledge, and precise instruction following.
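The piece-by-piece generation described above can be illustrated with a toy sketch. Everything here is an assumption for illustration: OpenAI has not published the model's internals, so the token grid, the raster (row-by-row) order, and the hypothetical `toy_next_token_probs` function stand in for a trained network and are not the actual GPT-Image-1 method.

```python
import random


def toy_next_token_probs(prefix, vocab_size):
    """Hypothetical stand-in for a trained model.

    Returns a probability distribution over the next token. The toy rule
    favours repeating the previous token, so the output visibly depends
    on what was generated before.
    """
    weights = [1.0] * vocab_size
    if prefix:
        weights[prefix[-1]] += 3.0
    total = sum(weights)
    return [w / total for w in weights]


def generate_image_tokens(height, width, vocab_size, seed=0):
    """Sample a height x width grid of discrete image tokens, one at a time."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(height * width):
        # Each step conditions on every token generated so far.
        probs = toy_next_token_probs(tokens, vocab_size)
        next_token = rng.choices(range(vocab_size), weights=probs, k=1)[0]
        tokens.append(next_token)
    # Reshape the flat sequence into rows (raster order is an assumption).
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


grid = generate_image_tokens(height=4, width=4, vocab_size=8)
for row in grid:
    print(row)
```

In a real system each token would index a learned image patch or codebook entry, and a separate decoder would turn the token grid into pixels; the sketch only shows the sampling loop.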

Notable Milestones

▸ Powers image generation in ChatGPT
▸ First major autoregressive image model to rival diffusion-based approaches

Key Innovations

Text-to-Image

Generating images from text descriptions: the technology behind DALL·E, Midjourney, and Stable Diffusion.

Multimodal

Processing multiple types of input (text, images, audio, video) in a single model.

Autoregressive

Generates output one token at a time, each prediction conditioned on all previous tokens. The foundation of modern language models.
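As a compact way to state the definition above, autoregressive models are commonly written with the chain-rule factorization below. The symbols are notation introduced here, not taken from the source: $x_t$ is the token at position $t$ and $T$ is the sequence length.

```latex
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})
```

Generation samples $x_1$ first, then $x_2$ given $x_1$, and so on, which is why the output appears one token at a time.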

Family Tree

Built On

Lineage

DALL·E → DALL·E 2 → DALL·E 3 → GPT-Image-1


More from Image Generation

DALL·E · 2021-01 · 12B

DALL·E 2 · 2022-04 · 3.5B

DALL·E 3 · 2023-10 · —

Flux.1 · 2024-08 · 12B

Runway Gen-3 Alpha · 2024-06 · —

Pika · 2023-06 · —

Kling · 2024-06 · —

Luma Dream Machine · 2024-06 · —