GPT-Image-1
OpenAI · March 2025
Active · Closed · Decoder-only · Image · API available
Why It Matters
Marked a shift back to autoregressive image generation, showing that the same architecture used for text could also produce high-quality images when trained at sufficient scale.
Description
OpenAI's latest image generation model, built directly into the GPT-4o architecture rather than run as a separate system. Unlike previous diffusion-based image generators, it uses an autoregressive approach: it generates an image piece by piece, much as GPT generates text token by token. It produces images with strong text rendering, world knowledge, and precise instruction following.
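The piece-by-piece idea can be illustrated with a toy sampling loop. This is a minimal sketch and not OpenAI's implementation, whose internals are not described here. The function names, the 8-entry token vocabulary, the raster (row-by-row) ordering, and the stand-in "model" that simply favors repeating the previous token are all assumptions made for illustration.

```python
import random


def toy_next_token_weights(prefix, vocab_size):
    """Hypothetical stand-in for a trained model.

    Returns unnormalized weights over the next token given every token
    generated so far. A real model would compute these with a neural network.
    """
    weights = [1.0] * vocab_size
    if prefix:
        # Favor repeating the previous token so the output shows short runs.
        weights[prefix[-1]] += 3.0
    return weights


def generate_image_tokens(height, width, vocab_size, seed=0):
    """Sample a height x width grid of discrete image tokens, one at a time."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(height * width):
        # Each step conditions on the full prefix, as in text generation.
        weights = toy_next_token_weights(tokens, vocab_size)
        next_token = rng.choices(range(vocab_size), weights=weights, k=1)[0]
        tokens.append(next_token)
    # Reshape the flat sequence into rows (assumed raster order).
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    for row in generate_image_tokens(height=4, width=6, vocab_size=8):
        print(" ".join(str(t) for t in row))
```

In a real system, each token would index a learned image patch or code, and a separate decoder would turn the finished grid into pixels. Both steps are omitted here.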
Notable Milestones
- Powers image generation in ChatGPT
- First major autoregressive image model to rival diffusion-based approaches
Key Innovations
Text-to-Image
Generating images from text descriptions; the technology behind DALL·E, Midjourney, and Stable Diffusion.
Multimodal
Processing multiple types of input (text, images, audio, video) in a single model.
Autoregressive
Generates output one token at a time, each prediction based on all previous tokens. The foundation of modern language models.