GPT-3

OpenAI · June 2020

◌ legacyCloseddecoder onlytextAPI Available

Parameters175B

Context Window2K tokens

VariantsAda, Babbage, Curie, Davinci

Sunset DateJanuary 2024

Why It Matters

The model that launched the modern AI era. Its 175 billion parameters showed that scaling up language models dramatically improves capabilities, enabling surprisingly human-like text generation and few-shot learning.

Description

With 175 billion parameters — over 100× larger than GPT-2 — this model demonstrated 'few-shot learning': the ability to perform new tasks from just a few examples in the prompt, without any retraining. This was the model that launched the modern AI era and proved that scale alone could unlock remarkable capabilities.

Notable Milestones

▸Powered the first wave of AI writing assistants
▸Spawned hundreds of startups built on the GPT-3 API
▸Demonstrated few-shot learning — performing tasks from a handful of examples

Key Innovations

Few-Shot

Few-ShotLearning from just a handful of examples provided in the prompt, without retraining.

Scaling Laws

Scaling LawsMathematical relationships showing how model performance improves predictably with more data, compute, and parameters.

Family Tree

Built On

GPT-2

Lineage

GPT-1→GPT-2→GPT-3

Successors (2)

Codex InstructGPT / text-davinci-002

Related Research (3)

TransformerTransformer

2017 · Google Brain

Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…

GPT-3Transformer

2020 · OpenAI

175B-parameter GPT. Pioneered few-shot and in-context learning, dramatically reducing the need for fine-tuning.

Scaling Laws (Kaplan)Scaling

2020 · OpenAI

Found that model performance follows power laws in compute, parameters, and data. Provided the mathematical framework for scaling decisions.