PaLM

Google Research · April 2022

☠ discontinued · Closed · decoder only · text

Parameters: 540B

Context Window: 2K tokens

Sunset Date: February 2024

Why It Matters

Demonstrated that scaling a single dense model to 540B parameters unlocked emergent reasoning abilities. Among the first large models to show that chain-of-thought prompting, combined with scale, could solve multi-step math and logic problems.

Description

Google's 540-billion-parameter language model, one of the largest dense models built at the time of its release. First large-scale model trained using Google's Pathways system, which allowed efficient training across thousands of TPU chips. Showed strong results in chain-of-thought reasoning (solving problems by thinking step by step), math, and code generation.

Notable Milestones

▸First model trained on Google's Pathways distributed training system
▸Demonstrated emergent chain-of-thought reasoning at scale
▸Could explain jokes and solve multi-step math problems

Key Innovations

Chain-of-Thought

Prompting technique where the model 'thinks out loud' step by step before giving a final answer.
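As a rough illustration of the technique, the sketch below assembles a few-shot chain-of-thought prompt: each worked example includes its reasoning before the answer, and the final question is left open for the model to continue. The function name `build_cot_prompt`, the `Q:`/`A:` layout, and the example problems are assumptions made for this sketch, not the format used in any PaLM evaluation.

```python
def build_cot_prompt(exemplars, question):
    """Assemble a few-shot chain-of-thought prompt (hypothetical format).

    exemplars: list of (question, reasoning, answer) tuples.
    question:  the new question the model should answer.
    """
    parts = []
    for q, reasoning, answer in exemplars:
        # Each exemplar shows the reasoning steps before the final answer.
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # End with an open "A:" so the model writes its own reasoning next.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Illustrative exemplar written for this sketch.
exemplars = [
    (
        "A shelf holds 4 boxes with 6 pens each. 5 pens are removed. "
        "How many pens remain?",
        "4 boxes of 6 pens is 24 pens. Removing 5 leaves 24 - 5 = 19.",
        "19",
    )
]

prompt = build_cot_prompt(
    exemplars,
    "A baker has 23 muffins, sells 20, then bakes 6 more. How many are there?",
)
print(prompt)
```

The resulting string would be sent to a language model as-is; the worked exemplar nudges the model to produce intermediate steps before committing to an answer.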

Scaling Laws

Mathematical relationships showing how model performance improves predictably with more data, compute, and parameters.
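One common way to express such a relationship is a power law in parameter count, L(N) = (N_c / N)^alpha. The sketch below models only that parameter term and ignores data and compute. The constants `N_C` and `ALPHA` are illustrative placeholders chosen for this example, not values fitted to PaLM or to any other model.

```python
# Illustrative placeholder constants (assumptions, not fitted values).
N_C = 1e14    # hypothetical scale constant, in parameters
ALPHA = 0.08  # hypothetical power-law exponent


def power_law_loss(n_params, n_c=N_C, alpha=ALPHA):
    """Predicted loss under a pure power law in parameter count."""
    return (n_c / n_params) ** alpha


def loss_ratio_when_scaling(factor, alpha=ALPHA):
    """Multiplicative change in loss when parameters grow by `factor`."""
    return factor ** (-alpha)


for n in (8e9, 62e9, 540e9):
    print(f"{n:.0e} params -> predicted loss {power_law_loss(n):.3f}")

# Under this form, every doubling of parameters scales loss by 2 ** -alpha.
print(f"loss ratio per doubling: {loss_ratio_when_scaling(2):.4f}")
```

The point of the power-law form is that the improvement per doubling is constant, which is what makes performance at larger scale predictable from smaller runs.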

Family Tree

Successors (2)

PaLM 2 · PaLM-E

Related Research (3)

Transformer · Transformer

2017 · Google Brain

Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…

Chain-of-Thought · Reasoning

2022 · Google

Showed that prompting models with worked, step-by-step reasoning examples unlocks arithmetic, commonsense, and symbolic reasoning in large models like PaLM.

SwiGLU · Architecture

2020 · Google

Showed that SwiGLU activation (Swish + Gated Linear Unit) significantly improves Transformer FFN quality with minimal compute overhead.
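For a concrete picture, a SwiGLU feed-forward layer can be written as FFN(x) = (Swish(x·W_gate) ⊙ (x·W_up))·W_down. The sketch below is a minimal pure-Python version for a single input vector, assuming no bias terms and Swish with beta = 1. The names `w_gate`, `w_up`, and `w_down` are chosen for this example, and the code is not taken from any PaLM implementation.

```python
import math


def swish(z):
    """Swish with beta = 1: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x, w):
    """Multiply vector x (length d_in) by matrix w (d_in rows, d_out columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward layer for one vector, without bias terms."""
    gate = [swish(g) for g in matvec(x, w_gate)]  # gating branch
    up = matvec(x, w_up)                          # linear branch
    hidden = [g * u for g, u in zip(gate, up)]    # elementwise product
    return matvec(hidden, w_down)                 # project back to model width


# Tiny example: model width 2, hidden width 3.
x = [1.0, 2.0]
w_gate = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_up = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
w_down = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
print(swiglu_ffn(x, w_gate, w_up, w_down))
```

Compared with a standard two-matrix FFN, the gated variant uses three weight matrices, so implementations typically shrink the hidden width to keep the parameter count comparable.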