PaLM

Google Research · April 2022

Discontinued · Closed · Decoder-only · Text
Parameters: 540B
Context Window: 8K tokens
Sunset Date: February 2024

Why It Matters

Demonstrated that scaling a single dense model to 540B parameters produced reasoning abilities that its smaller versions lacked. Among the first large models to show that chain-of-thought prompting could solve multi-step math and logic problems.

Description

Google's 540-billion-parameter language model, one of the largest dense (non-mixture-of-experts) models built at the time of its release. It was the first large-scale model trained with Google's Pathways system, which coordinated training across 6,144 TPU v4 chips. It showed strong results in chain-of-thought reasoning (solving problems by working step by step), math, and code generation.

Notable Milestones

  • First model trained on Google's Pathways distributed training system
  • Demonstrated emergent chain-of-thought reasoning at scale
  • Could explain jokes and solve multi-step math problems

Key Innovations

Chain-of-Thought
Prompting technique where the model 'thinks out loud' step by step before giving a final answer.
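A minimal sketch of how a few-shot chain-of-thought prompt might be assembled. The function name `build_cot_prompt`, the `Q:`/`A:` layout, and the exemplar text are illustrative assumptions, not PaLM's actual prompts or any Google API. The key idea is that each exemplar includes worked reasoning before its answer, and the final question is left open for the model to continue.

```python
def build_cot_prompt(question, exemplars):
    """Assemble a few-shot chain-of-thought prompt (hypothetical helper).

    Each exemplar is a (question, reasoning, answer) tuple. Including the
    worked reasoning is what separates this from a standard few-shot prompt.
    """
    parts = []
    for ex_question, reasoning, answer in exemplars:
        parts.append(f"Q: {ex_question}\nA: {reasoning} The answer is {answer}.")
    # Leave the last answer open so the model writes its own reasoning first.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Made-up exemplar for illustration only.
exemplars = [
    (
        "A shelf holds 3 boxes with 4 books each. 5 books are removed. "
        "How many books remain?",
        "3 boxes of 4 books is 12 books. Removing 5 leaves 12 - 5 = 7.",
        "7",
    ),
]

prompt = build_cot_prompt(
    "A train has 6 cars with 20 seats each. 15 seats are empty. "
    "How many seats are taken?",
    exemplars,
)
```

The resulting string would be sent to a language model as-is. The model is expected to continue after the final `A:` with its own step-by-step reasoning.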
Scaling Laws
Mathematical relationships showing how model performance improves predictably with more data, compute, and parameters.
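One common form such a relationship takes is a power law in parameter count. The sketch below assumes that form. The constants `n_c` and `alpha` are illustrative placeholders, not values fitted to PaLM, and the function name is hypothetical. It is evaluated at PaLM's three published model sizes (8B, 62B, 540B).

```python
def power_law_loss(n_params, n_c=1e14, alpha=0.08):
    """Predicted loss under an assumed power law L(N) = (n_c / N) ** alpha.

    n_c and alpha are placeholder constants chosen for illustration.
    """
    return (n_c / n_params) ** alpha


# PaLM was trained at three sizes: 8B, 62B, and 540B parameters.
sizes = [8e9, 62e9, 540e9]
losses = [power_law_loss(n) for n in sizes]

# Under a power law, doubling N always scales the loss by the same factor.
doubling_factor = power_law_loss(2e9) / power_law_loss(1e9)
```

The "predictable" part is visible in `doubling_factor`: it equals `2 ** -alpha` regardless of the starting size, so the curve is a straight line on a log-log plot.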

Related Research (3)

Transformer
2017 · Google Brain

Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…

Chain-of-Thought Prompting
2022 · Google

Showed that prompting models to "think step-by-step" unlocks arithmetic, logic, and commonsense reasoning in large models like PaLM.

SwiGLU · Architecture
2020 · Google

Showed that SwiGLU activation (Swish + Gated Linear Unit) significantly improves Transformer FFN quality with minimal compute overhead.
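A minimal pure-Python sketch of a SwiGLU feed-forward layer, assuming the bias-free form `(Swish(x·W_gate) ⊙ x·W_up)·W_down`. The weight names and the `matvec` helper are illustrative. Real implementations use batched tensor operations rather than nested lists.

```python
import math


def swish(z):
    """Swish activation with beta = 1: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x, w):
    """Multiply row vector x (length d_in) by matrix w (d_in rows, d_out columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Feed-forward block: a Swish-activated gate multiplies a linear projection."""
    gate = [swish(g) for g in matvec(x, w_gate)]
    up = matvec(x, w_up)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(hidden, w_down)


# Tiny example with identity weights so the gating effect is easy to see.
identity = [[1.0, 0.0], [0.0, 1.0]]
out = swiglu_ffn([1.0, 2.0], identity, identity, identity)
```

Compared with a standard two-matrix FFN, this layer uses three weight matrices, so implementations typically shrink the hidden width to keep the parameter count comparable.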

Enabled By

A100 · NVIDIA · May 2020
312 TFLOPS FP16 Tensor
TPU v4 · Google · May 2021
275 TFLOPS bfloat16