PaLM
Google Research · April 2022
Why It Matters
Demonstrated that scaling a single dense model to 540B parameters brought out reasoning abilities that smaller models in the same family lacked. Among the first large models to show that chain-of-thought prompting could solve complex, multi-step math and logic problems.
Description
Google's 540-billion-parameter language model, one of the largest dense models built at the time of its release. It was the first large-scale model trained with Google's Pathways system, which enabled efficient training across 6,144 TPU v4 chips. It showed strong results in chain-of-thought reasoning (solving problems by working through intermediate steps), math, and code generation.
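Chain-of-thought prompting can be illustrated with a small prompt builder. This is a minimal sketch, not PaLM's actual prompt format: the function name `build_cot_prompt`, the dictionary keys, and the worked example are hypothetical and chosen only for illustration. The idea is that each few-shot exemplar spells out its intermediate reasoning before the final answer, so the model is nudged to do the same for the new question.

```python
def build_cot_prompt(exemplars, question):
    """Format few-shot exemplars with worked reasoning, then append the new question.

    Each exemplar is a dict with hypothetical keys: "question", "reasoning", "answer".
    """
    parts = []
    for ex in exemplars:
        # The reasoning text comes before the final answer in every exemplar.
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['reasoning']} The answer is {ex['answer']}."
        )
    # Leave the final answer open so the model continues with its own reasoning.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Illustrative exemplar (made up for this sketch, not taken from any benchmark).
exemplars = [
    {
        "question": "A shelf holds 3 boxes with 4 books each. How many books are on the shelf?",
        "reasoning": "Each box has 4 books and there are 3 boxes. 3 * 4 = 12.",
        "answer": "12",
    }
]

prompt = build_cot_prompt(
    exemplars,
    "A crate holds 5 bags with 6 apples each. How many apples are in the crate?",
)
print(prompt)
```

A standard few-shot prompt would include only the final answers; the single difference here is the reasoning text placed before each answer.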
Notable Milestones
- First model trained on Google's Pathways distributed training system
- Demonstrated emergent chain-of-thought reasoning at scale
- Could explain jokes and solve multi-step math problems
Key Innovations
Related Research
Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…
Showed that prompting models to "think step-by-step" unlocks arithmetic, logic, and commonsense reasoning in large models like PaLM.
Showed that SwiGLU activation (Swish + Gated Linear Unit) significantly improves Transformer FFN quality with minimal compute overhead.
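The SwiGLU feed-forward layer mentioned in the last entry can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not PaLM's implementation: the names `w_gate`, `w_up`, and `w_down` are hypothetical labels for the three weight matrices, bias terms are omitted, and Swish uses beta = 1. It computes (Swish(x W_gate) * (x W_up)) W_down, where * is an elementwise product.

```python
import math


def swish(z):
    """Swish (SiLU) with beta = 1: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Feed-forward layer with a SwiGLU activation and no bias terms."""
    gate = [swish(g) for g in matvec(x, w_gate)]   # gating branch, passed through Swish
    up = matvec(x, w_up)                           # linear branch
    hidden = [g * u for g, u in zip(gate, up)]     # elementwise gating
    return matvec(hidden, w_down)                  # project back to the model dimension


# Tiny usage example: model dimension 2, hidden dimension 3 (arbitrary toy weights).
x = [1.0, 2.0]
w_gate = [[0.5, -1.0, 0.0], [0.25, 0.0, 1.0]]
w_up = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
w_down = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = swiglu_ffn(x, w_gate, w_up, w_down)
print(out)
```

Compared with a standard two-matrix FFN, this variant uses a third weight matrix for the gate, so implementations typically shrink the hidden dimension to keep the parameter count comparable.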