BERT

Google · October 2018

active · Open Source · encoder only · text
Parameters: 340M (BERT-Large; BERT-Base has 110M)
Context Window: 512 tokens

Why It Matters

Brought deeply bidirectional pretraining to NLP: every word is represented using both its left and its right context. Google adopted it in Search to interpret queries, and it spawned hundreds of derivative models.

Description

Google's Bidirectional Encoder Representations from Transformers. The encoder attends to every token in the input at once, so each word's representation draws on the words to its left and to its right, unlike GPT, which conditions only on the tokens to the left. It is trained by randomly hiding words and learning to predict them (masked language modeling). It set state-of-the-art results on benchmarks such as GLUE at release, BERT-style models led NLP leaderboards for years afterward, and it was deployed in Google Search.
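The masking step can be sketched in a few lines of Python. This is a minimal illustration, not BERT's actual preprocessing code: the function name `mask_tokens` and the toy word-level vocabulary are hypothetical, and the 15% selection rate with an 80/10/10 split (mask, random token, unchanged) is assumed from the original BERT paper rather than stated on this page.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted, labels) for masked language modeling.

    labels[i] is the original token at positions the model must predict
    and None everywhere else (no loss is computed there).
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place
    return corrupted, labels

tokens = "the cat sat on the mat".split()
corrupted, labels = mask_tokens(tokens, vocab=sorted(set(tokens)), seed=0)
```

The model sees `corrupted` and is scored only at positions where `labels` is not None. Because it can look at tokens on both sides of a hidden position, the prediction task forces it to use bidirectional context.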

Notable Milestones

  • Powered Google Search's understanding of queries
  • Spawned RoBERTa, ALBERT, DistilBERT and hundreds of derivatives
  • Became the standard baseline for NLP research benchmarks

Key Innovations

Masked LM
Training by randomly hiding words and having the model predict them. This is BERT's key innovation for understanding context.
Transformer
Neural network architecture using self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.
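To make the self-attention idea concrete, here is a small pure-Python sketch of scaled dot-product attention weights. It is a simplification under stated assumptions: the input vectors are used directly as queries and keys (a real Transformer applies learned projections and multiple heads), and the function name `attention_weights` and the `causal` flag are hypothetical. The flag contrasts an encoder such as BERT, where every position can attend to every other, with a left-to-right decoder such as GPT.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, causal=False):
    """Scaled dot-product attention weights, one row per query position."""
    d = len(keys[0])
    rows = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # hide positions to the right
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d))
        rows.append(softmax(scores))
    return rows

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bidirectional = attention_weights(x, x)           # encoder-style: full visibility
left_to_right = attention_weights(x, x, causal=True)  # decoder-style: left context only
```

In `bidirectional`, the first token places nonzero weight on all three positions. In `left_to_right`, its row is `[1.0, 0.0, 0.0]` because the later positions are masked out. Each row is computed independently of the others, which is what allows the whole sequence to be processed in parallel.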

Family Tree

Related Research (2)

Transformer
2017 · Google Brain

Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…

BERT · Transformer
2018 · Google

Encoder-only bidirectional pretraining with masked language modeling (MLM) and next-sentence prediction. Set state-of-the-art results on the GLUE benchmark.

Enabled By

Tesla V100 · NVIDIA · May 2017
125 TFLOPS FP16 (Tensor Cores)
