BERT

Google · October 2018

● activeOpen Sourceencoder onlytext

Parameters340M

Context Window512 tokens

Why It Matters

Revolutionized NLP by introducing bidirectional context — understanding words based on BOTH their left and right context. Became the backbone of Google Search and spawned hundreds of derivatives.

Description

Google's Bidirectional Encoder Representations from Transformers — a model that reads text in both directions simultaneously (left-to-right AND right-to-left) to understand context, unlike GPT which only reads left-to-right. Trained by randomly hiding words and learning to predict them (called masked language modeling). Dominated NLP benchmarks for years and became the backbone of Google Search.

Notable Milestones

▸Powered Google Search's understanding of queries
▸Spawned RoBERTa, ALBERT, DistilBERT and hundreds of derivatives
▸Became the standard baseline for NLP research benchmarks

Key Innovations

Masked LM

Masked LMTraining by randomly hiding words and having the model predict them — BERT's key innovation for understanding context.

Transformer

TransformerNeural network architecture using self-attention to process entire sequences in parallel. Replaced RNNs and enabled massive scaling.

Family Tree

Successors (6)

RoBERTa T5 XLNet ALBERT DistilBERT DeBERTa

Related Research (2)

TransformerTransformer

2017 · Google Brain

Introduced the Transformer architecture using self-attention mechanisms, replacing RNNs entirely. Enabled parallel training and superior long-range de…

BERTTransformer

2018 · Google

Encoder-only bidirectional pretraining with masked language modeling (MLM) and next-sentence prediction. Set SOTA on GLUE benchmarks.

Enabled By

Tesla V100NVIDIA · May 2017

125 TFLOPS FP16 Tensor

External Links

Research Paper

More from Foundational

DistilBERT2019-10 · 66M

DeBERTa2020-06 · —

FLAN-T52022-10 · —

NextXLNet