RoBERTa

Meta · July 2019

● activeOpen Sourceencoder onlytext

Context Window512 tokens

Why It Matters

Proved that BERT's training recipe mattered as much as its architecture, establishing best practices for pre-training that influenced all subsequent encoder models.

Description

Meta's 'Robustly Optimized BERT' approach demonstrated that BERT was significantly undertrained and that careful tuning of hyperparameters and training data size could substantially improve performance. Became the go-to baseline for NLP research.