DeBERTa

Microsoft · June 2020

Active · Open Source · Encoder-only · Text

Why It Matters

DeBERTa was the first model to surpass the human baseline on the SuperGLUE benchmark, showing that architectural changes to the attention mechanism could lift NLU benchmark scores past human-level performance.

Description

Microsoft's DeBERTa ('Decoding-enhanced BERT with disentangled Attention') introduced an attention mechanism that encodes each token's content and position separately. Attention scores are then computed from disentangled matrices over content and position, rather than from a single vector that sums the two.
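The disentangled scoring described above can be sketched roughly as follows. This is a minimal illustration, not the released implementation. It assumes the three-term form usually given for DeBERTa (content-to-content, content-to-position, position-to-content), relative distances clipped to a window of size k, and a 1/sqrt(3d) scale. All function and parameter names (disentangled_scores, rel_index, Wq_c and so on) are hypothetical, and plain Python lists are used only to keep the sketch dependency-free.

```python
import math

def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def project(X, W):
    # Multiply each row of X by matrix W: (n, d) x (d, d) -> (n, d).
    cols = list(zip(*W))
    return [[dot(x, col) for col in cols] for x in X]

def rel_index(i, j, k):
    # Relative distance i - j, shifted by k and clipped into [0, 2k - 1].
    return min(max(i - j + k, 0), 2 * k - 1)

def disentangled_scores(H, P, Wq_c, Wk_c, Wq_r, Wk_r, k):
    # H: content vectors, one per token (n rows).
    # P: relative-position embeddings (2k rows), shared by all tokens.
    # Assumes square projection matrices, so d is the width of H.
    n, d = len(H), len(H[0])
    Qc, Kc = project(H, Wq_c), project(H, Wk_c)   # content queries/keys
    Qr, Kr = project(P, Wq_r), project(P, Wk_r)   # position queries/keys
    scale = math.sqrt(3 * d)                      # three terms are summed
    scores = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(Qc[i], Kc[j])                    # content-to-content
            c2p = dot(Qc[i], Kr[rel_index(i, j, k)])   # content-to-position
            p2c = dot(Kc[j], Qr[rel_index(j, i, k)])   # position-to-content
            row.append((c2c + c2p + p2c) / scale)
        scores.append(row)
    return scores  # apply a row-wise softmax to get attention weights
```

Keeping content and position in separate projections lets the score for a token pair depend on what the tokens are and on how far apart they are as distinct terms.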

Key Innovations

Disentangled attention
Masked LM: Training by randomly hiding words and having the model predict them; BERT's key innovation for understanding context.
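As a rough illustration of the masked LM objective, the sketch below hides a random subset of tokens and records the originals as prediction targets. The 15% masking rate and the "[MASK]" placeholder follow common BERT convention and are assumptions here. The mask_tokens helper is hypothetical, and BERT's practice of sometimes keeping or randomly replacing a selected token is left out for brevity.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    # Returns (masked_tokens, labels). labels[i] holds the original token
    # where a mask was applied and None elsewhere (ignored by the loss).
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)   # hide the word from the model
            labels.append(tok)          # the model must predict this
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

masked, labels = mask_tokens("the cat sat on the mat".split(), seed=0)
```

During training, the model sees the masked sequence and is scored only on the positions whose label is not None.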

Family Tree

Built On

Lineage

BERT → DeBERTa

External Links