DeBERTa
Microsoft · June 2020
● active · Open Source · encoder only · text
Why It Matters
First model to surpass the human baseline on the SuperGLUE benchmark, showing that architectural changes to the attention mechanism could push natural language understanding (NLU) scores past human-level performance on that benchmark.
Description
Microsoft's DeBERTa ('Decoding-enhanced BERT with disentangled attention') introduced an attention mechanism that represents each token with two separate vectors, one for its content and one for its relative position, and computes attention scores from separate content and position terms instead of from a single summed embedding. It was the first model to surpass the human baseline on the SuperGLUE benchmark.
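To make the idea concrete, here is a minimal single-head sketch of disentangled attention in plain Python. It assumes the formulation usually given for DeBERTa: the score between tokens i and j is the sum of a content-to-content, a content-to-position, and a position-to-content term, scaled by 1/sqrt(3d), with relative distances clipped to a window of size k. All names (`rel_index`, `disentangled_attention`, the `W*` matrices) and the random toy inputs are illustrative assumptions, not the released implementation.

```python
import math
import random


def rel_index(i, j, k):
    """Map the relative distance i - j to a bucket in [0, 2k)."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def disentangled_attention(H, P, Wq_c, Wk_c, Wv_c, Wq_r, Wk_r, k):
    """H: n x d content vectors; P: 2k x d relative-position embeddings."""
    n, d = len(H), len(Wq_c[0])
    # Content projections (one row per token).
    Qc, Kc, Vc = matmul(H, Wq_c), matmul(H, Wk_c), matmul(H, Wv_c)
    # Position projections (one row per relative-distance bucket).
    Qr, Kr = matmul(P, Wq_r), matmul(P, Wk_r)
    scale = 1.0 / math.sqrt(3 * d)  # three score terms, hence 3d
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            c2c = dot(Qc[i], Kc[j])                   # content -> content
            c2p = dot(Qc[i], Kr[rel_index(i, j, k)])  # content -> position
            p2c = dot(Kc[j], Qr[rel_index(j, i, k)])  # position -> content
            scores.append((c2c + c2p + p2c) * scale)
        weights.append(softmax(scores))
    return matmul(weights, Vc), weights


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


# Toy usage with random inputs (sizes chosen only for illustration).
rng = random.Random(0)
n, d, k = 5, 4, 2
H = rand_matrix(n, d, rng)
P = rand_matrix(2 * k, d, rng)
Ws = [rand_matrix(d, d, rng) for _ in range(5)]
out, weights = disentangled_attention(H, P, *Ws, k=k)
```

Each row of `weights` is a probability distribution over the sequence, and `out` has one d-dimensional vector per token. Note that the position terms depend only on the clipped distance between i and j, so the same `P` rows are shared across all token pairs with that distance.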
Key Innovations
Disentangled Attention
Masked LM: Training by randomly hiding words and having the model predict them; BERT's key innovation for understanding context.
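As a rough illustration of the masked LM objective, the sketch below hides a random subset of tokens and records the originals as prediction targets. The function name `mask_tokens`, the `[MASK]` string, and the default 15% rate (the rate commonly cited for BERT) are assumptions made for this example; real pipelines work on token IDs and typically also leave some selected tokens unchanged or swap in random ones, which is omitted here.

```python
import random

MASK = "[MASK]"  # placeholder symbol; an assumption for this sketch


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels).

    labels[i] holds the original token where position i was hidden,
    and None where the token was left visible (no loss at that position).
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            labels[i] = tok
    return masked, labels


# Toy usage: a higher rate than usual so a short sentence gets some masks.
tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
```

During training, the model would see `masked` as input and be scored only on the positions where `labels` is not `None`.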