DeBERTa

Microsoft · June 2020

● active · Open Source · encoder only · text

Why It Matters

DeBERTa was the first model to surpass the human baseline on the SuperGLUE benchmark, suggesting that architectural innovations in attention mechanisms could push NLU benchmark scores past the human baseline.

Description

Microsoft's 'Decoding-enhanced BERT with disentangled Attention' introduced a disentangled attention mechanism: each token is represented by two separate vectors, one for its content and one for its relative position, and attention weights are computed from separate content and position terms that are then summed. It was the first model to surpass the human baseline on the SuperGLUE benchmark.
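
The disentangled attention score described above can be sketched in a few lines. This is a minimal illustration, not Microsoft's implementation: it assumes the content and relative-position vectors have already been projected into queries and keys, it uses a single head, and the function names (rel_bucket, disentangled_scores) are hypothetical. The three summed terms (content-to-content, content-to-position, position-to-content), the clamped relative distance, and the 1/sqrt(3d) scaling follow the formulation in the DeBERTa paper as commonly presented.

```python
import math


def rel_bucket(i, j, k):
    """Map the relative distance i - j to an index in [0, 2k - 1].

    Distances at or beyond the maximum k are clamped to the end buckets.
    """
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def disentangled_scores(q_c, k_c, q_r, k_r, k):
    """Unnormalized attention scores for one head.

    q_c, k_c: per-token content queries/keys, shape [n][d] (assumed pre-projected).
    q_r, k_r: relative-position queries/keys, shape [2k][d] (assumed pre-projected).
    """
    n = len(q_c)
    d = len(q_c[0])
    scale = 1.0 / math.sqrt(3 * d)  # three terms are summed, hence 3d
    scores = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(q_c[i], k_c[j])                    # content-to-content
            c2p = dot(q_c[i], k_r[rel_bucket(i, j, k)])  # content-to-position
            p2c = dot(k_c[j], q_r[rel_bucket(j, i, k)])  # position-to-content
            row.append((c2c + c2p + p2c) * scale)
        scores.append(row)
    return scores


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy inputs: 2 tokens, dimension 2, maximum relative distance k = 2.
    q_c = [[1.0, 0.0], [0.0, 1.0]]
    k_c = [[1.0, 0.0], [0.0, 1.0]]
    q_r = [[0.1, 0.0], [0.0, 0.1], [0.2, 0.0], [0.0, 0.2]]
    k_r = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.2], [0.2, 0.0]]
    for row in disentangled_scores(q_c, k_c, q_r, k_r, k=2):
        print(softmax(row))
```

Note that the position-to-content term indexes the position query with the distance from j to i, not from i to j, so the same pair of tokens contributes differently depending on which one is the query.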