ALBERT
Google · September 2019
● active · Open Source · encoder only · text
Description
Google's 'A Lite BERT', which dramatically reduces BERT's parameter count through cross-layer parameter sharing and factorized embedding parameterization. It achieves comparable performance with about 18× fewer parameters in a configuration similar in size to BERT-large.
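To see where the savings come from, here is a rough back-of-the-envelope sketch in Python. The function names are hypothetical, and the sizes (vocabulary 30,000, hidden size 1024, embedding size 128, 24 layers) are assumed for illustration. The encoder estimate of 12·H² weights per layer ignores biases, layer norms, and position embeddings, so the totals are only approximate.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Token-embedding weights, with or without factorization.

    Unfactorized: one V x H matrix.
    Factorized:   a V x E lookup table followed by an E x H projection.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(hidden_size, num_layers, share_layers):
    """Approximate encoder weights (biases and layer norms ignored).

    Per layer: 4*H^2 for attention (Q, K, V, output) plus
    8*H^2 for a feed-forward block with inner size 4*H.
    """
    per_layer = 12 * hidden_size ** 2
    # With cross-layer sharing, every layer reuses one set of weights.
    return per_layer if share_layers else per_layer * num_layers


# Assumed, illustrative sizes (not taken from this page).
V, H, E, L = 30_000, 1024, 128, 24

bert_like = embedding_params(V, H) + encoder_params(H, L, share_layers=False)
albert_like = embedding_params(V, H, E) + encoder_params(H, L, share_layers=True)

print(f"unshared, unfactorized: {bert_like:,}")
print(f"shared, factorized:     {albert_like:,}")
print(f"approximate reduction:  {bert_like / albert_like:.1f}x")
```

Under these simplifications the reduction lands in the same range as the 18× figure quoted above. Most of the saving comes from sharing the encoder layers, with the factorized embedding contributing a smaller share.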
Key Innovations
parameter-sharing
Masked LM: Training by randomly hiding words and having the model predict them. This is BERT's key innovation for understanding context.
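The masking idea can be sketched in a few lines of Python. This is a simplified illustration, not the actual pretraining code: `mask_tokens` is a hypothetical helper, the 15% masking rate is an assumed default, and every selected token is replaced with `[MASK]` (BERT-style training also sometimes substitutes a random token or keeps the original).

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide tokens at random and record what the model should predict.

    Returns (masked, labels): labels[i] is the original token where
    position i was masked, and None where it was left untouched.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # the model sees a placeholder
            labels.append(tok)    # and is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)   # no loss at unmasked positions
    return masked, labels


tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens, mask_prob=0.5, seed=0)
print(masked)
print(labels)
```

Because the model must fill each gap using the words on both sides, it learns representations that depend on the full surrounding context.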