ALBERT

Google · September 2019

● Active · Open Source · Encoder-only · Text

Description

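ALBERT ("A Lite BERT") is Google's parameter-efficient variant of BERT. It cuts BERT's parameter count sharply with two techniques: cross-layer parameter sharing and factorized embedding parameterization. An ALBERT configuration shaped like BERT-large has about 18× fewer parameters (18M vs. 334M), and larger ALBERT configurations that still have fewer parameters than BERT-large outperform it.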
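To make the embedding factorization concrete, here is a minimal parameter-count sketch. In BERT the token embedding is a single V × H matrix; ALBERT instead uses a smaller V × E table followed by an E × H projection, so the count drops from V·H to V·E + E·H. The sizes below are assumptions for illustration (a 30,000-token vocabulary, E = 128, H = 4096, roughly ALBERT's largest configuration), the helper names are hypothetical, and bias terms and position/segment embeddings are ignored.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """BERT-style: tokens map directly into the hidden size (one V x H matrix)."""
    return vocab_size * hidden_size


def factorized_embedding_params(vocab_size: int, embed_size: int, hidden_size: int) -> int:
    """ALBERT-style: a V x E lookup table followed by an E x H projection."""
    return vocab_size * embed_size + embed_size * hidden_size


if __name__ == "__main__":
    V, E, H = 30_000, 128, 4096  # assumed sizes, not taken from this page
    print(embedding_params(V, H))                 # 122880000
    print(factorized_embedding_params(V, E, H))   # 4364288
```

With these sizes the factorized embedding needs about 4.4M parameters instead of about 122.9M. The saving grows with H, because E stays fixed while the hidden size scales up.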

Key Innovations

Cross-layer parameter sharing
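Cross-layer parameter sharing means every encoder layer reuses one set of weights. The sketch below counts encoder parameters with and without sharing, and shows that a shared encoder simply applies the same layer L times. It assumes a standard Transformer layer (four H × H attention projections, a two-matrix feed-forward block of width F, two layer norms, all with biases) and BERT-base-like sizes H = 768, F = 3072, L = 12; the function names are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def layer_params(hidden: int, ffn: int) -> int:
    """Parameters in one Transformer encoder layer (weights plus biases)."""
    attention = 4 * (hidden * hidden + hidden)            # Q, K, V and output projections
    feed_forward = hidden * ffn + ffn + ffn * hidden + hidden
    layer_norms = 2 * (2 * hidden)                         # two LayerNorms, gain + bias each
    return attention + feed_forward + layer_norms


def encoder_params(hidden: int, ffn: int, num_layers: int, shared: bool) -> int:
    """One layer's worth of weights if shared (ALBERT-style), else num_layers copies."""
    per_layer = layer_params(hidden, ffn)
    return per_layer if shared else per_layer * num_layers


def run_shared_encoder(x: T, layer: Callable[[T], T], num_layers: int) -> T:
    """Apply the same layer (same weights) num_layers times, as a shared encoder does."""
    for _ in range(num_layers):
        x = layer(x)
    return x


if __name__ == "__main__":
    print(encoder_params(768, 3072, 12, shared=False))  # 85054464
    print(encoder_params(768, 3072, 12, shared=True))   # 7087872
    print(run_shared_encoder(1, lambda v: v * 2, 3))    # 8
```

Sharing shrinks the stored weights by a factor of L but not the computation: the shared layer still runs L times per forward pass.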

Masked LM: training by randomly hiding words and having the model predict them. This was BERT's key innovation for understanding context.
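The masking step can be sketched as follows. This follows the recipe from the original BERT paper (select about 15% of tokens, then replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged). ALBERT's own pre-training varies this, for example with n-gram masking, so treat the sketch as illustrative. The function name and token strings are hypothetical, and real pipelines operate on token IDs rather than strings.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"  # placeholder string; real tokenizers use a reserved token ID


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (inputs, labels); labels[i] holds the token to predict, else None."""
    rng = random.Random(seed)
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected: no prediction target here
        labels[i] = tok  # the model is trained to recover the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK              # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels


if __name__ == "__main__":
    words = "the cat sat on the mat".split()
    print(mask_tokens(words, vocab=words, seed=0))
```

The loss is computed only at positions where `labels` is not None, which lets the objective be defined on unlabeled text.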

Family Tree

Built On

BERT

Lineage

BERT → ALBERT

External Links

Research Paper

More from Foundational

DistilBERT · 2019-10 · 66M