MusicGen

Meta · June 2023

activeOpen Weightdecoder onlyaudio
Parameters3.3B
Variantssmall, medium, large

Why It Matters

Showed that the same transformer architecture powering chatbots could also compose music, opening the door to AI-generated soundtracks and compositions.

Description

Meta's music generation model that creates high-quality stereo music from text descriptions (like 'an upbeat jazz tune with piano and saxophone'). Uses the same autoregressive approach as language models — generating music one audio token at a time — but in a single stage rather than requiring multiple processing steps. Can also transform an existing melody into a different style.

Key Innovations

Text-to-Audio
Text-to-AudioGenerating speech, music, or sound effects from text descriptions.
Autoregressive
AutoregressiveGenerates text one token at a time, each prediction based on all previous tokens. The foundation of modern language models.
Open Weight
Open WeightModel weights are publicly released but training data/code may not be. Enables fine-tuning but not full reproduction.