StarCoder

BigCode / Hugging Face · May 2023

Active · Open Source · Decoder-only · Code
Parameters: 15.5B
Context Window: 8K tokens

Why It Matters

The first major open-source code model trained transparently on The Stack, a curated dataset of permissively licensed code, showing that code AI could be built responsibly.

Description

StarCoder is a 15.5B-parameter, decoder-only code model trained on The Stack, a curated dataset of permissively licensed source code. It was built by BigCode, a collaboration between Hugging Face and ServiceNow. The project set new standards for responsible AI development by letting developers check whether their code was in the training data and opt out.

Key Innovations

Code Gen: Ability to write, debug, and understand programming code across multiple languages.
Open Weight: Model weights are publicly released, but training data and code may not be. Enables fine-tuning but not full reproduction.
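
As a rough illustration of what working with openly released weights of this size involves, the sketch below estimates how much memory the 15.5B parameters occupy at a few storage precisions. The bytes-per-parameter values are general assumptions about common numeric formats, not figures from this card, and the estimate covers the weights only (no activations, optimizer state, or KV cache).

```python
# Rough weight-memory estimate for a 15.5B-parameter model.
PARAMS = 15.5e9  # parameter count listed on this card

# Assumed bytes per parameter for common storage precisions (not from the source).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name:>9}: {weight_memory_gb(PARAMS, size):.2f} GB")
```

Under these assumptions, half-precision weights alone come to about 31 GB, which is why fine-tuning a model of this size typically calls for quantization or more than one accelerator.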