distilbert-base-uncased is a compact, faster alternative to BERT produced by knowledge distillation. It retains about 97% of BERT's language-understanding performance while being 40% smaller and 60% faster. The model was pretrained on English Wikipedia and BookCorpus with BERT base as the teacher, using three objectives: distillation loss, masked language modeling (MLM), and cosine embedding loss. It is uncased (it treats "english" and "English" identically) and is suited to a wide range of downstream NLP tasks such as sequence classification, token classification, and question answering. While efficient, it inherits the biases present in the original BERT model. DistilBERT is released under the Apache 2.0 license and is compatible with PyTorch, TensorFlow, and JAX.
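As a minimal sketch (assuming the `transformers` library and a PyTorch backend are installed), the pretrained MLM head can be exercised directly through the fill-mask pipeline:

```python
# Minimal sketch: querying the pretrained masked-language-modeling head.
# Assumes `pip install transformers torch`.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="distilbert-base-uncased")

# The uncased tokenizer lowercases input, so casing in the prompt does not matter.
for prediction in unmasker("Hello, I'm a [MASK] model."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```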
Features
- 40% smaller and 60% faster than BERT base
- Trained with distillation, MLM, and cosine loss
- Achieves 97% of BERT's performance on GLUE benchmarks
- Pretrained on BookCorpus and English Wikipedia
- Uncased: capitalization is ignored
- Ideal for fine-tuning on classification and QA tasks (see the sketch after this list)
- Available for PyTorch, TensorFlow, and JAX
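The sketch below shows one way to set up the model for fine-tuning on sequence classification, assuming `transformers` and `torch` are installed; the two-label setup and the example sentences are illustrative choices, not part of the released model.

```python
# Minimal sketch: preparing distilbert-base-uncased for fine-tuning on a
# binary sequence classification task (num_labels=2 is an illustrative choice).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Tokenize a small illustrative batch; padding/truncation give uniform tensors.
inputs = tokenizer(
    ["The movie was great!", "The plot made no sense."],
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([1, 0])

# When labels are passed, the forward pass returns a loss, so the model can be
# dropped into a standard training loop or the Trainer API.
outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)
```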