What is the best open-source implementation of "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"?

The best-maintained implementation is huggingface/transformers, which has 161,254 stars on GitHub. Confidence: high. Reproducibility: Strong.

Are there pretrained models available for "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"?

Yes, two related Hugging Face models were found. The top result is distilbert/distilbert-base-cased-distilled-squad with 227,481 downloads.

What framework is used to implement "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"?

The primary implementation uses PyTorch.

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

How reproducible is "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"?

Estimated time to first reproduction: a few hours. No risk flags identified. Start with huggingface/transformers and validate setup instructions in README.

Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

Published: Oct 2, 2019

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 161,254

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: PyTorch

Time to first repro: a few hours

No risk flags

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive biases learned by larger models during pre-training, we introduce a triple loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device study.
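The triple loss named in the abstract can be sketched in plain Python. This is a minimal illustration for a single masked position, not the authors' training code: the function names, the equal loss weights, and the temperature of 2.0 are assumptions chosen for readability, and a real run would use batched tensors in PyTorch.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened distribution against the teacher's."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


def mlm_loss(student_logits, target_index):
    """Language modeling loss: cross-entropy against the true masked token."""
    return -math.log(softmax(student_logits)[target_index])


def cosine_loss(student_hidden, teacher_hidden):
    """Cosine distance (1 - cosine similarity) between hidden-state vectors."""
    dot = sum(s * t for s, t in zip(student_hidden, teacher_hidden))
    norm_s = math.sqrt(sum(s * s for s in student_hidden))
    norm_t = math.sqrt(sum(t * t for t in teacher_hidden))
    return 1.0 - dot / (norm_s * norm_t)


def triple_loss(student_logits, teacher_logits, target_index,
                student_hidden, teacher_hidden,
                w_distill=1.0, w_mlm=1.0, w_cos=1.0, temperature=2.0):
    """Weighted sum of the three terms; the weights here are hypothetical."""
    return (w_distill * distillation_loss(student_logits, teacher_logits, temperature)
            + w_mlm * mlm_loss(student_logits, target_index)
            + w_cos * cosine_loss(student_hidden, teacher_hidden))
```

The cosine term falls to zero when the student and teacher hidden vectors point in the same direction, and the distillation term is smallest when the student's softened distribution matches the teacher's.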

Technical details

Canonical key: arxiv-1910.01108

Cache status: Fresh

Generated at: Jun 4, 2026, 4:47 PM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence

Time to repro: a few hours

PyTorch

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Natural language processing · Dataset: SQuAD · Metric: Accuracy · Value: 1.1 · Split: test · Source: paper fulltext

Natural language processing · Model: ELMo · Dataset: SST-2 · Value: 91.5 · Source: paper fulltext

Natural language processing · Model: BERT-base · Dataset: SST-2 · Value: 92.7 · Source: paper fulltext

Benchmark evidence drill-down

3 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Model	Dataset	Metric	Value	Source	Evidence refs
Natural language processing	(not stated)	SQuAD	Accuracy	1.1 (likely the SQuAD version, not a score)	paper-derived	No explicit refs
Natural language processing	ELMo	SST-2	Accuracy	91.5	paper-derived	No explicit refs
Natural language processing	BERT-base	SST-2	Accuracy	92.7	paper-derived	No explicit refs
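One way to run the audit suggested above is a small script that flags findings worth checking by hand. The sketch below is only an illustration: the record layout, the `audit` helper, and the plausibility floor of 50.0 (roughly chance level for a binary task scored on a 0 to 100 scale) are assumptions, not part of this page's pipeline.

```python
# The three findings from the drill-down, as simple records.
# "refs" is empty because every row reports "No explicit refs".
FINDINGS = [
    {"label": "SQuAD", "value": 1.1, "refs": []},
    {"label": "ELMo / SST-2", "value": 91.5, "refs": []},
    {"label": "BERT-base / SST-2", "value": 92.7, "refs": []},
]


def audit(findings, min_plausible=50.0):
    """Return {label: [reasons]} for findings that need manual review.

    min_plausible is a hypothetical floor for accuracy-style scores.
    """
    report = {}
    for finding in findings:
        reasons = []
        if not finding["refs"]:
            reasons.append("no explicit evidence refs")
        if finding["value"] < min_plausible:
            reasons.append("value below plausibility floor")
        if reasons:
            report[finding["label"]] = reasons
    return report
```

Calling `audit(FINDINGS)` marks all three rows as lacking evidence refs and additionally flags the SQuAD row, whose value of 1.1 is too low to be a plausible accuracy.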

Use This Implementation Because…

Confidence: high

huggingface/transformers is the strongest maintained implementation based on ranking signals. CI workflows are present. License is declared (Apache-2.0).

Reproduction Risks

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 90/100, grounding 95/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

huggingface/transformers

best maintained

Maintenance: Active

Confidence: High

Reproducibility: Strong

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 161,254
Last push: Jun 3, 2026 (1d ago)

CI · Releases · Dependencies

Risk flags

No Docker setup

huggingface/swift-coreml-transformers

historical official

Maintenance: Archived

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 1,683
Last push: Nov 24, 2023 (923d ago)

Risk flags

Repository archived
No push in 12+ months
No CI pipeline detected

allenai/scifact

alternative

Maintenance: Stale

Confidence: Low

Reproducibility: Moderate

Community adoption signal (261 stars) · Repository appears stale (>24 months since last push)

Stars: 261
Last push: Oct 15, 2023 (963d ago)

Dependencies

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases
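The activity-based risk flags in this comparison follow from the gap between each repository's last push and the check date. The sketch below reproduces the day counts shown above; the cutoffs of 365 and 730 days are assumptions standing in for "12+ months" and ">24 months", and the helper names are hypothetical rather than the site's actual rules.

```python
from datetime import date

CHECKED_ON = date(2026, 6, 4)  # "Last checked" date reported on this page


def days_since(last_push, checked_on=CHECKED_ON):
    """Whole days between the last push and the check date."""
    return (checked_on - last_push).days


def activity_flags(last_push, checked_on=CHECKED_ON, archived=False):
    """Derive risk flags from push age; thresholds are illustrative assumptions."""
    age = days_since(last_push, checked_on)
    flags = []
    if archived:
        flags.append("Repository archived")
    if age > 365:  # assumed cutoff for "12+ months"
        flags.append("No push in 12+ months")
    if age > 730:  # assumed cutoff for ">24 months"
        flags.append("Stale (>24 months since last push)")
    return flags
```

With these cutoffs, huggingface/transformers (last push Jun 3, 2026) gets no activity flags, while both huggingface/swift-coreml-transformers and allenai/scifact exceed the 24-month mark.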

Best implementation now

huggingface/transformers

Confidence: High

Reproducibility: Strong

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Stars: 161,254

Forks: 33,396

Last push: Jun 3, 2026

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Community adoption signal (161,254 stars)

License ✓

CI ✓

Deps ✓

Docker –

Selected huggingface/transformers as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

huggingface/swift-coreml-transformers

Stars: 1,683

Last push: Nov 24, 2023

Archived

Reproduction readiness

Ready to Run

Time to first repro: a few hours

Last checked: Jun 4, 2026

Ready to reproduce

· Clone huggingface/transformers and install dependencies from pyproject.toml.
· CI pipeline detected — automated tests are in place.
· Last updated 1 day ago.

Quick start

git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
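Once the install succeeds, a quick smoke test is to load one of the related checkpoints listed on this page and ask it a question. The sketch below assumes PyTorch is installed alongside transformers and that the model can be downloaded from the Hugging Face Hub; `best_span` and `answer` are hypothetical helper names, and the span selection is a deliberately naive argmax rather than the library's own decoding.

```python
MODEL_ID = "distilbert/distilbert-base-cased-distilled-squad"


def best_span(start_logits, end_logits):
    """Pick the highest-scoring start, then the best end at or after it."""
    start = max(range(len(start_logits)), key=start_logits.__getitem__)
    end = max(range(start, len(end_logits)), key=end_logits.__getitem__)
    return start, end


def answer(question, context, model_id=MODEL_ID):
    """Extract an answer span from context; downloads the model on first use."""
    # Imported lazily so best_span stays usable without these packages.
    import torch
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForQuestionAnswering.from_pretrained(model_id)
    inputs = tokenizer(question, context, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    start, end = best_span(outputs.start_logits[0].tolist(),
                           outputs.end_logits[0].tolist())
    return tokenizer.decode(inputs["input_ids"][0][start:end + 1])
```

A call such as `answer("Who proposed DistilBERT?", "DistilBERT was proposed by Victor Sanh and colleagues.")` should return a short span from the context if the setup is working.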

Additional implementations

No additional verified repositories beyond the primary recommendation.

Possible but unverified matches (3)

These repositories had low-confidence matching signals and are hidden by default.

allenai/scifact

Confidence: Low

Stars: 261

epfml/collaborative-attention

Confidence: Low

Stars: 152

facebookresearch/EgoTV

Confidence: Low

Stars: 27

Hugging Face artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.

Models

distilbert/distilbert-base-cased-distilled-squad

Curated Related

Downloads: 227,481

Likes: 267

distilbert/distilbert-base-uncased-distilled-squad

Curated Related

Downloads: 64,661

Likes: 119

Datasets

No trustworthy dataset matches right now.

Spaces

No trustworthy demo spaces right now.

Research context

Tasks

Natural language processing

Methods

Transformer

Domains

Natural Language Processing
