An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models

Q: What is the best open-source implementation of "An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models"?

The best-maintained implementation is mcgill-nlp/bias-bench, with 156 stars on GitHub. Confidence: high. Reproducibility: limited.

Q: How reproducible is "An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models"?

Estimated time to first reproduction: a few days. Risk flags: license metadata missing, no CI workflows detected, dependency manifest missing. Start with mcgill-nlp/bias-bench and validate the setup instructions in its README.

Q: What framework is used to implement "An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models"?

The primary implementation uses PyTorch.

Published: Oct 1, 2021

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 156

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: PyTorch

Time to first repro: a few days

3 risk flags

Technical details

Canonical key: arxiv-2110.08527

Generated at: May 8, 2026, 6:08 AM

Artifact coverage: direct

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

Natural language processing · BERT · Perplexity: 4.469 (source: paper fulltext)

Benchmark evidence drill-down

1 finding

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Natural language processing	BERT	Perplexity	4.469	paper-derived	No explicit refs
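
For context on the metric in the row above: perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below illustrates only that definition in plain Python. The function name and the sample log-probabilities are hypothetical, and this page does not state how the 4.469 figure was computed (for a masked model such as BERT, a pseudo-perplexity variant is commonly used instead).

```python
import math

def perplexity(token_log_probs):
    """Return exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Hypothetical per-token log-probabilities, for illustration only.
sample = [math.log(0.25), math.log(0.5), math.log(0.125)]
print(perplexity(sample))
```

Lower values mean the model assigns higher probability to the evaluated text, so a perplexity figure is only comparable across models scored on the same text with the same tokenization.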

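The paper is an empirical survey of five debiasing techniques (Counterfactual Data Augmentation, Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias), evaluated on three intrinsic bias benchmarks (SEAT, StereoSet, and CrowS-Pairs) together with their effect on language modeling ability and downstream NLU performance.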

Use This Implementation Because…

Confidence: high

mcgill-nlp/bias-bench is the strongest maintained implementation based on ranking signals.

Reproduction Risks

License metadata missing
No CI workflows detected
Dependency manifest is missing
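
The three flags above can be spot-checked against a local clone. The following is a minimal sketch under stated assumptions: the file patterns are guesses at what such checks look for (the manifest names are the ones this page lists for mcgill-nlp/bias-bench), a LICENSE file is used as a stand-in for hosted license metadata, and `risk_flags` is a hypothetical helper, not the checker used to produce this page.

```python
from pathlib import Path

# Assumed file names; the checks behind the flags above are not documented here.
MANIFESTS = ("requirements.txt", "environment.yml", "pyproject.toml", "Dockerfile")

def risk_flags(repo_dir):
    """Return a list of risk-flag strings for a local repository checkout."""
    root = Path(repo_dir)
    flags = []
    # A LICENSE* file at the repo root is treated as license metadata.
    if not any(p.is_file() for p in root.glob("LICENSE*")):
        flags.append("License metadata missing")
    # Any YAML file under .github/workflows counts as a CI workflow.
    workflows = root / ".github" / "workflows"
    has_ci = workflows.is_dir() and any(
        p.suffix in (".yml", ".yaml") for p in workflows.iterdir()
    )
    if not has_ci:
        flags.append("No CI workflows detected")
    if not any((root / name).is_file() for name in MANIFESTS):
        flags.append("Dependency manifest is missing")
    return flags
```

Calling `risk_flags("path/to/clone")` on a checkout returns an empty list when none of the three conditions apply.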

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 100/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

mcgill-nlp/bias-bench

best maintained

Maintenance: Stale risk

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 156
Last push: Aug 18, 2025 (264d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

mcgill-nlp/debias-eval

historical official

Maintenance: Stale risk

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 156
Last push: Aug 18, 2025 (264d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

princeton-nlp/mabel

alternative

Maintenance: Stale

Confidence: Low

Reproducibility: Moderate

Community adoption signal (38 stars) · Repository appears stale (>24 months since last push)

Stars: 38
Last push: Dec 14, 2023 (877d ago)

Dependencies

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases
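
The "days ago" figures and the 12- and 24-month staleness flags in this comparison can be approximated from the last-push dates. This is a sketch only: the day thresholds (365 and 730) are assumptions standing in for "12 months" and "24 months", the function names are hypothetical, and a whole-calendar-day difference can differ by a day from the displayed figures depending on time zone and rounding.

```python
from datetime import date

# Assumed thresholds: months approximated as 365 and 730 days.
NO_PUSH_12_MONTHS_DAYS = 365
STALE_24_MONTHS_DAYS = 730

def days_since(last_push, as_of):
    """Whole calendar days between the last push and the reference date."""
    return (as_of - last_push).days

def staleness_flags(last_push, as_of):
    """Return the staleness flags that apply under the assumed thresholds."""
    age = days_since(last_push, as_of)
    flags = []
    if age >= NO_PUSH_12_MONTHS_DAYS:
        flags.append("No push in 12+ months")
    if age > STALE_24_MONTHS_DAYS:
        flags.append("Repository appears stale (>24 months since last push)")
    return flags

as_of = date(2026, 5, 8)  # the "Generated at" date on this page
print(staleness_flags(date(2023, 12, 14), as_of))  # princeton-nlp/mabel
print(staleness_flags(date(2025, 8, 18), as_of))   # mcgill-nlp/bias-bench
```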

Best implementation now

mcgill-nlp/bias-bench

Confidence: High

Reproducibility: Limited

ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models.

Stars: 156

Forks: 41

Last push: Aug 18, 2025

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Strong overlap with paper title keywords

Community adoption signal (156 stars)

License: not detected

CI: not detected

Deps: not detected

Docker: not detected

Selected mcgill-nlp/bias-bench as the strongest maintained implementation for new work.
Repository activity is within the last 24 months.
Official repository is preserved separately as historical context.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

mcgill-nlp/debias-eval

Stars: 156

Last push: Aug 18, 2025

Reproduction readiness

Major Work

Time to first repro: a few days

Last checked: May 8, 2026

No dependency manifest: manual reconstruction required

· mcgill-nlp/bias-bench has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
· You will need to reverse-engineer dependencies from import statements in the source code.
· Last push was 264 days ago.
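
One way to start reconstructing dependencies from import statements is to parse every Python file and collect the top-level module names that are neither standard library nor local to the repository. The sketch below assumes Python 3.10+ (for `sys.stdlib_module_names`) and treats any top-level directory or module in the checkout as local code; `third_party_imports` is a hypothetical helper, not part of mcgill-nlp/bias-bench.

```python
import ast
import sys
from pathlib import Path

def third_party_imports(repo_dir):
    """Collect top-level module names imported by .py files under repo_dir."""
    root = Path(repo_dir)
    # Heuristic: top-level directories and modules in the repo are local code.
    local = {p.name for p in root.iterdir() if p.is_dir()}
    local |= {p.stem for p in root.glob("*.py")}
    # Empty on Python < 3.10, in which case stdlib modules are not filtered out.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    found = set()
    for path in root.rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that do not parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                found.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # level == 0 excludes relative imports such as "from . import x"
                found.add(node.module.split(".")[0])
    return sorted(found - stdlib - local)
```

The result is a list of import names, not installable package names or versions: some differ from their PyPI names (for example, `sklearn` is installed as `scikit-learn`), and version pins still have to be inferred from the repository's age and README.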
