ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

Published: Apr 1, 2020

Best maintained implementation now

Evidence: Direct

Domain fit: AI-adjacent

Verified repos: 1

Top repo stars: 3,854

Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.

Framework: pytorch

Time to first repro: a few days

2 risk flags

Technical details

Canonical key: arxiv-2004.12832

Cache status: Stale (served stale-while-revalidate)

Generated at: May 2, 2026, 1:18 PM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence


Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

Benchmark evidence drill-down

5 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below. Only the MS MARCO row names a dataset; the other four rows name baseline rankers (BM25, KNRM, Duet), so that column holds the model name. All five values were extracted from the paper full text.

Task	Dataset or baseline	Metric	Value	Source	Evidence refs
Efficient Effective Passage Search Contextualized Late	MS MARCO (development split)	MRR	50	paper-derived	No explicit refs
Efficient Effective Passage Search Contextualized Late	BM25 (official)	Recall@1000	81.4	paper-derived	No explicit refs
Efficient Effective Passage Search Contextualized Late	KNRM	MRR@10	19.8	paper-derived	No explicit refs
Efficient Effective Passage Search Contextualized Late	Duet	MRR@10	24.5	paper-derived	No explicit refs
Efficient Effective Passage Search Contextualized Late	BM25 (Anserini)	Recall@1000	85.7	paper-derived	No explicit refs
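The metrics in the table above can be recomputed from a ranked run file when auditing a reproduction. The following is a minimal sketch, not the official MS MARCO evaluation script. It assumes rankings and relevance judgments (qrels) are plain dictionaries keyed by query id, and it averages over judged queries, which is one common convention. The names `mrr_at_k` and `recall_at_k` are hypothetical. Reported values such as 19.8 appear to be scaled by 100, so multiply the returned fraction by 100 to compare.

```python
from typing import Dict, List, Set

Rankings = Dict[str, List[str]]  # query id -> passage ids, best first
Qrels = Dict[str, Set[str]]      # query id -> relevant passage ids


def mrr_at_k(rankings: Rankings, qrels: Qrels, k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant passage within the top k."""
    if not qrels:
        return 0.0
    total = 0.0
    for qid, relevant in qrels.items():
        for rank, pid in enumerate(rankings.get(qid, [])[:k], start=1):
            if pid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    # Judged queries with no relevant hit in the top k contribute 0.
    return total / len(qrels)


def recall_at_k(rankings: Rankings, qrels: Qrels, k: int = 1000) -> float:
    """Fraction of relevant passages found in the top k, averaged over queries."""
    per_query = [
        len(relevant.intersection(rankings.get(qid, [])[:k])) / len(relevant)
        for qid, relevant in qrels.items()
        if relevant  # skip queries without any relevant passage
    ]
    return sum(per_query) / len(per_query) if per_query else 0.0


qrels = {"q1": {"p3"}, "q2": {"p9"}}
rankings = {"q1": ["p1", "p2", "p3"], "q2": ["p5"]}
print(round(100 * mrr_at_k(rankings, qrels), 1))  # → 16.7
print(100 * recall_at_k(rankings, qrels))         # → 50.0
```

In the toy run, q1 finds its relevant passage at rank 3 and q2 misses, so MRR@10 is (1/3 + 0) / 2 and Recall@1000 is (1 + 0) / 2.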

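The paper's primary contribution is ColBERT, a passage-ranking model that encodes queries and passages independently with BERT into per-token embeddings and defers their interaction to a cheap late step. Each query token is matched to its most similar passage token (MaxSim), and these maxima are summed into the relevance score.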
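ColBERT's late interaction scores a query against a passage by summing, over query tokens, the maximum cosine similarity to any passage token. The sketch below shows only that scoring rule. It uses pure Python and tiny hand-written 2-D vectors in place of real BERT token embeddings, which is an assumption for illustration. The helpers `l2_normalize`, `maxsim_score`, and `rank_passages` are hypothetical and are not the stanford-futuredata/ColBERT API, which operates on batched PyTorch tensors.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_embs: Sequence[Vector], doc_embs: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    doc_unit = [l2_normalize(d) for d in doc_embs]
    score = 0.0
    for q in query_embs:
        q_unit = l2_normalize(q)
        # MaxSim: similarity of this query token to its closest document token.
        score += max(sum(a * b for a, b in zip(q_unit, d)) for d in doc_unit)
    return score


def rank_passages(query_embs: Sequence[Vector],
                  passages: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return passage ids sorted by descending MaxSim score."""
    return sorted(passages,
                  key=lambda pid: maxsim_score(query_embs, passages[pid]),
                  reverse=True)


query = [[1.0, 0.0], [0.0, 1.0]]  # two toy query-token embeddings
passages = {"a": [[1.0, 0.0], [0.6, 0.8]], "b": [[0.6, 0.8]]}
print(rank_passages(query, passages))  # → ['a', 'b']
```

Because passage embeddings do not depend on the query, they can be computed once offline. Only the query encoding and this cheap max-then-sum step run at search time, which is the efficiency argument behind "late" interaction.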

Use This Implementation Because…

Confidence: high

stanford-futuredata/ColBERT is the strongest maintained implementation on the ranking signals listed below, including its official Papers with Code link and 3,854 stars. License is declared (MIT).

Reproduction Risks

No CI workflows detected
Dependency manifest is missing

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 100/100, grounding 95/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

stanford-futuredata/ColBERT

best maintained

Maintenance: Stale risk

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Community adoption signal (3854 stars)

Stars: 3,854
Last push: Oct 14, 2025 (201d ago)

Releases

Risk flags

No CI pipeline detected
No Docker setup
Dependency manifest missing

hltcoe/colbert-x

alternative

Maintenance: Stale risk

Confidence: Low

Reproducibility: Moderate

Community adoption signal (73 stars)

Stars: 73
Last push: Jun 23, 2025 (314d ago)

Releases · Dependencies

Risk flags

No CI pipeline detected
No Docker setup
Low confidence match

terrierteam/pyterrier_colbert

alternative

Maintenance: Stale

Confidence: Low

Reproducibility: Moderate

Matched via arXiv identifier search · Community adoption signal (89 stars)

Stars: 89
Last push: Apr 3, 2025 (395d ago)

CI · Dependencies

Risk flags

No push in 12+ months
No tagged releases
No Docker setup

Best implementation now

stanford-futuredata/ColBERT

Confidence: High

Reproducibility: Limited

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)

Stars: 3,854

Forks: 469

Last push: Oct 14, 2025

License: MIT

Official implementation from Papers with Code

Community adoption signal (3854 stars)

License: ✓ · CI: – · Deps: – · Docker: –

Selected stanford-futuredata/ColBERT as the strongest maintained implementation for new work.
Repository activity is within the last 24 months.

Reproduction readiness

Major Work

Time to first repro: days

Last checked: May 2, 2026

No dependency manifest: manual reconstruction required

· stanford-futuredata/ColBERT has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
· You will need to reverse-engineer dependencies from import statements in the source code.
· Last push was 201 days ago.

Additional implementations

No additional verified repositories beyond the primary recommendation.

Possible but unverified matches (2)

These repositories had low-confidence matching signals and are hidden by default.

hltcoe/colbert-x

Confidence: Low

Stars: 73

terrierteam/pyterrier_colbert

Confidence: Low

Stars: 89

Hugging Face artifacts

No direct paper-linked artifacts were found. The strongest curated related artifacts are shown instead.

Models

jinaai/jina-colbert-v1-en

Curated Related

Downloads: 446

Likes: 100

colbert-ir/colbertv2.0

Curated Related

Downloads: 14,121,693

Likes: 333

sebastian-hofstaetter/colbert-distilbert-margin_mse-T2-msmarco

Curated Related

Downloads: 27

Likes: 15

Datasets

No trustworthy dataset matches right now.

Spaces

No trustworthy demo spaces right now.

Research context

Tasks

Efficient Effective Passage Search Contextualized Late

Methods

None detected

Domains

None detected
