RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

Q: What is the best open-source implementation of "RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models"?

The top-ranked implementation is interactivenlp-team/rolellm-public (525 GitHub stars), the official repository linked in the paper metadata. Confidence: high. Reproducibility: limited.

Q: How reproducible is "RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models"?

Estimated time to first reproduction: a few days. Risk flags: license metadata missing, no CI workflows detected, dependency manifest missing. Start with interactivenlp-team/rolellm-public and validate the setup instructions in its README.

Q: What framework is used to implement "RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models"?

No framework was detected for the primary implementation.

Published: Oct 1, 2023

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 525

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: none detected

Time to first repro: a few days

3 risk flags


Technical details

Canonical key: arxiv-2310.00746

Cache status: Fresh

Generated at: Jun 18, 2026, 3:08 AM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

Benchmark evidence drill-down

1 finding

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Natural language processing	Vicuna	GPT-4	32.0	paper-derived	No explicit refs

The paper's primary contribution is RoleLLM, a framework for benchmarking, eliciting, and enhancing the role-playing abilities of large language models.

Use This Implementation Because…

Confidence: high

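interactivenlp-team/rolellm-public ranks highest on the available signals: it is the official implementation listed on Papers with Code, it is linked in the paper metadata, and it has the most GitHub stars (525).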

Reproduction Risks

License metadata missing
No CI workflows detected
Dependency manifest is missing
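The three risk flags above can be re-checked against a local clone before committing to a reproduction. The sketch below is a heuristic under stated assumptions: the manifest names come from this page's dependency check, while treating any top-level `LICENSE*`/`COPYING*` file as license metadata and any YAML file under `.github/workflows` as CI is our assumption, not necessarily how this page computes its flags. The function name `risk_flags` is hypothetical.

```python
from pathlib import Path

# Manifest names taken from this page's dependency check.
MANIFESTS = ("requirements.txt", "environment.yml", "pyproject.toml", "Dockerfile")


def risk_flags(repo_root: str) -> list[str]:
    """Return which of the page's risk flags apply to a local clone (heuristic sketch)."""
    root = Path(repo_root)
    flags = []
    # Assumption: license metadata means a top-level file named LICENSE* or COPYING*.
    if not any(
        p.is_file() and p.name.upper().startswith(("LICENSE", "COPYING"))
        for p in root.iterdir()
    ):
        flags.append("License metadata missing")
    # Assumption: CI means GitHub Actions workflow files (.yml or .yaml).
    workflows = root / ".github" / "workflows"
    if not (workflows.is_dir() and any(workflows.glob("*.y*ml"))):
        flags.append("No CI workflows detected")
    # Any one of the listed manifests is enough to clear this flag.
    if not any((root / name).is_file() for name in MANIFESTS):
        flags.append("Dependency manifest is missing")
    return flags
```

Running `risk_flags("path/to/clone")` on a fresh clone of the repository should reproduce the flags listed above if the page's assessment still holds.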

Hardware Notes

Based on current guidance, expect multiple days of setup and compute for a meaningful reproduction.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 100/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

interactivenlp-team/rolellm-public

best maintained

Maintenance: Stale

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 525
Last push: Oct 11, 2024 (615d ago)

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases

yuyouyu32/beyonddialogue

alternative

Maintenance: Stale

Confidence: Low

Reproducibility: Limited

Community adoption signal (50 stars)

Stars: 50
Last push: May 19, 2025 (395d ago)

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases

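InteractiveNLP-Team/RoleLLM-public (the same repository as interactivenlp-team/rolellm-public; GitHub repository names are case-insensitive)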

alternative

Maintenance: Stale

Confidence: Medium

Reproducibility: Limited

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 525
Last push: Oct 11, 2024 (615d ago)

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases

Best implementation now

interactivenlp-team/rolellm-public

Confidence: High

Reproducibility: Limited

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

Stars: 525

Forks: 18

Last push: Oct 11, 2024

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Strong overlap with paper title keywords

Community adoption signal (525 stars)

License: not detected

CI: not detected

Deps: no manifest detected

Docker: not detected

interactivenlp-team/rolellm-public was selected as the starting point for new work. Its last push (Oct 11, 2024) falls within the last 24 months, but not within the last 12.
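The day counts on this page follow from simple date arithmetic against the Jun 18, 2026 check date. A minimal sketch, assuming the 12-month and 24-month thresholds are approximated as 365 and 730 days (the page does not state its exact rule); `staleness` is a hypothetical helper.

```python
from datetime import date

CHECKED = date(2026, 6, 18)  # "Last checked" date reported on this page


def staleness(last_push: date, checked: date = CHECKED) -> tuple[int, bool, bool]:
    """Return (days since last push, no push in 12+ months, active within 24 months).

    Assumption: 12 and 24 months are approximated as 365 and 730 days.
    """
    days = (checked - last_push).days
    return days, days > 365, days <= 730


print(staleness(date(2024, 10, 11)))  # interactivenlp-team/rolellm-public → (615, True, True)
print(staleness(date(2025, 5, 19)))   # yuyouyu32/beyonddialogue → (395, True, True)
```

Both repositories trip the 12-month flag while remaining inside the 24-month window, which matches the "Stale" labels above.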

Reproduction readiness

Major work required

Time to first repro: a few days

Last checked: Jun 18, 2026

No dependency manifest: manual reconstruction required

· interactivenlp-team/rolellm-public has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
· Dependencies must be reverse-engineered from the import statements in the source code.
· The last push was 615 days before the Jun 18, 2026 check.

Additional implementations

Official

No additional official repositories detected.

Community

InteractiveNLP-Team/RoleLLM-public (the official repository, matched under different capitalization)
Confidence: Medium

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

Stars: 525

Last push: Oct 11, 2024