What is the best open-source implementation of "RWKV: Reinventing RNNs for the Transformer Era"?

The best maintained implementation is blinkdl/chatrwkv with 9,498 stars on GitHub. Confidence: high. Reproducibility: Moderate.

How reproducible is "RWKV: Reinventing RNNs for the Transformer Era"?

Estimated time to first reproduction: a few hours. Risk flags: no CI workflows detected. Start with blinkdl/chatrwkv and validate the setup instructions in its README.

What framework is used to implement "RWKV: Reinventing RNNs for the Transformer Era"?

The primary implementation uses PyTorch.

RWKV: Reinventing RNNs for the Transformer Era

Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Jiaju Lin, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Bolun Wang, Johan S. Wind, Stanislaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu

Published: May 22, 2023

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 9,498

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: PyTorch

Time to first repro: a few hours

1 risk flag

arXiv PDF

Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintains constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.
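The abstract's claim that the same model can be written as either a Transformer-style sum or an RNN can be illustrated with a small sketch. It follows the WKV operator as it is commonly stated for RWKV-4, reduced to a single scalar channel. The function names are hypothetical, and the sketch leaves out receptance gating, token shift, and the numerical stabilization that real kernels use, so treat it as an illustration under those assumptions rather than the repository's code.

```python
import math

def wkv_parallel(k, v, w, u):
    """Attention-like form: every output re-sums over all earlier tokens (quadratic work)."""
    out = []
    for t in range(len(k)):
        # The current token gets the "bonus" u instead of a decay term.
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + k[i])  # older tokens decay by w per step
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out

def wkv_recurrent(k, v, w, u):
    """RNN form: two running sums (a, b) are the whole state, so per-token cost is constant."""
    a, b = 0.0, 0.0
    decay = math.exp(-w)
    out = []
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)
        out.append((a + bonus * vt) / (b + bonus))
        # Fold the current token into the state for the next step.
        a = decay * a + math.exp(kt) * vt
        b = decay * b + math.exp(kt)
    return out

if __name__ == "__main__":
    # Arbitrary illustrative inputs, not values from the paper.
    k = [0.1, -0.3, 0.5, 0.2]
    v = [1.0, 2.0, -1.0, 0.5]
    par = wkv_parallel(k, v, w=0.5, u=0.3)
    rec = wkv_recurrent(k, v, w=0.5, u=0.3)
    print(max(abs(p - r) for p, r in zip(par, rec)))
```

Both functions compute the same outputs; the recurrent one only needs the pair (a, b) from the previous step, which is what keeps inference memory constant in sequence length.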

Technical details

Canonical key: arxiv-2305.13048

Cache status: Fresh

Generated at: May 15, 2026, 5:53 AM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

· Generation, RTE, RWKV-adapted: 74.8 (source: paper fulltext)
· Generation, WNLI, RWKV-adapted: 49.3 (source: paper fulltext)
· Transformer, RTE, RWKV-adapted: 74.8 (source: paper fulltext)
· Transformer, WNLI, RWKV-adapted: 49.3 (source: paper fulltext)

Benchmark evidence drill-down

4 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Generation	RTE	RWKV-adapted	74.8	paper-derived	No explicit refs
Generation	WNLI	RWKV-adapted	49.3	paper-derived	No explicit refs
Transformer	RTE	RWKV-adapted	74.8	paper-derived	No explicit refs
Transformer	WNLI	RWKV-adapted	49.3	paper-derived	No explicit refs
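
One way to audit the table is to group rows by dataset, metric, and value and see which task labels share a result. The sketch below does that with the rows copied from the table above; the function name is hypothetical and the grouping key is just one reasonable choice.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the drill-down table above (tab-separated, evidence column omitted).
FINDINGS_TSV = (
    "Task\tDataset\tMetric\tValue\tSource\n"
    "Generation\tRTE\tRWKV-adapted\t74.8\tpaper-derived\n"
    "Generation\tWNLI\tRWKV-adapted\t49.3\tpaper-derived\n"
    "Transformer\tRTE\tRWKV-adapted\t74.8\tpaper-derived\n"
    "Transformer\tWNLI\tRWKV-adapted\t49.3\tpaper-derived\n"
)

def group_findings(tsv_text):
    """Map (dataset, metric, value) to the list of task labels that report it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups[(row["Dataset"], row["Metric"], row["Value"])].append(row["Task"])
    return dict(groups)

if __name__ == "__main__":
    for key, tasks in group_findings(FINDINGS_TSV).items():
        print(key, "listed under:", ", ".join(tasks))
```

Grouped this way, the four findings reduce to two distinct values, each listed under both the Generation and Transformer task labels.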

Use This Implementation Because…

Confidence: high

blinkdl/chatrwkv is the strongest maintained implementation based on ranking signals. License is declared (Apache-2.0). Dependency/environment manifests are present.

Reproduction Risks

No CI workflows detected

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 90/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

blinkdl/chatrwkv

best maintained

Maintenance: Recently updated

Confidence: High

Reproducibility: Moderate

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 9,498
Last push: Feb 11, 2026 (94d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

BlinkDL/RWKV-LM

historical official

Maintenance: Active

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 14,528
Last push: May 8, 2026 (7d ago)

Releases

Risk flags

No CI pipeline detected
No Docker setup
Dependency manifest missing

sustcsonglin/flash-linear-attention

alternative

Maintenance: Active

Confidence: Low

Reproducibility: Strong

Community adoption signal (5095 stars)

Stars: 5,095
Last push: May 14, 2026 (1d ago)

CI · Releases · Dependencies

Risk flags

No Docker setup
Low confidence match

Best implementation now

blinkdl/chatrwkv

Confidence: High

Reproducibility: Moderate

ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.

Stars: 9,498

Forks: 691

Last push: Feb 11, 2026

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Community adoption signal (9498 stars)

License ✓

CI –

Deps ✓

Docker –

Selected blinkdl/chatrwkv as the strongest maintained implementation for new work.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.
Official repository is preserved separately as historical context.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

BlinkDL/RWKV-LM

Stars: 14,528

Last push: May 8, 2026

Reproduction readiness

Setup Required

Time to first repro: a few hours

Last checked: May 15, 2026

Dependencies pinned, manual setup needed

· blinkdl/chatrwkv has requirements.txt but requires manual environment setup.
· No Dockerfile — you will set up the environment manually.
· No CI pipeline — test coverage is unknown.
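
The three signals listed above can be checked against a local clone with a few file-system tests. This is a minimal sketch, not the logic this page uses to produce its flags: the function name is hypothetical, and it assumes that CI means GitHub Actions workflow files under `.github/workflows` and that the manifest and Dockerfile sit at the repository root.

```python
from pathlib import Path

def readiness_signals(repo_dir):
    """Report which reproduction-readiness files exist in a local clone."""
    root = Path(repo_dir)
    workflows = root / ".github" / "workflows"
    has_ci = workflows.is_dir() and any(
        p.suffix in {".yml", ".yaml"} for p in workflows.iterdir()
    )
    return {
        "requirements_txt": (root / "requirements.txt").is_file(),
        "dockerfile": (root / "Dockerfile").is_file(),
        "ci_workflows": has_ci,
    }

if __name__ == "__main__":
    # "chatrwkv" is the directory created by cloning the repository; adjust as needed.
    print(readiness_signals("chatrwkv"))
```

Other CI systems or a Dockerfile in a subdirectory would not be detected by this check, so a negative result is a prompt to look closer rather than a verdict.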

Quick start

git clone https://github.com/blinkdl/chatrwkv.git
cd chatrwkv
pip install -r requirements.txt
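
Because the environment is set up by hand, a quick import check after installation can catch a missing framework before any model code runs. The sketch below only tests whether modules can be located. The helper name is hypothetical, and checking `torch` is an assumption based on the framework listed on this page; the repository's own requirements.txt remains the authoritative dependency list.

```python
import importlib.util
import sys

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # Assumption: PyTorch is importable as "torch".
    missing = missing_modules(["torch"])
    print("missing:", ", ".join(missing) if missing else "none")
```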

Additional implementations

No additional verified repositories beyond the primary recommendation.

Possible but unverified matches (4)

These repositories had low-confidence matching signals and are hidden by default.

Repository	Confidence	Stars
sustcsonglin/flash-linear-attention	Low	5,095
rwkv/rwkv.cpp	Low	1,568
rwkv/rwkv-lm	Low	62
hannibal046/nanorwkv	Low	197

Hugging Face artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches derived from the paper title and method context:

Models

arxiv:2305.13048 RWKV RNNs

Datasets

arxiv:2305.13048 RWKV dataset Transformer benchmark

Spaces

arxiv:2305.13048 RWKV demo Transformer gradio

Tip: start with models, then check datasets/spaces if you need evaluation data or demos.

Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.
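
The query strings above can be turned into browsable search links with the standard library. This is a sketch under an assumption: that the Hub's listing pages accept a `search` query parameter, which is not stated on this page. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Query strings copied from the lists above.
QUERIES = {
    "models": "arxiv:2305.13048 RWKV RNNs",
    "datasets": "arxiv:2305.13048 RWKV dataset Transformer benchmark",
    "spaces": "arxiv:2305.13048 RWKV demo Transformer gradio",
}

def search_urls(queries, base="https://huggingface.co"):
    """Build one search URL per artifact kind; the 'search' parameter is an assumption."""
    return {
        kind: f"{base}/{kind}?{urlencode({'search': query})}"
        for kind, query in queries.items()
    }

if __name__ == "__main__":
    for kind, url in search_urls(QUERIES).items():
        print(kind, url)
```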

Research context

Tasks

Generation, Transformer

Methods

Transformer

Domains

Natural Language Processing

