Official implementation from Papers with Code · Repository link is mentioned in the paper metadata
- Stars
- 9,498
- Last push
- Feb 11, 2026 (94d ago)
Risk flags
- No CI pipeline detected
- No tagged releases
- No Docker setup
Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Jiaju Lin, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Bolun Wang, Johan S. Wind, Stanislaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu
Core AI workload signals detected from paper context and implementation/artifact evidence.
Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a ...
novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintains constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.
Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.
| Task | Dataset | Metric | Value | Source | Evidence refs |
|---|---|---|---|---|---|
| Generation | RTE | RWKV-adapted | 74.8 | paper-derived | No explicit refs |
| Generation | WNLI | RWKV-adapted | 49.3 | paper-derived | No explicit refs |
| Transformer | RTE | RWKV-adapted | 74.8 | paper-derived | No explicit refs |
| Transformer | WNLI | RWKV-adapted | 49.3 | paper-derived | No explicit refs |
Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length.
blinkdl/chatrwkv is the strongest maintained implementation based on ranking signals. License is declared (Apache-2.0). Dependency/environment manifests are present.
Open blinkdl/chatrwkvEvidence graph: 3 refs, 3 links.
Utility signals: depth 90/100, grounding 85/100, status high.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
Official implementation from Papers with Code · Repository link is mentioned in the paper metadata
Risk flags
Official implementation from Papers with Code · Repository link is mentioned in the paper metadata
Risk flags
Community adoption signal (5095 stars)
Risk flags
ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
Preserved for provenance. Not recommended as the default path for new builds.
Dependencies pinned, manual setup needed
Quick start
git clone https://github.com/blinkdl/chatrwkv.git
pip install -r requirements.txt No additional verified repositories beyond the primary recommendation.
These repositories had low-confidence matching signals and are hidden by default.
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches derived from the paper title and method context:
Models
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.
Tasks
Generation, Transformer
Methods
Transformer
Domains
Natural Language Processing
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
Open in HFEPXExplore Similar Papers
Jump to Paper2Code search queries derived from this paper's research context.
Need human evaluators for your AI research? Scale annotation with expert AI Trainers.