OpenTrain AI
Maintained implementation available · PyTorch · Pretrained models available

RLHF Workflow: From Reward Modeling to Online RLHF

May 1, 2024 · arXiv: 2405.07863
3 repos · 1,524 stars · ~a few days to reproduce

Abstract

Results & Benchmarks

Task                    Model                Benchmark  Value
Reinforcement learning  LLaMA-3-8B-it        GSM-8K     79.6
Reinforcement learning  Ours (SFT baseline)  GSM-8K     74.2

Hardware Requirements

  • Based on current guidance, expect multiple days of setup and compute for a meaningful reproduction.

Best Implementation

Recipes for training reward models for RLHF.

Stars: 1.5k · Forks: 108 · Last push: Apr 2025 · License: Apache-2.0
  • Selected RLHFlow/RLHF-Reward-Modeling as the strongest maintained implementation for new work.
  • Repository activity is within the last 24 months.
  • Official repository is preserved separately as historical context.
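For orientation, reward-model training recipes of this kind are commonly built on a Bradley-Terry pairwise loss, which pushes the scalar reward of a preferred ("chosen") response above that of a rejected one. The sketch below is a minimal, framework-free illustration of that loss. It is not taken from RLHFlow/RLHF-Reward-Modeling, and the function names `bradley_terry_loss` and `mean_pairwise_loss` are hypothetical.

```python
import math


def bradley_terry_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Return -log(sigmoid(chosen_reward - rejected_reward)) for one preference pair.

    Illustrative sketch only; a real recipe would compute this on batched
    tensors produced by a reward model.
    """
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) equals log(1 + exp(-margin)); branch on the sign
    # of the margin so exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs):
    """Average the loss over an iterable of (chosen_reward, rejected_reward) pairs."""
    losses = [bradley_terry_loss(chosen, rejected) for chosen, rejected in pairs]
    return sum(losses) / len(losses)
```

The loss is log 2 when both responses receive the same reward and approaches zero as the chosen reward pulls ahead, so minimizing it widens the reward margin in favor of preferred responses.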

Reproduction Path

  1. Start with RLHFlow/RLHF-Reward-Modeling and validate setup instructions in README.

  2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.

  3. Log exact dependency versions and runtime environment for reproducibility.
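The last step above can be sketched with the Python standard library alone. This is one possible approach, not something prescribed by the repository: the helper name `snapshot_environment` is hypothetical, and the package list in the usage section is an assumption that should be replaced with whatever the chosen repository actually requires.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Collect interpreter, OS, and installed versions for the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Assumed package list for illustration only.
    snapshot = snapshot_environment(["torch", "transformers"])
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Saving this output next to each run's results makes it easier to tell later whether a difference in scores comes from a code change or from a change in the environment.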

Time to first repro: a few days · No CI workflows detected · Dependency manifest is missing

Additional Implementations

Official

No additional official repositories detected.

Community

  • RLHFlow/Online-RLHF (Confidence: low)

    A recipe for online RLHF and online iterative DPO.

    Stars: 543 · Forks: 48 · Last push: Dec 2024
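As background for the DPO part of that description, the standard per-pair DPO loss compares how far the policy has moved from a frozen reference model on the chosen versus the rejected response. The sketch below shows only that per-pair loss in plain Python. It is not code from RLHFlow/Online-RLHF, the function name `dpo_loss` is hypothetical, and `beta=0.1` is an assumed default rather than a value from the source. In an online iterative setup, the pairs would be freshly sampled and ranked each round, which this sketch does not cover.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities.

    beta (assumed value) scales how strongly the policy is penalized for
    drifting away from the reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(logits)), branching on sign so exp() stays bounded.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy equals the reference, both log-ratios are zero and the loss is log 2. Raising the chosen response's log-probability relative to the reference lowers the loss.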

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts.