No verified implementation yet

Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization

Shaohua Duan, Pengcheng Huang, Xinze Li, Zhenghao Liu, Xiaoyuan Yi +5 more

August 19, 2025 | arXiv: 2508.13993

0 repos | Estimated time to reproduce: a few days

Abstract

Long-context modeling is critical for a wide range of real-world tasks, including long-context question answering, summarization, and complex reasoning tasks. Recent studies have explored fine-tuning Large Language Models (LLMs) with synthetic data to enhance their long-context capabilities. However, the effectiveness of such approaches is often limited by the low diversity and factual inconsistencies in the generate...
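The title frames chunks of a long context as the arms of a multi-armed bandit. The excerpt above is truncated before it describes the method, so the following is only a minimal sketch of that framing. It assumes a standard UCB1 policy and a hypothetical `reward_fn(chunk_index)` that stands in for whatever scoring signal the paper actually uses. It is not the authors' implementation.

```python
import math


def ucb1_select(counts, means, t, c=1.0):
    """Return the index of the chunk (arm) with the highest UCB1 score."""
    # Select every chunk once before the confidence bound is defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [
        means[i] + c * math.sqrt(2.0 * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=lambda i: scores[i])


def bandit_sample_chunks(num_chunks, reward_fn, rounds, c=1.0):
    """Run UCB1 over chunks. reward_fn(chunk_index) -> float is a hypothetical stand-in."""
    counts = [0] * num_chunks      # how often each chunk has been selected
    means = [0.0] * num_chunks     # running mean reward per chunk
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t, c)
        reward = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts, means


# Toy usage with a made-up reward: only chunk 2 is "useful".
counts, means = bandit_sample_chunks(4, lambda i: 1.0 if i == 2 else 0.0, rounds=200)
```

With this toy reward, most of the 200 selections go to chunk 2. Each of the other chunks still receives a few exploratory selections, and the exploration bonus controls how many.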

Results & Benchmarks

Task	Setting	Context length	Value
Natural language processing	Vanilla LLM	16k	30.00
Natural language processing	LongAlpaca	16k	33.33

Hardware Requirements

No specific hardware is listed. Based on current guidance, expect multi-day setup and compute for a meaningful reproduction.

Best Implementation

No maintained implementation has been confirmed for this paper yet.

Use the Implementation Status and Reproduction Path sections below for the current action plan.

Reproduction Path

Follow this baseline workflow to decide whether this paper is worth prototyping immediately.

1. Use the paper and benchmark evidence to scope a baseline reproduction plan.
2. Track assumptions and missing details in an experiment log before coding.
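The second step above can be kept lightweight. The following sketch shows one possible experiment log that records assumptions and missing details as JSON lines. The `LogEntry`/`ExperimentLog` names and the two entry kinds are hypothetical choices for illustration and do not come from any tool referenced on this page. The example entries only restate gaps visible here.

```python
import json
from dataclasses import asdict, dataclass, field

VALID_KINDS = ("assumption", "missing_detail")


@dataclass
class LogEntry:
    kind: str            # "assumption" or "missing_detail"
    description: str     # what is assumed or missing
    source: str          # section of the paper/page that raised the question
    resolved: bool = False


@dataclass
class ExperimentLog:
    entries: list = field(default_factory=list)

    def add(self, kind, description, source):
        if kind not in VALID_KINDS:
            raise ValueError(f"unknown entry kind: {kind!r}")
        self.entries.append(LogEntry(kind, description, source))

    def resolve(self, index):
        # Mark an entry as settled (e.g. confirmed from the paper text).
        self.entries[index].resolved = True

    def open_items(self):
        return [e for e in self.entries if not e.resolved]

    def to_jsonl(self):
        # One JSON object per line, easy to diff and append to.
        return "\n".join(json.dumps(asdict(e)) for e in self.entries)


log = ExperimentLog()
log.add("missing_detail", "Metric behind the 16k scores is not stated", "Results & Benchmarks")
log.add("missing_detail", "No maintained code; sampling details must come from the paper", "Best Implementation")
log.add("assumption", "Reproduce the Vanilla LLM 16k baseline (30.00) first", "Results & Benchmarks")
log.resolve(2)
```

A plain JSON-lines file is used so that the log can be version-controlled next to the code and reviewed before any training run starts.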

Time to first repro: a few days (estimate based on a paper-only reproduction flow)

Additional Implementations

No additional verified repositories were found.

Hugging Face Artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches:

Models: arxiv:2508.13993 Multi-Armed Bandit-Guided
Datasets: arxiv:2508.13993 Multi-Armed dataset
Spaces: arxiv:2508.13993 Multi-Armed demo
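The searches above can be turned into browser links. The following is a small sketch that assumes the Hugging Face Hub web pages accept a `search` query parameter (for example `https://huggingface.co/models?search=...`). The `hub_search_url` helper is hypothetical and is not part of any Hugging Face library.

```python
from urllib.parse import urlencode

# The targeted searches listed above, keyed by Hub section.
QUERIES = {
    "models": "arxiv:2508.13993 Multi-Armed Bandit-Guided",
    "datasets": "arxiv:2508.13993 Multi-Armed dataset",
    "spaces": "arxiv:2508.13993 Multi-Armed demo",
}


def hub_search_url(section, query):
    """Build a Hub browse URL such as https://huggingface.co/models?search=..."""
    # urlencode uses quote_plus: spaces become "+", ":" becomes "%3A".
    return f"https://huggingface.co/{section}?{urlencode({'search': query})}"


for section, query in QUERIES.items():
    print(hub_search_url(section, query))
```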

Research Context