Matched via arXiv identifier search · Strong overlap with paper title keywords
- Stars
- 665
- Last push
- Jun 15, 2026 (3d ago)
Risk flags
- No CI pipeline detected
- No tagged releases
- No Docker setup
Pengcheng Jiang, Zhiyi Shi, Kelly Hong, Xueqiang Xu, Jiashuo Sun, Jimeng Sun, Hammad Bashir, Jiawei Han
Core AI workload signals detected from paper context and implementation/artifact evidence.
Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and re ...
coverable bookkeeping that the environment can maintain more reliably. We introduce Harness-1, a 20B search agent (retrieval subagent) trained with reinforcement learning inside a stateful search harness. The harness maintains environment-side working memory, including a candidate pool, an importance-tagged curated set, compact evidence links, verification records, compressed and deduplicated observations, and budget-aware context rendering. The policy retains the semantic decisions: what to search, which documents to keep or discard, what to verify, and when to stop. Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, Harness-1 achieves 0.730 average curated recall, outperforming the next strongest open search subagent by +11.4 points and remaining competitive with much larger frontier-model searchers. Its gains are especially strong on held-out transfer benchmarks, suggesting that reinforcement learning over explicit search state can produce retrieval behaviors that generalize beyond the training domains. Our code is available at https://github.com/pat-jj/harness-1.
No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.
Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked.
pat-jj/harness-1 is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high. License is declared (Apache-2.0). Dependency/environment manifests are present.
Open pat-jj/harness-1Evidence graph: 4 refs, 4 links.
Utility signals: depth 55/100, grounding 85/100, status medium.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
Matched via arXiv identifier search · Strong overlap with paper title keywords
Risk flags
🚀 Ultra Recipe for Training Long-Horizon Search Agents - matching frontier AI's search capability with a 20B model + stateful harness
Dependencies pinned, manual setup needed
Quick start
git clone https://github.com/pat-jj/harness-1.git
pip install -e . No benchmark numbers could be verified. You will not be able to validate reproduction correctness against published numbers.
No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.
No trustworthy dataset matches right now.
Search datasets on Hugging FaceNo trustworthy demo spaces right now.
Search spaces on Hugging FaceTasks
Agentic tool use, Retrieval / indexing
Methods
Reinforcement learning
Domains
AI Agents, Information Retrieval
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
Open in HFEPXExplore Similar Papers
Jump to Paper2Code search queries derived from this paper's research context.
Need human evaluators for your AI research? Scale annotation with expert AI Trainers.