Matched via arXiv identifier search · Strong overlap with paper title keywords
- Stars: 730
- Last push: May 22, 2026 (7d ago)
Risk flags
- No CI pipeline detected
- No tagged releases
- No Docker setup
Yuwen Du, Rui Ye, Shuo Tang, Keduan Huang, Xinyu Zhu, Yuzhu Cai, Siheng Chen
Core AI workload signals detected from paper context and implementation/artifact evidence.
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet their development remains dominated by industrial giants. The typical industry recipe involves a highly resource-intensive pipeline spanning pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). In this report, we show that when fueled with informative and high-difficulty trajectories, a simple SFT approach could be surprisingly powerful for training frontier search agents. By introducing three simple data synthesis modifications: scaling knowledge graph size for richer exploration, expanding the tool set size for broader functionality, and strict low-step filtering, we establish a stronger baseline. Trained on merely 10.6k data points, our OpenSeeker-v2 achieves state-of-the-art performance across 4 benchmarks (30B-sized agents with ReAct paradigm): 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench, surpassing even Tongyi DeepResearch trained with heavy CPT+SFT+RL pipeline, which achieves 43.4%, 46.7%, 32.9%, and 75.0%, respectively. Notably, OpenSeeker-v2 represents the first state-of-the-art search agent within its model scale and paradigm to be developed by a purely academic team using only SFT. We are excited to open-source the OpenSeeker-v2 model weights and share our simple yet effective findings to make frontier search agent research more accessible to the community.
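The "strict low-step filtering" modification named in the abstract can be pictured as discarding synthesized trajectories that were solved in too few tool calls, so that only harder examples reach SFT. The following is a minimal sketch under that reading. The `Trajectory` record, the `filter_low_step` helper, and the `min_steps` threshold are all hypothetical names introduced for illustration; the abstract does not state the actual data schema or cutoff.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Hypothetical record for one synthesized search trajectory."""
    question: str
    steps: list = field(default_factory=list)  # one entry per tool call


def filter_low_step(trajectories, min_steps):
    """Keep only trajectories with at least `min_steps` tool calls.

    `min_steps` is an illustrative parameter; the real threshold is not
    given in the abstract.
    """
    return [t for t in trajectories if len(t.steps) >= min_steps]


sample = [
    Trajectory("easy", steps=["search"]),
    Trajectory("hard", steps=["search", "visit", "search", "visit"]),
]
kept = filter_low_step(sample, min_steps=3)
print([t.question for t in kept])  # prints ['hard']
```

Counting tool calls is only one possible proxy for difficulty; it is used here because it is the only criterion the abstract names.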
Some benchmark signal exists in the extracted evidence, but it is not structured strongly enough yet for a confident benchmark decision.
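The benchmark numbers quoted in the abstract can be put into a structured form to make the comparison explicit. The sketch below uses only the eight percentages stated above; the variable names are illustrative and not part of any released tooling.

```python
# (OpenSeeker-v2, Tongyi DeepResearch) scores in percent, as quoted in the abstract.
scores = {
    "BrowseComp": (46.0, 43.4),
    "BrowseComp-ZH": (58.1, 46.7),
    "Humanity's Last Exam": (34.6, 32.9),
    "xbench": (78.0, 75.0),
}

# Margin in percentage points, rounded to one decimal to hide float noise.
margins = {
    name: round(ours - baseline, 1)
    for name, (ours, baseline) in scores.items()
}

for name, margin in margins.items():
    print(f"{name}: +{margin} points")
```

On these figures the margin ranges from 1.7 points (Humanity's Last Exam) to 11.4 points (BrowseComp-ZH).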
PolarSeeker/OpenSeeker is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high. Dependency/environment manifests are present.
LLM evidence refs: paper.abstract, evidencePack.paperSections[id=paper_6], evidencePack.paperSections[id=paper_caption_5], evidencePack.paperSections[id=paper_caption_2], evidencePack.paperSections[id=paper_8], evidencePack.paperSections[id=paper_table_2], evidencePack.paperSections[id=paper_table_1], evidencePack.paperSections[id=paper_caption_1], paper.title, summary.hasReliableImplementation
Evidence graph: 4 refs, 4 links.
Utility signals: depth 60/100, grounding 85/100, status medium.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
OpenSeeker: A search agent with open-source data and models
Dependencies pinned, manual setup needed
Quick start
git clone https://github.com/PolarSeeker/OpenSeeker.git
cd OpenSeeker
pip install -r requirements.txt
No additional verified repositories beyond the primary recommendation.
These repositories had low-confidence matching signals and are hidden by default.
No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.
No trustworthy dataset matches right now.
No trustworthy demo spaces right now.
Tasks
Agentic tool use
Methods
Reinforcement learning
Domains
Natural Language Processing, Large Language Models, AI Agents