- Efficient PRM Training Data Synthesis via Formal Verification
Ryo Kamoi, Yusen Zhang, Nan Zhang, Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang · May 21, 2025 · Citations: 0
However, existing approaches for constructing PRM training data remain costly and noisy, as they typically rely on human annotation or sampling-based labeling methods that require repeated LLM calls.
- VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Yuchen Yan, Jin Jiang, Zhenbang Ren, Yijun Li, Xudong Cai · May 21, 2025 · Citations: 0
Pairwise Preference
In this paper, we introduce VerifyBench and its challenging variant VerifyBench-Hard, two benchmarks specifically designed to assess reference-based reward systems.
- Reward Is Enough: LLMs Are In-Context Reinforcement Learners
Kefan Song, Amir Moeini, Peng Wang, Lei Gong, Rohan Chandra · May 21, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases
Pingqing Zheng, Jiayin Qin, Fuqi Zhang, Niraj Chitla, Zishen Wan · May 21, 2025 · Citations: 0
- Explainable embeddings with Distance Explainer
Christiaan Meijer, E. G. Patrick Bos · May 21, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- ALIEN: Aligned Entropy Head for Improving Uncertainty Estimation of LLMs
Artem Zabolotnyi, Roman Makarov, Mile Mitrovic, Polina Proskura, Oleg Travkin · May 21, 2025 · Citations: 0
Experiments across seven classification datasets and two NER benchmarks, evaluated on five language models (RoBERTa, ELECTRA, LLaMA-2, Qwen2.5, and Qwen3), show that ALIEN consistently outperforms strong baselines across all considered…
- Guided Policy Optimization under Partial Observability
Yueheng Li, Guangming Xie, Zongqing Lu · May 21, 2025 · Citations: 0
- Understanding the Anchoring Effect of LLM with Synthetic Data: Existence, Mechanism, and Potential Mitigations
Yiming Huang, Biquan Bie, Zuqiu Na, Weilin Ruan, Songxin Lei · May 21, 2025 · Citations: 0
Combining refined evaluation metrics, we benchmark current widely used LLMs.
- A quantitative analysis of semantic information in deep representations of text and images
Santiago Acevedo, Andrea Mascaretti, Riccardo Rende, Matéo Mahaut, Marco Baroni · May 21, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- SAKE: Structured Agentic Knowledge Extrapolation for Complex LLM Reasoning via Reinforcement Learning
Jiashu He, Jinxuan Fan, Bowen Jiang, Ignacio Houine, Dan Roth · May 21, 2025 · Citations: 0
Long Horizon
We propose SAKE (Structured Agentic Knowledge Extrapolation), an RL-powered agentic framework that trains LLMs to autonomously retrieve and extrapolate structured knowledge through tool-augmented reinforcement learning.
- MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation
Feiyang Cai, Jiahui Bai, Tao Tang, Guijuan He, Joshua Luo · May 21, 2025 · Citations: 0
- Entailed Opinion Matters: Improving the Fact-Checking Performance of Language Models by Relying on their Entailment Ability
Gaurav Kumar, Ayush Garg, Debajyoti Mazumder, Aditya Kishore, Babu kumar · May 21, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision
Zixuan Ke, Austin Xu, Yifei Ming, Xuan-Phi Nguyen, Ryan Chin · May 21, 2025 · Citations: 0
Critique Edit Multi Agent
Multi-agent systems (MAS) leveraging the impressive capabilities of Large Language Models (LLMs) hold significant potential for tackling complex tasks.