- Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang, Zhangyi Jiang, Zhenqi He, Shenyang Tong, Wenhan Yang · Mar 16, 2025 · Citations: 0
Long Horizon
Empirical results on the PRM800K dataset show that HRM, together with HNC, provides more stable and reliable evaluations than PRM.
- HyConEx: Hypernetwork classifier with counterfactual explanations for tabular data
Patryk Marszałek, Kamil Książek, Oleksii Furman, Ulvi Movsum-zada, Przemysław Spurek · Mar 16, 2025 · Citations: 0
- A Survey on the Optimization of Large Language Model-based Agents
Shangheng Du, Jiabao Zhao, Jinxin Shi, Zhentao Xie, Xin Jiang · Mar 16, 2025 · Citations: 0
Long Horizon
With the rapid development of Large Language Models (LLMs), LLM-based agents have been widely adopted in various fields, becoming essential for autonomous decision-making and interactive tasks.
- Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios
Zhi Chen, Wei Ma, Lingxiao Jiang · Mar 16, 2025 · Citations: 0
- Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes
Zhanliang Wang, Da Wu, Quan Nguyen, Kai Wang · Mar 15, 2025 · Citations: 0
These studies typically use Human Phenotype Ontology (HPO) terms to prompt foundation models like GPT and LLaMA to predict candidate genes.
- Interpretable Deep Learning Framework for Improved Disease Classification in Medical Imaging
Jutika Borah, Hidam Kumarjit Singh · Mar 14, 2025 · Citations: 0
The framework is evaluated on four medical imaging benchmark datasets: chest X-rays of COVID-19, Tuberculosis, Pneumonia, and retinal Optical Coherence Tomography (OCT) images.
- Implicit Bias-Like Patterns in Reasoning Models
Messi H. J. Lee, Calvin K. Lai · Mar 14, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control
Yifeng Zhang, Yilin Liu, Ping Gong, Peizhuo Li, Mingfeng Fan · Mar 14, 2025 · Citations: 0
- Reasoning-Grounded Natural Language Explanations for Language Models
Vojtech Cahlik, Rodrigo Alves, Pavel Kordik · Mar 14, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Yuhang Liu, Dong Gong, Yichao Cai, Erdun Gao, Zhen Zhang · Mar 12, 2025 · Citations: 0
- PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization
Zhiwen You, Yue Guo · Mar 11, 2025 · Citations: 0
Existing automatic factual consistency evaluation methods, such as entailment-based and question-answering (QA)-based methods, struggle with plain language summarization (PLS) due to the elaborative explanation phenomenon, which introduces external content…
- Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges
Xiaoxiao Liu, Qingying Xiao, Bingquan Zhang, Junying Chen, Xiangyi Feng · Mar 11, 2025 · Citations: 0
However, there is a lack of standardized evaluation criteria to assess their effectiveness, particularly in dynamic, interactive scenarios.