- A Scalable Framework for Evaluating Health Language Models
Neil Mallinar, A. Ali Heydari, Xin Liu, Anthony Z. Faranesh, Brent Winslow · Mar 30, 2025 · Citations: 0
Rubric Rating · Expert Verification
As LLM-driven health applications are increasingly adopted, rigorous and efficient one-sided evaluation methodologies are crucial to ensure response quality across multiple dimensions, including accuracy, personalization and safety.
- EventWeave: A Dynamic Framework for Capturing Core and Supporting Events in Dialogue Systems
Zhengyi Zhao, Shubo Zhang, Yiming Du, Bin Liang, Baojun Wang · Mar 29, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- More Bang for the Buck: Process Reward Modeling with Entropy-Driven Uncertainty
Lang Cao, Renhong Chen, Yingtian Zou, Chao Peng, Huacong Xu · Mar 28, 2025 · Citations: 0
Unlike previous Process Reward Models (PRMs) that rely on static partitioning and human labeling, EDU-PRM automatically anchors step boundaries at tokens with high predictive entropy, effectively capturing intrinsic logical transitions and…
- Boosting Large Language Models with Mask Fine-Tuning
Mingyuan Zhang, Yue Bai, Huan Wang, Yizhou Wang, Qihua Dong · Mar 27, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics
Arsham Gholamzadeh Khoee, Shuai Wang, Robert Feldt, Dhasarathy Parthasarathy, Yinan Yu · Mar 27, 2025 · Citations: 0
Multi Agent
Ensuring reliable data-driven decisions is crucial in domains where analytical accuracy directly impacts safety, compliance, or operational outcomes.
- Lean Formalization of Generalization Error Bound by Rademacher Complexity and Dudley's Entropy Integral
Sho Sonoda, Kazumi Kasaura, Yuma Mizuno, Kei Tsukamoto, Naoto Onda · Mar 25, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- ELM: A Hybrid Ensemble of Language Models for Automated Tumor Group Classification in Population-Based Cancer Registries
Lovedeep Gondara, Jonathan Simkin, Shebnum Devji, Gregory Arbour, Raymond Ng · Mar 24, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Minimum Volume Conformal Sets for Multivariate Regression
Sacha Braun, Liviu Aolaritei, Michael I. Jordan, Francis Bach · Mar 24, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- EconEvals: Benchmarks and Litmus Tests for Economic Decision-Making by LLM Agents
Sara Fish, Julia Shephard, Minkai Li, Ran I. Shorrer, Yannai A. Gonczarowski · Mar 24, 2025 · Citations: 0
We develop evaluation methods for measuring the economic decision-making capabilities and tendencies of LLMs.
- Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages
Tadesse Destaw Belay, Dawit Ketema Gete, Abinew Ali Ayele, Olga Kolesnikova, Iqra Ameer · Mar 24, 2025 · Citations: 0
Developing and integrating emotion-understanding models are essential for a wide range of human-computer interaction tasks, including customer feedback analysis, marketing research, and social media monitoring.