- Moving Beyond Medical Exams: A Clinician-Annotated Fairness Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Max Lamparth, Declan Grabb, Amy Franks, Scott Gershan, Kaitlyn N. Kunstman · Feb 22, 2025 · Citations: 0
Pairwise Preference · Expert Verification
Current medical language model (LM) benchmarks often over-simplify the complexities of day-to-day clinical practice tasks and instead rely on evaluating LMs on multiple-choice board exam questions.
- Integrating Personality into Digital Humans: A Review of LLM-Driven Approaches for Virtual Reality
Iago Alves Brito, Julia Soares Dollis, Fernanda Bufon Färber, Pedro Schindler Freire Brasil Ribeiro, Rafael Teixeira Sousa · Feb 22, 2025 · Citations: 0
The integration of large language models (LLMs) into virtual reality (VR) environments has opened new pathways for creating more immersive and interactive digital humans.
- HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings
Rasmus Aavang, Giovanni Rizzi, Rasmus Bøggild, Alexandre Iolov, Mike Zhang · Feb 21, 2025 · Citations: 0
For rapid evaluation, we also release HiFi-KPI-Lite, a manually curated 8K paragraph subset.
- Less is More: Improving LLM Alignment via Preference Data Selection
Xun Deng, Han Zhong, Rui Ai, Fuli Feng, Zheng Wang · Feb 20, 2025 · Citations: 0
Pairwise Preference
Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences.
- Glycemic-Aware and Architecture-Agnostic Training Framework for Blood Glucose Forecasting in Type 1 Diabetes
Saman Khamesian, Asiful Arefeen, Maria Adela Grando, Bithika M. Thompson, Hassan Ghasemzadeh · Feb 20, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh
Nurkhan Laiyk, Daniil Orel, Rituraj Joshi, Maiya Goloburda, Yuxia Wang · Feb 19, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- LaVCa: LLM-assisted Visual Cortex Captioning
Takuya Matsuyama, Shinji Nishimoto, Yu Takagi · Feb 19, 2025 · Citations: 0
- Don't Stop the Multi-Party! On Generating Synthetic Written Multi-Party Conversations with Constraints
Nicolò Penzo, Marco Guerini, Bruno Lepri, Goran Glavaš, Sara Tonelli · Feb 19, 2025 · Citations: 0
Finally, we assess the quality of the obtained WMPCs via human and LLM-as-a-judge evaluations.
- MKE-Coder: Multi-Axial Knowledge with Evidence Verification in ICD Coding for Chinese EMRs
Xinxin You, Xien Liu, Xue Yang, Ziyi Wang, Ji Wu · Feb 19, 2025 · Citations: 0
Practical evaluation of our method in simulated real-world coding scenarios demonstrates that it significantly improves coders' accuracy and speed.
- Reflection of Episodes: Learning to Play Game from Expert and Self Experiences
Xiaojie Xu, Zongyuan Li, Chang Lu, Runnan Qi, Yanan Ni · Feb 19, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Cyber-Physical Systems Security: A Comprehensive Review of Anomaly Detection Techniques
Danial Abshari, Meera Sridhar · Feb 18, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- SEFL: A Framework for Generating Synthetic Educational Assignment Feedback with LLM Agents
Mike Zhang, Amalie Pernille Dilling, Léon Gondelman, Niels Erik Ruan Lyngdorf, Euan D. Lindsay · Feb 18, 2025 · Citations: 0
Critique Edit
Through comprehensive evaluations with three LLM judges and three human experts, across a subset of 900 outputs, we demonstrate that SEFL-tuned models outperform both their untuned counterparts and an existing baseline in terms of feedback…
- Conditioning LLMs to Generate Code-Switched Text
Maite Heredia, Gorka Labaka, Jeremy Barnes, Aitor Soroa · Feb 18, 2025 · Citations: 0
Pairwise Preference
Code-switching (CS) is still a critical challenge in Natural Language Processing (NLP), due to the limited availability of large-scale, diverse CS datasets for robust training and evaluation.
- Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models
Neeraj Gangwar, Suma P Bhat, Nickvash Kani · Feb 18, 2025 · Citations: 0
Our experiments on multiple reasoning benchmarks demonstrate that incorporating an arithmetic dataset, whether through targeted fine-tuning or within an instruction-tuning mixture, enhances models' arithmetic capabilities, thereby improving…
- Using the Path of Least Resistance to Explain Deep Networks
Sina Salek, Joseph Enguehard · Feb 17, 2025 · Citations: 0
Through experiments on both synthetic and real-world image classification data, we provide empirical evidence supporting our theoretical analysis and showing that GIG produces more faithful attributions than existing methods, including IG,…
- MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
Yuchen Yan, Yongliang Shen, Yang Liu, Jin Jiang, Xin Xu · Feb 17, 2025 · Citations: 0
Through comprehensive experiments on multiple mathematical reasoning datasets, including MathInstruct and MetaMathQA, we demonstrate that models trained on MathFimer-expanded data consistently outperform their counterparts trained on…