- Adaptive Social Learning via Mode Policy Optimization for Language Agents
Minzheng Wang, Yongbin Li, Haobo Wang, Xinghua Zhang, Nan Xu · May 4, 2025 · Citations: 0
To address this, we propose an Adaptive Social Learning (ASL) framework in this paper, aiming to improve the adaptive reasoning ability of language agents in dynamic social interactions.
- Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
Cfir Avraham Hadar, Omer Shubi, Yoav Meiri, Amit Heshes, Yevgeni Berzak · May 4, 2025 · Citations: 0
To address this question, we introduce goal decoding tasks and evaluation frameworks using large-scale eye tracking for reading data in English with hundreds of text-specific information seeking tasks.
- Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey
Jing Liu, Yao Du, Kun Yang, Jiaqi Wu, Yan Wang · May 3, 2025 · Citations: 0
Performance analysis and benchmarking techniques are also thoroughly explored to establish evaluation standards for these complex systems.
- Large Language Model Compression with Global Rank and Sparsity Optimization
Changhai Zhou, Qian Qiao, Yuhua Zhou, Yuxin Wu, Shichao Weng · May 2, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- FineScope : SAE-guided Data Selection Enables Domain Specific LLM Pruning and Finetuning
Chaitali Bhattacharyya, Hyunsei Lee, Junyoung Lee, Shinhyoung Jang, Il hong Suh · May 1, 2025 · Citations: 0
- BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu, Bowen Gu, Ren Zhou, Kevin Xie, Doug Snyder · Apr 28, 2025 · Citations: 0
However, benchmarking on large-scale real-world data such as electronic health records (EHRs) is critical, as clinical decisions are directly informed by these sources, yet current evaluations remain limited.
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
Rui Xin, Niloofar Mireshghallah, Shuyue Stella Li, Michael Duan, Hyunwoo Kim · Apr 28, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Reshaping MOFs text mining with a dynamic multi-agents framework of large language model
Zuhong Lin, Daoyuan Ren, Kai Ran, Jing Sun, Songlin Yu · Apr 26, 2025 · Citations: 0
Multi Agent
We present MOFh6, a large language model driven system that reads raw articles or crystal codes and converts them into standardized synthesis tables.
- Toward Safe and Human-Aligned Game Conversational Recommendation via Multi-Agent Decomposition
Zheng Hui, Xiaokai Wei, Yexi Jiang, Kevin Gao, Chen Wang · Apr 26, 2025 · Citations: 0
Pairwise Preference Multi Agent
These domains typically involve fixed content and passive consumption, where user preferences can be matched by genre or theme.
- Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation
Peiyuan Jing, Kinhei Lee, Zhenxuan Zhang, Huichi Zhou, Zhengqing Yuan · Apr 25, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli, Kentaroh Toyoda, Yuan Wang, Leon Witt, Muhammad Asif Ali · Apr 25, 2025 · Citations: 0
While some of these methods have been adapted for LLMs, the literature lacks an in-depth analysis of their effectiveness and does not offer a comprehensive benchmark to enable insightful comparison among existing solutions.
- How much does context affect the accuracy of AI health advice?
Prashant Garg, Thiemo Fetzer · Apr 25, 2025 · Citations: 0
English-language performance does not reliably generalise across contexts, underscoring the need for multilingual, domain-specific evaluation before deployment in public-health communication.
- Unsupervised Corpus Poisoning Attacks in Continuous Space for Dense Retrieval
Yongkang Li, Panagiotis Eustratiadis, Simon Lupart, Evangelos Kanoulas · Apr 24, 2025 · Citations: 0
- Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation
Thomas F Burns, Letitia Parcalabescu, Stephan Wäldchen, Michael Barlow, Gregor Ziegltrum · Apr 24, 2025 · Citations: 0
A comparison on German-language benchmarks, including MMMLU, shows significant performance gains of Aleph-Alpha-GermanWeb over FineWeb2 alone.
- RaPA: Enhancing Transferable Targeted Attacks via Random Parameter Pruning
Tongrui Su, Qingbin Li, Shengyu Zhu, Wei Chen, Xueqi Cheng · Apr 24, 2025 · Citations: 0
- FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation
Yulia Otmakhova, Hung Thinh Truong, Rahmad Mahendra, Zenan Zhai, Rongxin Zhu · Apr 24, 2025 · Citations: 0
We present FLUKE (Framework for LingUistically-driven and tasK-agnostic robustness Evaluation), a framework for assessing model robustness through systematic minimal variations of test data.
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang · Apr 24, 2025 · Citations: 0
Multi Agent
Inspired by this, we introduce PaperCoder, a multi-agent LLM framework that transforms machine learning papers into operational code repositories.
- ReCellTy: Domain-Specific Knowledge Graph Retrieval-Augmented LLMs Reasoning Workflow for Single-Cell Annotation
Dezheng Han, Yibin Jia, Ruxiao Chen, Wenjie Han, Shuaishuai Guo · Apr 24, 2025 · Citations: 0
Compared to general-purpose LLMs, our method improves human evaluation scores by up to 0.21 and semantic similarity by 6.1% across multiple tissue types, while more closely aligning with the cognitive logic of manual annotation.
- GeneMamba: An Efficient and Effective Foundation Model on Single Cell Data
Cong Qi, Hanzhang Fang, Siqi Jiang, Xun Song, Tianxing Hu · Apr 22, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- ConformalNL2LTL: Translating Natural Language Instructions into Temporal Logic Formulas with Conformal Correctness Guarantees
David Smith Sundarsingh, Jun Wang, Jyotirmoy V. Deshmukh, Yiannis Kantaros · Apr 22, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving
Daocheng Fu, Jianlong Chen, Renqiu Xia, Zijun Chen, Qi Liu · Apr 22, 2025 · Citations: 0
Furthermore, we propose "connection thinking" to bridge the semantic gap between rigid formal logic and fluent human-like reasoning, ensuring coherent logical transitions.