- AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents
Yuxuan Lu, Ting-Yao Hsu, Hansu Gu, Limeng Cui, Yaochen Xie · Apr 13, 2025 · Citations: 0
Long Horizon
Yet, traditional A/B testing remains constrained by its dependence on the large-scale and live traffic of human participants, and the long time of waiting for the testing result.
- Dominated Actions in Imperfect-Information Games
Sam Ganzfried · Apr 13, 2025 · Citations: 0
- Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic Regimes
Stella C. Dong · Apr 13, 2025 · Citations: 0
A Proximal Policy Optimization (PPO) agent is trained using a risk-sensitive reward that penalizes reserve shortfall, capital inefficiency, and breaches of a volatility-adjusted solvency floor, with tail risk explicitly controlled through…
- Linguistic Comparison of AI- and Human-Written Responses to Online Mental Health Queries
Koustuv Saha, Yoshee Jain, Violeta J. Rodriguez, Munmun De Choudhury · Apr 12, 2025 · Citations: 0
Although genAI shows promise in delivering immediate and personalized responses, its effectiveness in replicating the nuanced, experience-based support of human peers remains an open question.
- BioChemInsight: An Online Platform for Automated Extraction of Chemical Structures and Activity Data from Patents
Zhe Wang, Fangtian Fu, Wei Zhang, Lige Yan, Nan Li · Apr 12, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu · Apr 12, 2025 · Citations: 0
Tool Use
In this survey, we categorize existing methods along two orthogonal dimensions: (1) Regimes, which define the stage at which reasoning is achieved (either at inference time or through dedicated training); and (2) Architectures, which…
- MCP Bridge: A Lightweight, LLM-Agnostic RESTful Proxy for Model Context Protocol Servers
Arash Ahmadi, Sarah Sharif, Yaser M. Banad · Apr 11, 2025 · Citations: 0
- Generating Fine Details of Entity Interactions
Xinyi Gu, Jiayuan Mao · Apr 11, 2025 · Citations: 0
Critique Edit
However, images should also encapsulate rich interactions between objects, where existing models often fall short, likely due to limited training data and benchmarks for rare interactions.
- Automating quantum feature map design via large language models
Kenya Sakka, Kosuke Mitarai, Keisuke Fujii · Apr 10, 2025 · Citations: 0
- CAReDiO: Cultural Alignment via Representativeness and Distinctiveness Guided Data Optimization
Jing Yao, Xiaoyuan Yi, Jindong Wang, Zhicheng Dou, Xing Xie · Apr 9, 2025 · Citations: 0
Extensive experiments on 15 cultures demonstrate that CAReDiO can create high-quality data with richer cultural information and enable efficient alignment of small open-source or large proprietary LLMs with as few as 200 training samples,…
- Estimating Item Difficulty Using Large Language Models and Tree-Based Machine Learning Algorithms
Pooya Razavi, Sonya Powers · Apr 9, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Can LLMs Simulate Personas with Reversed Performance? A Systematic Investigation for Counterfactual Instruction Following in Math Reasoning Context
Sai Adith Senthil Kumar, Hao Yan, Saipavan Perepa, Murong Yue, Ziyu Yao · Apr 8, 2025 · Citations: 0
- Don't Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning
Yuehan Qin, Shawn Li, Yi Nian, Xinyan Velocity Yu, Yue Zhao · Apr 8, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- BiasCause: Evaluate Socially Biased Causal Reasoning of Large Language Models
Tian Xie, Tongxin Yin, Vaishakh Keshava, Xueru Zhang, Siddhartha Reddy Jonnalagadda · Apr 8, 2025 · Citations: 0
- SkillFlow: Scalable and Efficient Agent Skill Retrieval System
Fangzhou Li, Pagkratios Tagkopoulos, Ilias Tagkopoulos · Apr 8, 2025 · Citations: 0
We present SkillFlow, the first multi-stage retrieval pipeline designed for agent skill discovery, framing skill acquisition as an information retrieval problem over a corpus of ~36K community-contributed SKILL.md definitions indexed from…
- NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge
Firoj Alam, Md Arid Hasan, Sahinur Rahman Laskar, Mucahid Kutlu, Kareem Darwish · Apr 8, 2025 · Citations: 0
Web Browsing
The developed resources can be used for LLM benchmarking and further fine-tuning.
- Pretraining Language Models for Diachronic Linguistic Change Discovery
Elisabeth Fittschen, Sabrina Li, Tom Lippincott, Leshem Choshen, Craig Messner · Apr 7, 2025 · Citations: 0
This has engendered growing interest in their use in humanistic disciplines, such as historical linguistics and literary studies.
- SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
Justus Westerhoff, Erblina Purelku, Jakob Hackstein, Jonas Loos, Leo Pinetzki · Apr 7, 2025 · Citations: 0
- Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models
Yubo Li, Xiaobin Shen, Xinyu Yao, Xueying Ding, Yidi Miao · Apr 7, 2025 · Citations: 0
Red Team
We organize existing benchmarks and datasets into coherent categories reflecting the evolving landscape of multi-turn dialogue evaluation, and review a broad spectrum of enhancement methodologies, including model-centric strategies…
- Causal Retrieval with Semantic Consideration
Hyunseo Shin, Wonseok Hwang · Apr 7, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.