- AutoPK: Leveraging LLMs and a Hybrid Similarity Metric for Advanced Retrieval of Pharmacokinetic Data from Complex Tables and Documents
Hossein Sholehrasa, Amirhossein Ghanaatian, Doina Caragea, Lisa A. Tell, Jim E. Riviere · Sep 26, 2025 · Citations: 0
Pharmacokinetics (PK) plays a critical role in drug development and regulatory decision-making for human and veterinary medicine, directly affecting public health through drug safety and efficacy assessments.
- Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning
Mohammed Sabry, Anya Belz · Sep 26, 2025 · Citations: 0
Across 0.13B-1B decoder-only models, we evaluate (i) few-shot performance on standard LM benchmarks and function-style ICL probes, (ii) head-level copy telemetry, and (iii) held-out perplexity as a guardrail.
- Compute-Optimal Quantization-Aware Training
Aleksandr Dremov, David Grangier, Angelos Katharopoulos, Awni Hannun · Sep 26, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, Vicky Kalogeiton · Sep 26, 2025 · Citations: 0
- HEART: Emotionally-Driven Test-Time Scaling of Language Models
Gabriela Pinto, Palash Goyal, Mihir Parmar, Yiwen Song, Souradip Chakraborty · Sep 26, 2025 · Citations: 0
We introduce HEART, a framework that uses emotional cues to guide the model's focus, much like how feelings contribute to human decision-making.
- Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
Chi Ruan, Dongfu Jiang, Yubo Wang, Wenhu Chen · Sep 26, 2025 · Citations: 0
Critique Edit
We fine-tune multiple models (Critique-Coder) and evaluate them on different benchmarks to show their advantages over RL-only models.
- Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity
Arkadiy Saakyan, Najoung Kim, Smaranda Muresan, Tuhin Chakrabarty · Sep 26, 2025 · Citations: 0
Pairwise Preference
We investigate the relationship between this notion of creativity and n-gram novelty through 8,618 expert writer annotations of novelty, pragmaticality, and sensicality via close reading of human- and AI-generated text.
- StateX: Enhancing RNN Recall via Post-training State Expansion
Xingyu Shen, Yingfa Chen, Zhen Leng Thai, Xu Han, Zhiyuan Liu · Sep 26, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
Aayush Mishra, Daniel Khashabi, Anqi Liu · Sep 26, 2025 · Citations: 0
Demonstrations
Performing IA2 as a priming step before SFT significantly improves the accuracy and calibration of model outputs, as shown by our extensive empirical results on 12 popular benchmarks and two model families.
- Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
Siwei Wang, Yifei Shen, Haoran Sun, Shi Feng, Shang-Hua Teng · Sep 26, 2025 · Citations: 0
Finally, applying our framework to the real-world planning benchmark Blocksworld, we confirm that these behaviors manifest in practice.
- From Formal Language Theory to Statistical Learning: Finite Observability of Subregular Languages
Katsuhiko Hayashi, Hidetaka Kamigaito · Sep 26, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- From Parameters to Behaviors: Unsupervised Compression of the Policy Space
Davide Tenedini, Riccardo Zamboni, Mirco Mutti, Marcello Restelli · Sep 26, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- GeoSketch: A Neural-Symbolic Approach to Geometric Multimodal Reasoning with Auxiliary Line Construction and Affine Transformation
Shichao Weng, Zhiqiang Wang, Yuhua Zhou, Rui Lu, Ting Liu · Sep 26, 2025 · Citations: 0
- Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
Peter Shaw, James Cohan, Jacob Eisenstein, Kristina Toutanova · Sep 26, 2025 · Citations: 0
Demonstrations
The Minimum Description Length (MDL) principle offers a formal framework for applying Occam's razor in machine learning.
- FeatBench: Towards More Realistic Evaluation of Feature-level Code Generation
Haorui Chen, Chengze Li, Jia Li · Sep 26, 2025 · Citations: 0
To address these limitations, we propose a new benchmark, FeatBench, which introduces the following advances: (1) Realistic Task Inputs.
- LogiPart: Local Large Language Models for Data Exploration at Scale with Logical Partitioning
Tiago Fernandes Tavares · Sep 26, 2025 · Citations: 0
A qualitative audit by an independent LLM-as-a-judge confirms the discovery of meaningful functional axes, such as policy intent, that thematic ground-truth labels fail to capture.
- Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Shijing Hu, Jingyang Li, Zhihui Lu, Pan Zhou · Sep 26, 2025 · Citations: 0
- SciTS: Scientific Time Series Understanding and Generation with LLMs
Wen Wu, Ziyang Zhang, Liwei Liu, Xuenan Xu, Jimin Zhuang · Sep 26, 2025 · Citations: 0
To address these gaps, we introduce SciTS, a benchmark spanning 12 scientific domains and 43 tasks, with over 50k instances covering both univariate and multivariate signals, ranging from 10^0 to 10^7 in length and up to 10 MHz in frequency.
- SecureVibeBench: Evaluating Secure Coding Capabilities of Code Agents with Realistic Vulnerability Scenarios
Junkai Chen, Huihui Huang, Yunbo Lyu, Junwen An, Jieke Shi · Sep 26, 2025 · Citations: 0
Large language model-powered code agents are rapidly transforming software engineering, yet the security risks of their generated code have become a critical concern.
- CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
Denis Makhov, Dmitriy Shopkhoev, Magauiya Zhussip, Ammar Ali, Stamatios Lefkimmiatis · Sep 26, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Fine-tuning Done Right in Model Editing
Wanli Yang, Rui Tang, Hongyu Zang, Du Su, Qi Cao · Sep 26, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- MARCH: Evaluating the Intersection of Ambiguity Interpretation and Multi-hop Inference
Jeonghyun Park, Ingeol Baek, Seunghyun Yoon, Haeun Jang, Aparna Garimella · Sep 26, 2025 · Citations: 0
Long Horizon
In this paper, we introduce MARCH, a benchmark for their intersection, with 2,209 multi-hop ambiguous questions curated via multi-LLM verification and validated by human annotation with strong agreement.
- ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
Jewon Lee, Wooksu Shin, Seungmin Yang, Ki-Ung Song, DongUk Lim · Sep 26, 2025 · Citations: 0
For instance, ERGO surpasses Qwen2.5-VL-7B on the V* benchmark by 4.7 points while using only 23% of the vision tokens, achieving a 3x inference speedup.
- Leveraging Wireless Sensor Networks for Real-Time Monitoring and Control of Industrial Environments
Muhammad Junaid Asif, Abdul Rehman, Asim Mehmood, Rana Fayyaz Ahmad, Shazia Saqib · Sep 26, 2025 · Citations: 0
- ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation
Jiho Kim, Junseong Choi, Woosog Chay, Daeun Kyung, Yeonsu Kwon · Sep 26, 2025 · Citations: 0
Pairwise Preference
In our simulation environment, a user agent with a rich persona interacts with the assistant, providing ratings on how well each suggestion aligns with its preferences and context.