- GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression
Kainan Liu, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang · Dec 31, 2024 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Evaluating LLMs' Divergent Thinking Capabilities for Scientific Idea Generation with Minimal Context
Kai Ruan, Xuan Wang, Jixiang Hong, Peng Wang, Yang Liu · Dec 23, 2024 · Citations: 0
While Large Language Models (LLMs) demonstrate remarkable capabilities in scientific tasks such as literature analysis and experimental design (e.g., accurately extracting key findings from papers or generating coherent experimental…
- A Survey of Query Optimization in Large Language Models
Mingyang Song, Mao Zheng · Dec 23, 2024 · Citations: 0
We further examine evaluation methodologies, identify critical gaps in existing benchmarks, and discuss open challenges including process reward models, efficiency optimization, and multi-modal query handling.
- LLM4AD: A Platform for Algorithm Design with Large Language Model
Fei Liu, Rui Zhang, Zhuoliang Xie, Rui Sun, Kai Li · Dec 23, 2024 · Citations: 0
We have also designed a unified evaluation sandbox to ensure a secure and robust assessment of algorithms.
- Multi-modal, Multi-task, Multi-criteria Automatic Evaluation with Vision Language Models
Masanari Ohi, Masahiro Kaneko, Naoaki Okazaki, Nakamasa Inoue · Dec 19, 2024 · Citations: 0
However, existing metrics for evaluating the quality of text generated by VLMs typically focus on an overall evaluation for a specific task, such as image captioning.
- LMUnit: Fine-grained Evaluation with Natural Language Unit Tests
Jon Saad-Falcon, Rajan Vivek, William Berrios, Nandita Shankar Naik, Matija Franklin · Dec 17, 2024 · Citations: 0
Pairwise Preference
We introduce natural language unit tests, a paradigm that decomposes response quality into explicit, testable criteria, along with a unified scoring model, LMUnit, which combines multi-objective training across preferences, direct ratings,…
- Efficient Continual Learning for Small Language Models with a Discrete Key-Value Bottleneck
Andor Diera, Lukas Galke, Fabian Karl, Ansgar Scherp · Dec 11, 2024 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- SpecFuse: Ensembling Large Language Models via Next-Segment Prediction
Bo Lv, Nayu Liu, Chen Tang, Xin Liu, Yue Yu · Dec 10, 2024 · Citations: 0
Experimental results on five LLM families (ranging from 7B to 72B parameters) and six benchmark datasets, spanning open-domain instruction following, reasoning, and commonsense, demonstrate consistent performance improvements compared to…
- Speaker effects in language comprehension: An integrative model of language and speaker processing
Hanlin Wu, Zhenguang G. Cai · Dec 10, 2024 · Citations: 0
We discuss how speaker effects serve as indices for assessing language development and social cognition, and we encourage future research to extend these findings to the emerging domain of artificial intelligence (AI) speakers, as AI agents…
- Predicting Subway Passenger Flows under Incident Situation with Causality
Xiannan Huang, Shuhan Qiu, Quan Yuan, Chao Yang · Dec 9, 2024 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Efficient Context Propagating Perceiver Architectures for Auto-Regressive Language Modeling
Kaleel Mahmood, Shaoyi Huang · Dec 8, 2024 · Citations: 0
Pairwise Preference
To this end, we develop four new architectural paradigms, the best performing of which we denote as the Efficient Context propagating Perceiver (ECP).
- A Contemporary Overview: Trends and Applications of Large Language Models on Mobile Devices
Lianjun Liu, Hongli An, Pengxuan Chen, Longxiang Ye · Dec 4, 2024 · Citations: 0
- RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song, Valts Blukis, Jonathan Tremblay, Stephen Tyree, Yu Su · Nov 25, 2024 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang, Lizhang Chen, Bo Liu, Qiang Liu · Nov 25, 2024 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes
Rahul Garg, Trilok Padhi, Hemang Jain, Ugur Kursuncu, Ponnurangam Kumaraguru · Nov 19, 2024 · Citations: 0
Experimental results from our study on two hate speech benchmark datasets demonstrate superior performance over the state-of-the-art baselines across AUROC, F1, and Recall, with improvements of 1.1%, 7%, and 35%, respectively.
- Federated Co-tuning Framework for Large and Small Language Models
Tao Fan, Yan Kang, Guoqiang Ma, Lixin Fan, Shuoling Liu · Nov 18, 2024 · Citations: 0
Our evaluation of FedCoLLM, utilizing various public LLMs and SLMs across a range of NLP text generation tasks, reveals that the performance of clients' SLMs experiences notable improvements with the assistance of the LLMs.
- Personalized Help for Optimizing Low-Skilled Users' Strategy
Feng Gu, Wichayaporn Wongkamjan, Jonathan K. Kummerfeld, Denis Peskoff, Jonathan May · Nov 14, 2024 · Citations: 0
AIs can beat humans in game environments; however, how helpful those agents are to humans remains understudied.
- UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction
Zhiqiang Liu, Yin Hua, Mingyang Chen, Yichi Zhang, Zhuo Chen · Nov 11, 2024 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields, Casey Kennington · Nov 11, 2024 · Citations: 0
To conduct these experiments, we introduce a VL evaluation framework called Renaissance.
- LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation
Weiquan Huang, Aoqi Wu, Yifan Yang, Xufang Luo, Yuqing Yang · Nov 7, 2024 · Citations: 0
The LLM-enhanced CLIP delivers consistent improvements across a wide range of downstream tasks, including linear-probe classification, zero-shot image-text retrieval with both short and long captions (in English and other languages),…
- Llama-Mob: Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction
Peizhi Tang, Chuang Yang, Tong Xing, Xiaohang Xu, Jiayi Xu · Oct 31, 2024 · Citations: 0
Human mobility prediction plays a critical role in applications such as disaster response, urban planning, and epidemic forecasting.
- WAFFLE: Finetuning Multi-Modal Models for Automated Front-End Development
Shanchao Liang, Nan Jiang, Shangshu Qian, Lin Tan · Oct 24, 2024 · Citations: 0
Models fine-tuned with Waffle show up to 9.00 pp (percentage point) higher HTML match, 0.0982 higher CW-SSIM, 32.99 higher CLIP, and 27.12 pp higher LLEM on our new benchmark WebSight-Test and an existing benchmark Design2Code,…
- Scaling Knowledge Graph Construction through Synthetic Data Generation and Distillation
Prafulla Kumar Choubey, Xin Su, Man Luo, Xiangyu Peng, Caiming Xiong · Oct 22, 2024 · Citations: 0
- Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation
Safeyah Khaled Alshemali, Daniel Bauer, Yuval Marton · Oct 19, 2024 · Citations: 0
Long Horizon
We set a new state-of-the-art on thematic fit benchmarks, but show that closed and open weight LLMs respond differently to our prompting strategies: Closed models achieve better scores overall and benefit from multi-step reasoning, but they…
- Diverging Preferences: When do Annotators Disagree and do Models Know?
Michael JQ Zhang, Zhilin Wang, Jena D. Hwang, Yi Dong, Olivier Delalleau · Oct 18, 2024 · Citations: 0
Pairwise Preference
In our experiments, we demonstrate how standard reward modeling (e.g., Bradley-Terry) and LLM-as-Judge evaluation methods fail to account for divergence between annotators.
- SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
Yuling Gu, Oyvind Tafjord, Hyunwoo Kim, Jared Moore, Ronan Le Bras · Oct 17, 2024 · Citations: 0
Yet most evaluations stop at explicit belief attribution in classical toy stories or stylized tasks, leaving open the questions of whether LLMs can implicitly apply such knowledge to predict human behavior, or to judge an observed behavior,…
- GLEE: A Unified Framework and Benchmark for Language-based Economic Environments
Eilam Shapira, Omer Madmon, Itamar Reinman, Samuel Joseph Amouyal, Roi Reichart · Oct 7, 2024 · Citations: 0
To answer these questions, we introduce a benchmark for standardizing research on two-player, sequential, language-based games.
- A Watermark for Black-Box Language Models
Dara Bahri, John Wieting · Oct 2, 2024 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.