- Semantic Chunking and the Entropy of Natural Language
Weishun Zhong, Doron Sivan, Tankut Can, Mikhail Katkov, Misha Tsodyks · Feb 13, 2026 · Citations: 0
The entropy rate of printed English is famously estimated to be about one bit per character, a benchmark that modern large language models (LLMs) have only recently approached.
- CoPE-VideoLM: Leveraging Codec Primitives For Efficient Video Language Modeling
Sayan Deb Sarkar, Rémi Pautrat, Ondrej Miksik, Marc Pollefeys, Iro Armeni · Feb 13, 2026 · Citations: 0
Moreover, by varying the keyframe and codec primitive densities we maintain or exceed performance on 14 diverse video understanding benchmarks spanning general question answering, temporal and motion reasoning, long-form understanding, and…
- Quantization-Robust LLM Unlearning via Low-Rank Adaptation
João Vitor Boer Abitante, Joana Meneguzzo Pasquali, Luan Fonseca Garcia, Ewerton de Oliveira, Thomas da Silva Paula · Feb 13, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- OpenLID-v3: Improving the Precision of Closely Related Language Identification -- An Experience Report
Mariia Fedorova, Nikolay Arefyev, Maja Buljan, Jindřich Helcl, Stephan Oepen · Feb 13, 2026 · Citations: 0
We call this extended system OpenLID-v3 and evaluate it against GlotLID on multiple benchmarks.
- SCOPE: Selective Conformal Optimized Pairwise LLM Judging
Sher Badshah, Ali Emami, Hassan Sajjad · Feb 13, 2026 · Citations: 0
Pairwise Preference
Large language models (LLMs) are increasingly used as judges to replace costly human preference labels in pairwise evaluation.
- Towards interpretable models for language proficiency assessment: Predicting the CEFR level of Estonian learner texts
Kais Allkivi · Feb 13, 2026 · Citations: 0
Additional evaluation on an earlier exam sample revealed that the writings have become more complex over a 7–10-year period, while classification accuracy still reached 0.8 with some feature sets.
- Consistency of Large Reasoning Models Under Multi-Turn Attacks
Yubo Li, Ramayya Krishnan, Rema Padman · Feb 13, 2026 · Citations: 0
Long Horizon
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Buy versus Build an LLM: A Decision Framework for Governments
Jiahao Lu, Ziwei Xu, William Tjhi, Junnan Li, Antoine Bosselut · Feb 13, 2026 · Citations: 0
This paper provides a strategic framework for making this decision by evaluating these options across dimensions including sovereignty, safety, cost, resource capability, cultural fit, and sustainability.
- BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents
Huanyao Zhang, Jiepeng Zhou, Bo Li, Bowen Zhou, Yanzhe Shan · Feb 13, 2026 · Citations: 0
Web Browsing
Multimodal large language models (MLLMs), equipped with increasingly advanced planning and tool-use capabilities, are evolving into autonomous agents capable of performing multimodal web browsing and deep search in open-world environments.
- Towards a Diagnostic and Predictive Evaluation Methodology for Sequence Labeling Tasks
Elena Alvarez-Mellado, Julio Gonzalo · Feb 13, 2026 · Citations: 0
We propose an evaluation methodology for sequence labeling tasks grounded on error analysis that provides both quantitative and qualitative information on where systems must be improved and predicts how models will perform on a different…
- MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
Baorong Shi, Bo Cui, Boyuan Jiang, Deli Yu, Fang Qian · Feb 13, 2026 · Citations: 0
Pairwise Preference · Rubric Rating · Long Horizon
MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities.
- SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen · Feb 13, 2026 · Citations: 0
- Learning Ordinal Probabilistic Reward from Preferences
Longze Chen, Lu Wang, Renke Shan, Ze Gong, Run Luo · Feb 13, 2026 · Citations: 0
Pairwise Preference
Reward models are crucial for aligning large language models (LLMs) with human values and intentions.
- PMG: Parameterized Motion Generator for Human-like Locomotion Control
Chenxi Han, Yuheng Min, Zihao Huang, Ao Hong, Hang Liu · Feb 13, 2026 · Citations: 0
Long Horizon
Recent advances in data-driven reinforcement learning and motion tracking have substantially improved humanoid locomotion, yet critical practical challenges remain.
- Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats
Pengxiang Zhao, Hui-Ling Zhen, Xing Li, Han Bao, Weizhe Lin · Feb 13, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Discovering Semantic Latent Structures in Psychological Scales: A Response-Free Pathway to Efficient Simplification
Bo Wang, Yuxuan Zhang, Yueqin Hu, Hanchao Hou, Kaiping Peng · Feb 13, 2026 · Citations: 0
We benchmarked the framework across DASS, IPIP, and EPOCH, evaluating structural recovery, internal consistency, factor congruence, correlation preservation, and reduction efficiency.
- To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models
Haoqing Wang, Xiang Long, Ziheng Li, Yilong Xu, Tingguang Li · Feb 13, 2026 · Citations: 0