- propella-1: Multi-Property Document Annotation for LLM Data Curation at Scale
Maximilian Idahl, Benedikt Droste, Björn Plüster, Jan Philipp Harries · Feb 12, 2026 · Citations: 0
We introduce propella-1, a family of small multilingual LLMs (0.6B, 1.7B, 4B parameters) that annotate text documents across 18 properties organized into six categories: core content, classification, quality and value, audience and purpose,…
- Evolving Beyond Snapshots: Harmonizing Structure and Sequence via Entity State Tuning for Temporal Knowledge Graph Forecasting
Siyuan Li, Yunjia Wu, Yiyong Xiao, Pingyang Huang, Peize Li · Feb 12, 2026 · Citations: 0
Long Horizon
Experiments on multiple benchmarks show that EST consistently improves diverse backbones and achieves state-of-the-art performance, highlighting the importance of state persistence for long-horizon TKG forecasting.
- On-Policy Context Distillation for Language Models
Tianzhu Ye, Li Dong, Xun Wu, Shaohan Huang, Furu Wei · Feb 12, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Think like a Scientist: Physics-guided LLM Agent for Equation Discovery
Jianke Yang, Ohm Venkatachalam, Mohammad Kianezhad, Sharvaree Vadgama, Rose Yu · Feb 12, 2026 · Citations: 0
Long Horizon
We introduce KeplerAgent, an agentic framework that explicitly follows this scientific reasoning process.
- "Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most
Kaitlyn Zhou, Martijn Bartelds, Federico Bianchi, James Zou · Feb 12, 2026 · Citations: 0
Despite speech recognition systems achieving low word error rates on standard benchmarks, they often fail on short, high-stakes utterances in real-world deployments.
- Energy-Aware Spike Budgeting for Continual Learning in Spiking Neural Networks for Neuromorphic Vision
Anika Tabassum Meem, Muntasir Hossain Nadid, Md Zesun Ahmed Mia · Feb 12, 2026 · Citations: 0
- Query-focused and Memory-aware Reranker for Long Context Processing
Yuqing Li, Jiangnan Li, Mo Yu, Guoxuan Ding, Zheng Lin · Feb 12, 2026 · Citations: 0
Rubric Rating
It further establishes a new state of the art on the LoCoMo benchmark, which assesses dialogue understanding and memory-usage capabilities.
- GPT-4o Lacks Core Features of Theory of Mind
John Muchovej, Amanda Royka, Shane Lee, Julian Jara-Ettinger · Feb 12, 2026 · Citations: 0
Research into this question has focused on evaluating LLMs against benchmarks and found success across a range of social tasks.
- Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
Wenkai Yang, Weijie Liu, Ruobing Xie, Kai Yang, Saiyong Yang · Feb 12, 2026 · Citations: 0
Expert Verification
Then, we propose the Generalized On-Policy Distillation (G-OPD) framework, which extends the standard OPD objective by introducing a flexible reference model and a reward scaling factor that controls the relative weight of the reward term…
- Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
Zewei Yu, Lirong Gao, Yuke Zhu, Bo Zheng, Junbo Zhao · Feb 12, 2026 · Citations: 0
- Tiny Recursive Reasoning with Mamba-2 Attention Hybrid
Wenlong Wang, Fergal Reid · Feb 12, 2026 · Citations: 0
- Scaling Model and Data for Multilingual Machine Translation with Open Large Language Models
Yuzhe Shang, Pengzhi Gao, Wei Liu, Jian Luan, Jinsong Su · Feb 12, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Who is the richest club in the championship? Detecting and Rewriting Underspecified Questions Improve QA Performance
Yunchong Huang, Gianni Barlacchi, Sandro Pezzelle · Feb 12, 2026 · Citations: 0
Large language models (LLMs) perform well on well-posed questions, yet standard question-answering (QA) benchmarks remain far from solved.
- Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
Lai Wei, Liangbo He, Jun Lan, Lingzhong Dong, Yutong Cai · Feb 12, 2026 · Citations: 0
Tool Use
To address this, we propose Region-to-Image Distillation, which transforms zooming from an inference-time tool into a training-time primitive, thereby internalizing the benefits of agentic zooming into a single forward pass of an MLLM.
- Predicting LLM Output Length via Entropy-Guided Representations
Huanyi Xie, Yubin Chen, Liangyu Wang, Lijie Hu, Di Wang · Feb 12, 2026 · Citations: 0
- TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents
Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Heiko Ludwig, Holger Boche · Feb 12, 2026 · Citations: 0
Long Horizon
Advances in large language models (LLMs) are driving a shift toward using reinforcement learning (RL) to train agents from iterative, multi-turn interactions across tasks.
- MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
MiniCPM Team, Wenhao An, Yingfa Chen, Yewei Fang, Jiayi Li · Feb 12, 2026 · Citations: 0
- Curriculum Learning and Pseudo-Labeling Improve the Generalization of Multi-Label Arabic Dialect Identification Models
Ali Mekky, Mohamed El Zeftawy, Lara Hassan, Amr Keleg, Preslav Nakov · Feb 12, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- SToRM: Supervised Token Reduction for Multi-modal LLMs toward efficient end-to-end autonomous driving
Seo Hyun Kim, Jin Bok Park, Do Yeon Koo, Hogun Park, Il Yong Chun · Feb 12, 2026 · Citations: 0
For safe driving in unexpected scenarios, these systems may additionally rely on human interventions such as natural language instructions.
- Variation-aware Flexible 3D Gaussian Editing
Hao Qin, Yukai Sun, Meng Wang, Ming Kong, Mengxu Lu · Feb 12, 2026 · Citations: 0
- Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
Yuanfu Wang, Zhixuan Liu, Xiangtian Li, Chaochao Lu, Chao Yang · Feb 12, 2026 · Citations: 0
Demonstrations
The prevailing paradigm for training large reasoning models--combining Supervised Fine-Tuning (SFT) with Reinforcement Learning with Verifiable Rewards (RLVR)--is fundamentally constrained by its reliance on high-quality, human-annotated…
- OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model
Maomao Li, Zhen Li, Kaipeng Zhang, Guosheng Yin, Zhifeng Li · Feb 12, 2026 · Citations: 0
Third, we train OmniCustom on our constructed large-scale, high-quality audio-visual human dataset.
- RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis
Zhen Bi, Xueshu Chen, Luoyang Sun, Yuhang Yao, Qing Shen · Feb 12, 2026 · Citations: 0
- Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models
Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis · Feb 12, 2026 · Citations: 0
Red Team
On an abliterated LLaMA-3.1-8B model, selectively bypassing high-susceptibility layers blocks 78% of jailbreak attempts while preserving benign behavior on 94% of benign prompts.
- When Audio-LLMs Don't Listen: A Cross-Linguistic Study of Modality Arbitration
Jayadev Billa · Feb 12, 2026 · Citations: 0
We introduce ALME (Audio-LLM Modality Evaluation), a dataset of 57,602 controlled audio-text conflict stimuli across eight languages, together with Text Dominance Ratio (TDR), which measures how often a model follows conflicting text when…
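A metric like TDR reduces to a simple ratio over conflict stimuli; a minimal sketch, assuming each stimulus records which modality the model sided with (the `followed` field name is illustrative, not from the ALME dataset):

```python
# Hypothetical sketch of a Text Dominance Ratio (TDR) style metric.
# Assumes each audio-text conflict stimulus records which modality
# the model's answer agreed with; field names are illustrative.

def text_dominance_ratio(results):
    """Fraction of conflict stimuli where the model followed the
    conflicting text input rather than the audio."""
    if not results:
        raise ValueError("no conflict stimuli provided")
    text_follows = sum(1 for r in results if r["followed"] == "text")
    return text_follows / len(results)

# Toy usage: 3 of 4 conflicting stimuli resolved in favor of text.
example = [
    {"followed": "text"},
    {"followed": "audio"},
    {"followed": "text"},
    {"followed": "text"},
]
print(text_dominance_ratio(example))  # → 0.75
```

A TDR near 1.0 would indicate the model effectively ignores audio under conflict; per-language TDRs would expose the cross-linguistic differences the paper studies.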