- Learning Nested Named Entity Recognition from Flat Annotations
Igor Rozhkov, Natalia Loukachevitch · Feb 28, 2026 · Citations: 0
- Constitutional Black-Box Monitoring for Scheming in LLM Agents
Simon Storf, Rich Barton-Cooper, James Peters-Gill, Marius Hobbhahn · Feb 28, 2026 · Citations: 0
- A Gauge Theory of Superposition: Toward a Sheaf-Theoretic Atlas of Neural Representations
Hossein Javidnia · Feb 28, 2026 · Citations: 0
- A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction
Ruihao Pan, Suhang Wang · Feb 28, 2026 · Citations: 0
- Qwen3-Coder-Next Technical Report
Ruisheng Cao, Mouxiang Chen, Jiawei Chen, Zeyu Cui, Yunlong Feng · Feb 28, 2026 · Citations: 0
- LaSTR: Language-Driven Time-Series Segment Retrieval
Kota Dohi, Harsh Purohit, Tomoya Nishida, Takashi Endo, Yusuke Ohtsubo · Feb 28, 2026 · Citations: 0
- RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models
Andrew Zhuoer Feng, Cunxiang Wang, Bosi Wen, Yidong Wang, Yu Luo · Feb 28, 2026 · Citations: 0
- SkillCraft: Can LLM Agents Learn to Use Tools Skillfully?
Shiqi Chen, Jingze Gai, Ruochen Zhou, Jinghan Zhang, Tongyao Zhu · Feb 28, 2026 · Citations: 0
Long Horizon
Real-world tool-using agents operate over long-horizon workflows with recurring structure and diverse demands, where effective behavior requires not only invoking atomic tools but also abstracting and reusing higher-level tool…
- DRIV-EX: Counterfactual Explanations for Driving LLMs
Amaia Cardiel, Eloi Zablocki, Elias Ramzi, Eric Gaussier · Feb 28, 2026 · Citations: 0
- RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis
Andrew Zhuoer Feng, Cunxiang Wang, Yu Luo, Bosi Wen, Yidong Wang · Feb 28, 2026 · Citations: 0
- Polynomial Mixing for Efficient Self-supervised Speech Encoders
Eva Feillet, Ryan Whetten, David Picard, Alexandre Allauzen · Feb 28, 2026 · Citations: 0
- SSKG Hub: An Expert-Guided Platform for LLM-Empowered Sustainability Standards Knowledge Graphs
Chaoyue He, Xin Zhou, Xinjia Yu, Lei Zhang, Yan Zhang · Feb 28, 2026 · Citations: 0
Expert Verification
We present SSKG Hub (Sustainability Standards Knowledge Graph Hub), a research prototype and interactive web platform that transforms standards into auditable knowledge graphs (KGs) through an LLM-centered, expert-guided pipeline.
- BLUFF: Benchmarking the Detection of False and Synthetic Content across 58 Low-Resource Languages
Jason Lucas, Matt Murtagh-White, Adaku Uchendu, Ali Al-Lawati, Michiharu Yamashita · Feb 28, 2026 · Citations: 0
Multi Agent
We introduce BLUFF, a comprehensive benchmark for detecting false and synthetic content, spanning 79 languages with over 202K samples, combining human-written fact-checked content (122K+ samples across 57 languages) and LLM-generated…
- TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces
Shu-Xun Yang, Cunxiang Wang, Haoke Zhang, Wenbo Yu, Lindong Wu · Feb 28, 2026 · Citations: 0
- Piecing Together Cross-Document Coreference Resolution Datasets: Systematic Dataset Analysis and Unification
Anastasia Zhukova, Terry Ruas, Jan Philip Wahle, Bela Gipp · Feb 28, 2026 · Citations: 0
To address these challenges, we introduce uCDCR, a unified dataset that consolidates diverse publicly available English CDCR corpora across various domains into a consistent format, which we analyze with standardized metrics and evaluation…
- QQ: A Toolkit for Language Identifiers and Metadata
Wessel Poelman, Yiyi Chen, Miryam de Lhoneux · Feb 28, 2026 · Citations: 0
- From Literature to Hypotheses: An AI Co-Scientist System for Biomarker-Guided Drug Combination Hypothesis Generation
Raneen Younis, Suvinava Basak, Lukas Chavez, Zahra Ahmadi · Feb 28, 2026 · Citations: 0
- LangGap: Diagnosing and Closing the Language Gap in Vision-Language-Action Models
Yuchen Hou, Lin Zhao · Feb 28, 2026 · Citations: 0
- Super Research: Answering Highly Complex Questions with Large Language Models through Super Deep and Super Wide Research
Yubo Dong, Nianhao You, Yuxuan Hou, Zixun Sun, Yue Zhang · Feb 28, 2026 · Citations: 0
Long Horizon
To evaluate this capability, we curated a benchmark of 300 expert-written questions across diverse domains, each requiring up to 100+ retrieval steps and 1,000+ web pages to reconcile conflicting evidence.
- Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs
Jie Cao, Tianwei Lin, Zhenxuan Fan, Bo Yuan, Ziyuan Zhao · Feb 28, 2026 · Citations: 0
- CoMoL: Efficient Mixture of LoRA Experts via Dynamic Core Space Merging
Jie Cao, Zhenxuan Fan, Zhuonan Wang, Tianwei Lin, Ziyuan Zhao · Feb 28, 2026 · Citations: 0
- CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles
Swapnil Parekh · Feb 28, 2026 · Citations: 0
- Optimizing In-Context Demonstrations for LLM-based Automated Grading
Yucheng Chu, Hang Li, Kaiqi Yang, Yasemin Copur-Gencturk, Kevin Haudek · Feb 28, 2026 · Citations: 0
Rubric Rating · Demonstrations
GUIDE paves the way for trusted, scalable assessment systems that align closely with human pedagogical standards.
- Confusion-Aware Rubric Optimization for LLM-based Automated Grading
Yucheng Chu, Hang Li, Kaiqi Yang, Yasemin Copur-Gencturk, Joseph Krajcik · Feb 28, 2026 · Citations: 0
Rubric Rating
Empirical evaluations on teacher education and STEM datasets demonstrate that CARO significantly outperforms existing SOTA methods.
- RTLocating: Intent-aware RTL Localization for Hardware Design Iteration
Changwen Xing, Yanfeng Lu, Lei Qi, Chenxu Niu, Jie Li · Feb 28, 2026 · Citations: 0
- A Typologically Grounded Evaluation Framework for Word Order and Morphology Sensitivity in Multilingual Masked LMs
Anna Feldman, Libby Barak, Jing Peng · Feb 28, 2026 · Citations: 0
- LLM-Bootstrapped Targeted Finding Guidance for Factual MLLM-based Medical Report Generation
Cunyuan Yang, Dejuan Song, Xiaotao Pang, Qianqian Shen, Wenjie Nie · Feb 28, 2026 · Citations: 0