- Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang · Apr 9, 2026 · Citations: 0
Automatic Metrics General
The advent of agentic multimodal models has empowered systems to actively interact with external environments.
- KV Cache Offloading for Context-Intensive Tasks
Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev, Vyacheslav Zhdanovskiy, Yegor Yershov · Apr 9, 2026 · Citations: 0
Automatic Metrics General
Prior evaluations have largely focused on tasks that do not require extracting large amounts of information from the context.
- Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
Baihui Liu, Kaiyuan Tian, Wei Wang, Zhaoning Zhang, Linbo Qiao · Apr 9, 2026 · Citations: 0
Coding
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs
Yuan Xie, Jiaqi Song, Guang Qiu, Xianliang Wang, Ming Lei · Apr 9, 2026 · Citations: 0
Automatic Metrics General
Although recent LLM-based ASR models have shown promising performance on public benchmarks, it remains challenging to balance recognition quality with latency and overhead, while hallucinations further limit real-world deployment.
- PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory
Zhifei Xie, Zongzheng Hu, Fangda Ye, Xin Zhang, Haobo Chai · Apr 9, 2026 · Citations: 0
Automatic Metrics General
Prior work remains largely confined to laboratory settings, leaving a clear gap in real-world proactive agents: depth, complexity, ambiguity, precision and real-time constraints.
- DyCP: Dynamic Context Pruning for Long-Form Dialogue with LLMs
Nayoung Choi, Jonathan Zhang, Jinho D. Choi · Jan 12, 2026 · Citations: 0
Automatic Metrics General
Across three long-form dialogue benchmarks (LoCoMo, MT-Bench+, and SCM4LLMs) and multiple LLM backends, DyCP achieves competitive answer quality in downstream generation, with more selective context usage and improved inference efficiency.
- See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video LLMs
Yicheng Ji, Jun Zhang, Jinpeng Chen, Cong Wang, Lidan Shou · Apr 7, 2026 · Citations: 0
General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Efficient Learned Data Compression via Dual-Stream Feature Decoupling
Huidong Ma, Xinyan Shi, Hui Sun, Xiaofei Yue, Xiaoguang Liu · Apr 8, 2026 · Citations: 0
Law Coding
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Gemma 4, Phi-4, and Qwen3: Accuracy-Efficiency Tradeoffs in Dense and MoE Reasoning Language Models
Md Motaleb Hossen Manik, Ge Wang · Apr 8, 2026 · Citations: 0
Automatic Metrics Math
We present a controlled empirical benchmark of seven recent reasoning-oriented instruction-tuned models spanning dense and MoE designs, namely Gemma-4-E2B, Gemma-4-E4B, Gemma-4-26B-A4B, Phi-4-mini-reasoning, Phi-4-reasoning, Qwen3-8B, and…
- MARS: Enabling Autoregressive Models Multi-Token Generation
Ziqi Jin, Lei Wang, Ziwei Luo, Aixin Sun · Apr 8, 2026 · Citations: 0
Automatic Metrics General
When generating one token per forward pass, MARS matches or exceeds the AR baseline on six standard benchmarks.
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
Penghui Yang, Cunxiao Du, Fengzhuo Zhang, Haonan Wang, Tianyu Pang · Feb 24, 2025 · Citations: 0
Coding
As Large Language Models (LLMs) can now process extremely long contexts, efficient inference over these extended inputs has become increasingly important, especially for emerging applications like LLM agents that highly depend on this…
- MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control
Yuchi Wang, Haiyang Yu, Weikang Bian, Jiefeng Long, Xiao Liang · Apr 7, 2026 · Citations: 0
Automatic Metrics General
Experiments on the MMEB-V2 benchmark demonstrate that our model achieves a score of 71.2 with only 4B parameters, establishing a new state-of-the-art while significantly reducing reasoning overhead and inference latency.
- BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs
Abbas Ghaddar, Ivan Kobyzev, Boxing Chen, Yufei Cui · Apr 7, 2026 · Citations: 0
General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification
Noor Islam S. Mohammad · Oct 19, 2025 · Citations: 0
Automatic Metrics General
On the Jigsaw Toxic Comment benchmark, CoGate-LSTM achieves 0.881 macro-F1 (95% CI: [0.873, 0.889]) and 96.0% accuracy, outperforming fine-tuned BERT by 6.9 macro-F1 points (p < 0.001) and XGBoost by 4.7, while using only 7.3M parameters…
- Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker
Matthias De Lange, Jens-Joris Decorte, Jeroen Van Hautte · Nov 11, 2025 · Citations: 0
Automatic Metrics General
These constraints have led to isolated, task-specific developments in the field, with models and benchmarks focused on single prediction tasks.
- SemLink: A Semantic-Aware Automated Test Oracle for Hyperlink Verification using Siamese Sentence-BERT
Guan-Yan Yang, Wei-Ling Wen, Shu-Yuan Ku, Farn Wang, Kuo-Hui Yeh · Apr 7, 2026 · Citations: 0
Automatic Metrics General
Our evaluation demonstrates that SemLink achieves a Recall of 96.00%, comparable to state-of-the-art LLMs (GPT-5.2), while operating approximately 47.5 times faster and requiring significantly fewer computational resources.
- Robust Multilingual Text-to-Pictogram Mapping for Scalable Reading Rehabilitation
Soufiane Jhilal, Martina Galletti · Mar 25, 2026 · Citations: 0
Automatic Metrics Medicine Multilingual
Evaluation results indicate high pictogram coverage and visual scaffolding density across the five languages.
- AI-Driven Modular Services for Accessible Multilingual Education in Immersive Extended Reality Settings: Integrating Speech Processing, Translation, and Sign Language Rendering
N. D. Tantaroudas, A. J. McCracken, I. Karachalios, E. Papatheou · Apr 7, 2026 · Citations: 0
Automatic Metrics Multilingual
Validation comprised technical benchmarking of each AI component, including comparative assessments of speech synthesis providers and multilingual translation models (NLLB 200 and EuroLLM 1.7B variants).
- Weakly Supervised Distillation of Hallucination Signals into Transformer Representations
Shoaib Sadiq Salehmohamed, Jinal Prashant Thakkar, Hansika Aredla, Shaik Mohammed Omar, Shalmali Ayachit · Apr 7, 2026 · Citations: 0
LLM As Judge Automatic Metrics General
We introduce a weak supervision framework that combines three complementary grounding signals: substring matching, sentence embedding similarity, and an LLM as a judge verdict to label generated responses as grounded or hallucinated without…
- Screening Is Enough
Ken M. Nakanishi · Apr 1, 2026 · Citations: 0
Automatic Metrics General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
Guan-Ting Lin, Chen Chen, Zhehuai Chen, Hung-yi Lee · Apr 6, 2026 · Citations: 0
Automatic Metrics General
We introduce Full-Duplex-Bench-v3 (FDB-v3), a benchmark for evaluating spoken language models under naturalistic speech conditions and multi-step tool use.
- Voxtral Realtime
Mistral-AI, Alexander H. Liu, Andy Ehrenberg, Andy Lo · Feb 11, 2026 · Citations: 0
General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- PRISM: Prompt-Refined In-Context System Modelling for Financial Retrieval
Chun Chet Ng, Jia Yu Lim, Wei Zeng Low · Nov 18, 2025 · Citations: 0
Automatic Metrics Coding
We present PRISM, a training-free framework that integrates refined system prompting, in-context learning (ICL), and lightweight multi-agent coordination for document and chunk ranking tasks.
- Democratizing AI: A Comparative Study in Deep Learning Efficiency and Future Trends in Computational Processing
Lisan Al Amin, Md Ismail Hossain, Rupak Kumar Das, Mahbubul Islam, Abdulaziz Tabbakh · Mar 21, 2026 · Citations: 0
Automatic Metrics General
This study benchmarks four deep learning models (Conv6, VGG16, ResNet18, CycleGAN) using TensorFlow and PyTorch on Intel Xeon CPUs and NVIDIA Tesla T4 GPUs.
- 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models
Yeounoh Chung, Rushabh Desai, Jian He, Yu Xiao, Thibaud Hottelier · Mar 16, 2026 · Citations: 0
Automatic Metrics General
This paper provides an extensive evaluation of a recent AI query approximation approach that enables low cost analytics and database applications to benefit from AI queries.
- SEAL: An Open, Auditable, and Fair Data Generation Framework for AI-Native 6G Networks
Sunder Ali Khowaja, Kapal Dev, Engin Zeydan, Madhusanka Liyanage · Apr 2, 2026 · Citations: 0
Automatic Metrics Simulation Env General
In this regard, we propose the Synthetic Data Generation with Ethics Audit Loop (SEAL) framework, which extends baseline modular pipelines with an Ethical and Regulatory Compliance by Design (ERCD) module and a Federated Learning (FL)…
- DeDelayed: Deleting Remote Inference Delay via On-Device Correction
Dan Jacobellis, Mateen Ulhaq, Fabien Racapé, Hyomin Choi, Neeraja J. Yadwadkar · Oct 15, 2025 · Citations: 0
Automatic Metrics Coding
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- APEX: Agent Payment Execution with Policy for Autonomous Agent API Access
Mohd Safwan Uddin, Mohammed Mouzam, Mohammed Imran, Syed Badar Uddin Faizan · Apr 2, 2026 · Citations: 0
Automatic Metrics General
Autonomous agents are moving beyond simple retrieval tasks to become economic actors that invoke APIs, sequence workflows, and make real-time decisions.
- NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL
Amos Goldman, Nimrod Boker, Maayan Sheraizin, Nimrod Admoni, Artem Polyakov · Mar 13, 2026 · Citations: 0
Automatic Metrics General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Adaptive Stopping for Multi-Turn LLM Reasoning
Xiaofan Zhou, Huy Nguyen, Bo Yu, Chenxi Liu, Lu Cheng · Apr 1, 2026 · Citations: 0
Automatic Metrics General
Large Language Models (LLMs) increasingly rely on multi-turn reasoning and interaction, such as adaptive retrieval-augmented generation (RAG) and ReAct-style agents, to answer difficult questions.
- OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion
Sai Koneru, Matthias Huck, Jan Niehues · Nov 28, 2025 · Citations: 0
Coding Multilingual
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- TRIMS: Trajectory-Ranked Instruction Masked Supervision for Diffusion Language Models
Lingjie Chen, Ruizhong Qiu, Yuyu Fan, Yanjun Zhao, Hanghang Tong · Apr 1, 2026 · Citations: 0
Automatic Metrics Math Coding
Experiments on LLaDA and Dream across math and coding benchmarks show that TRIMS significantly improves the accuracy-parallelism trade-off over both standard MDLM training and train-free acceleration baselines, while achieving competitive…
- Execution-Verified Reinforcement Learning for Optimization Modeling
Runda Guan, Xiangqing Shen, Jiajun Zhang, Yifan Zhang, Jian Cheng · Apr 1, 2026 · Citations: 0
Math Coding
Automating optimization modeling with LLMs is a promising path toward scalable decision intelligence, but existing approaches either rely on agentic pipelines built on closed-source LLMs with high inference latency, or fine-tune smaller…
- Large Language Models in the Abuse Detection Pipeline
Suraj Kath, Sanket Badhe, Preet Shah, Ashwin Sampathkumar, Shivani Gupta · Mar 31, 2026 · Citations: 0
General
Large Language Models introduce new capabilities for contextual reasoning, policy interpretation, explanation generation, and cross-modal understanding, enabling them to support multiple stages of modern safety systems.
- FGR-ColBERT: Identifying Fine-Grained Relevance Tokens During Retrieval
Antonín Jarolím, Martin Fajčík · Mar 31, 2026 · Citations: 0
Automatic Metrics General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving
Annette Taberner-Miller · Mar 31, 2026 · Citations: 0
Automatic Metrics General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation
Ashish Rana, Chia-Chien Hung, Qumeng Sun, Julian Martin Kunkel, Carolin Lawrence · Mar 31, 2026 · Citations: 0
Automatic Metrics Coding
Human memory adapts through selective forgetting: experiences become less accessible over time but can be reactivated by reinforcement or contextual cues.
- DeepCoT: Deep Continual Transformers for Real-Time Inference on Data Streams
Ginés Carreto Picón, Peng Yuan Zhou, Qi Zhang, Alexandros Iosifidis · Nov 21, 2025 · Citations: 0
General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- EventChat: Implementation and user-centric evaluation of a large language model-driven conversational recommender system for exploring leisure events in an SME context
Hannes Kunstmann, Joseph Ollier, Joel Persson, Florian von Wangenheim · Jul 5, 2024 · Citations: 0
Automatic Metrics General
Yet to date, research has predominantly focused upon technical frameworks to implement LLM-driven CRS, rather than end-user evaluations or strategic implications for firms, particularly from the perspective of a small to medium enterprises…
- CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering
Yang Zhao, Chengxiao Dai, Wei Zhuo, Yue Xiu, Dusit Niyato · Sep 25, 2025 · Citations: 0
Automatic Metrics General
We introduce CLAUSE, an agentic three-agent neuro-symbolic framework that treats context construction as a sequential decision process over knowledge graphs, deciding what to expand, which paths to follow or backtrack, what evidence to…
- ShishuLM: Achieving Optimal and Efficient Parameterization with Low Attention Transformer Models
Shivanshu Kumar, Gopalakrishnan Srinivasan · Oct 13, 2025 · Citations: 0
General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles
Qingyan Wei, Yaojie Zhang, Zhiyuan Liu, Puyu Zeng, Yuxuan Wang · Jun 12, 2025 · Citations: 0
Automatic Metrics General
Extensive experiments across benchmarks and models show that SlowFast Sampling achieves up to 15.63× speedup on LLaDA with minimal accuracy drop, and up to 34.22× when combined with caching.
- Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
Zhuowen Liang, Xiaotian Lin, Zhengxuan Zhang, Yuyu Luo, Haixun Wang · Mar 31, 2026 · Citations: 0
Automatic Metrics Coding
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- OneComp: One-Line Revolution for Generative AI Model Compression
Yuma Ichikawa, Keiji Kimura, Akihiro Yoshida, Yudai Fujimoto, Hiroki Tokura · Mar 30, 2026 · Citations: 0
General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
Jiyuan Fu, Kaixun Jiang, Lingyi Hong, Jinglun Li, Haijing Guo · Jun 17, 2025 · Citations: 0
General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
Di Cao, Dongjie Fu, Hai Yu, Siqi Zheng, Xu Tan · Mar 6, 2026 · Citations: 0
Automatic Metrics General
Extensive experiments across multiple benchmarks demonstrate that X-OPD significantly narrows the gap in complex tasks while preserving the model's inherent capabilities.
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong · Oct 6, 2025 · Citations: 0
Automatic Metrics General
We introduce ACE (Agentic Context Engineering), a framework that treats contexts as evolving playbooks that accumulate, refine, and organize strategies through a modular process of generation, reflection, and curation.
- LLM Readiness Harness: Evaluation, Observability, and CI Gates for LLM/RAG Applications
Alexandre Cristovão Maiorano · Mar 28, 2026 · Citations: 0
Automatic Metrics General
We present a readiness harness for LLM and RAG applications that turns evaluation into a deployment decision workflow.
- SCOPE: Tree-based Self-Correcting Online Log Parsing via Syntactic-Semantic Collaboration
Dongyi Fan, Suqiong Zhang, Lili He, Ming Liu, Yifan Huo · Mar 28, 2026 · Citations: 0
Automatic Metrics General
Extensive evaluations on diverse benchmark datasets show that SCOPE outperforms state-of-the-art methods in both accuracy and efficiency.
- Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search
Dong Liu, Yanxuan Yu · Nov 12, 2025 · Citations: 0
Automatic Metrics Coding
We implement SPI as a plugin for both FAISS and Qdrant backends and evaluate it across multiple RAG tasks including MS MARCO, Natural Questions, and multimodal retrieval benchmarks.
- PHONOS: PHOnetic Neutralization for Online Streaming Applications
Waris Quamer, Mu-Ruei Tseng, Ghady Nasrallah, Ricardo Gutierrez-Osuna · Mar 27, 2026 · Citations: 0
Automatic Metrics General
Our evaluations show an 81% reduction in non-native accent confidence, with listening-test ratings consistent with this shift, and reduced speaker linkability as accent-neutralized utterances move away from the original speaker in embedding…
- FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?
Nikil Ravi, Kexing Ying, Vasilii Nesterov, Rayan Krishnan, Elif Uskuplu · Mar 27, 2026 · Citations: 0
Automatic Metrics Math
We present FormalProofBench, a private benchmark designed to evaluate whether AI models can produce formally verified mathematical proofs at the graduate level.
- JAL-Turn: Joint Acoustic-Linguistic Modeling for Real-Time and Robust Turn-Taking Detection in Full-Duplex Spoken Dialogue Systems
Guangzhao Yang, Yu Pan, Shi Qiu, Ningjie Bai · Mar 27, 2026 · Citations: 0
Automatic Metrics Multilingual
Despite recent advances, efficient and robust turn-taking detection remains a significant challenge in industrial-grade Voice AI agent deployments.
- TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling
Nisharg Nargund, Priyesh Shukla · Feb 7, 2026 · Citations: 0
Automatic Metrics General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- LLM4AD: Large Language Models for Autonomous Driving -- Concept, Review, Benchmark, Experiments, and Future Trends
Can Cui, Yunsheng Ma, Sung-Yeon Park, Zichong Yang, Yupeng Zhou · Oct 20, 2024 · Citations: 0
Simulation Env General
Then, a comprehensive benchmark is proposed for evaluating the instruction-following and reasoning abilities of LLM4AD systems, which includes LaMPilot-Bench, CARLA Leaderboard 1.0 Benchmark in simulation and NuPlanQA for multi-view visual…
- Characterizing Linear Alignment Across Language Models
Matt Gorbett, Suman Jana · Mar 19, 2026 · Citations: 0
Automatic Metrics General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
Selim An, Il hong Suh, Yeseong Kim · Mar 26, 2026 · Citations: 0
Automatic Metrics General
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models
Hieu Xuan Le, Benjamin Goh, Quy Anh Tang · Mar 26, 2026 · Citations: 0
LLM As Judge General
In production, guardrails must mitigate these attacks under strict low-latency constraints, resulting in a deployment gap in which lightweight classifiers and rule-based systems struggle to generalize under distribution shift, while…
- Beyond Attention Magnitude: Leveraging Inter-layer Rank Consistency for Efficient Vision-Language-Action Models
Peiju Liu, Jinming Liu, Xipeng Qiu, Xuanjing Huang · Mar 26, 2026 · Citations: 0
Automatic Metrics General
On the CogACT + SIMPLER benchmark, TIES improves average success rates by 6% while reducing token usage by 78%, and demonstrates strong generalization across diverse decoders and benchmarks.
- GraphER: An Efficient Graph-Based Enrichment and Reranking Method for Retrieval-Augmented Generation
Ruizhong Miao, Yuying Wang, Rongguang Wang, Chenyang Li, Tao Sheng · Mar 26, 2026 · Citations: 0
Automatic Metrics General
Prior approaches to this problem include agentic retrieval strategies, which expand the semantic search space by generating additional queries.