- InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models
Sayed Mohammadreza Tayaranian Hosseini, Amir Ardakani, Warren J. Gross · Feb 26, 2026
Automatic Metrics Math Coding
Our evaluation experiments on Llama models show that InnerQ maintains few-shot GSM8K performance comparable to non-quantized KV caches and surpasses prior KV cache quantization methods.
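As background for this entry, a generic per-channel uniform quantization sketch shows the basic KV-cache compression primitive such methods build on. This is not the InnerQ algorithm itself; the bit width and asymmetric min/max scheme are illustrative assumptions.

```python
import numpy as np

def quantize_kv(x, bits=4):
    """Per-channel asymmetric uniform quantization (generic sketch,
    not InnerQ): map each channel's [min, max] onto 2**bits levels."""
    lo = x.min(axis=0, keepdims=True)
    hi = x.max(axis=0, keepdims=True)
    scale = (hi - lo) / (2 ** bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard constant channels
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_kv(q, scale, lo):
    """Reconstruct an approximate float tensor from the quantized one."""
    return q * scale + lo

# 128 cached tokens x 64 head dimensions of synthetic KV values
rng = np.random.default_rng(0)
kv = rng.normal(size=(128, 64)).astype(np.float32)
q, s, z = quantize_kv(kv, bits=4)
err = np.abs(dequantize_kv(q, s, z) - kv).max()
```

With rounding to the nearest level, the reconstruction error of any element is bounded by half its channel's step size.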
- pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training
Wenzheng Zhang, Bingzheng Liu, Yang Hu, Xiaoying Bai, Wentao Zhang · Feb 26, 2026
Automatic Metrics General
Quantization-Aware Training from scratch has emerged as a promising approach for building efficient large language models (LLMs) with extremely low-bit weights (sub-2-bit), which can offer substantial advantages for edge deployment.
- A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection
Mirza Raquib, Asif Pervez Polok, Kedar Nath Biswas, Rahat Uddin Azad, Saydul Akbar Murad · Feb 25, 2026
Automatic Metrics General
Evaluation uses multiple metrics, including accuracy, precision, recall, F1-score, Hamming loss, Cohen's kappa, and AUC-ROC.
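To make the listed metrics concrete, a minimal micro-averaged implementation for the multi-label setting is sketched below. It covers precision, recall, F1, and Hamming loss only; Cohen's kappa and AUC-ROC would need label probabilities and are omitted here.

```python
import numpy as np

def multilabel_metrics(y_true, y_pred):
    """Micro-averaged multi-label metrics over binary label matrices
    (rows = samples, columns = labels). Simplified sketch."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    hamming_loss = np.mean(y_true != y_pred)  # fraction of wrong labels
    return {"precision": precision, "recall": recall,
            "f1": f1, "hamming_loss": hamming_loss}

# Two samples, three cyberbullying labels each (toy data)
scores = multilabel_metrics([[1, 0, 1], [0, 1, 0]],
                            [[1, 0, 0], [0, 1, 1]])
```

Micro-averaging pools true/false positives across all labels before computing ratios, which weights frequent labels more heavily than macro-averaging would.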
- Dynamic Personality Adaptation in Large Language Models via State Machines
Leon Pielage, Ole Hätscher, Mitja Back, Bernhard Marschall, Benjamin Risse · Feb 25, 2026
Simulation Env Medicine
This work demonstrates the feasibility of modular, personality-adaptive architectures for education, customer support, and broader human-computer interaction.
- Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages
Mohammadreza Ghaffarzadeh-Esfahani, Nahid Yousefian, Ebrahim Heidari-Farsani, Ali Akbar Omidvarian, Sepehr Ghahraei · Feb 24, 2026
Automatic Metrics Medicine Multilingual
Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP).
- An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
Cathy Shyr, Yan Hu, Rory J. Tinker, Thomas A. Cassini, Kevin W. Byram · Feb 23, 2026
Automatic Metrics Medicine
Existing artificial intelligence approaches typically optimize individual components of phenotyping but do not operationalize the full clinical workflow of extracting features from clinical text and standardizing them to the Human Phenotype Ontology.
- Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference
Arindam Khaled · Feb 23, 2026
Automatic Metrics Math
In this work, we propose "Pyramid MoA", a hierarchical Mixture-of-Agents architecture that uses a lightweight Router to dynamically escalate queries only when necessary.
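The escalation idea behind such a router can be sketched in a few lines. The level interface, the self-reported confidence scores, and the threshold below are illustrative assumptions, not the paper's actual API.

```python
def answer(query, levels, confidence_threshold=0.8):
    """Anytime escalation sketch: query cheap models first and escalate
    up the pyramid only while confidence stays below the threshold.
    `levels` is a non-empty list ordered cheapest-first; each level maps
    a query to a (reply, confidence) pair."""
    for model in levels:
        reply, confidence = model(query)
        if confidence >= confidence_threshold:
            return reply
    return reply  # fall back to the strongest level's answer

# Toy levels: a small model with low confidence, a large one with high.
cheap = lambda q: ("cheap answer", 0.40)
strong = lambda q: ("strong answer", 0.95)
result = answer("What is 2 + 2?", [cheap, strong])
```

The cost saving comes from the early return: confident queries never reach the expensive levels.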
- MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs
Dongwei Wang, Jinhee Kim, Seokho Han, Denis Gudovskiy, Yohei Nakata · Feb 21, 2026
Automatic Metrics General
Varying runtime constraints on cloud and edge devices necessitate elastic large language model (LLM) deployment, where a single LLM can run inference at different quantization precisions depending on available computational resources.
- Why Agent Caching Fails and How to Fix It: Structured Intent Canonicalization with Few-Shot Learning
Abhinaba Basu · Feb 21, 2026
Automatic Metrics Multilingual
Personal AI agents incur substantial cost via repeated LLM calls.
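The failure mode the title alludes to is that paraphrased requests miss a naive string-keyed cache. A toy canonicalize-then-cache sketch is below; the real system uses few-shot LLM canonicalization, whereas this regex-and-stopword version is only an illustration.

```python
import re

CACHE = {}

def canonicalize(query):
    """Toy intent canonicalization: lowercase, strip punctuation, and
    drop filler words so paraphrases collapse to one cache key."""
    q = re.sub(r"[^\w\s]", "", query.lower())
    filler = {"please", "could", "you", "hey", "the"}
    return " ".join(w for w in q.split() if w not in filler)

def cached_call(query, llm):
    """Look up the canonical intent; call the LLM only on a cache miss."""
    key = canonicalize(query)
    if key not in CACHE:
        CACHE[key] = llm(query)
    return CACHE[key]
```

Because "Please check the weather!" and "check weather" canonicalize to the same key, the second request is served from the cache without a repeated LLM call.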
- AAVGen: Precision Engineering of Adeno-associated Viral Capsids for Renal Selective Targeting
Mohammadreza Ghaffarzadeh-Esfahani, Yousof Gheisari · Feb 21, 2026
Automatic Metrics General
Adeno-associated viruses (AAVs) are promising vectors for gene therapy, but their native serotypes face limitations in tissue tropism, immune evasion, and production efficiency.
- The Anxiety of Influence: Bloom Filters in Transformer Attention Heads
Peter Balogh · Feb 19, 2026
Automatic Metrics General
Some transformer attention heads appear to function as membership testers, dedicating themselves to answering the question "has this token appeared before in the context?" We identify these heads across four language models (GPT-2 small, medium, …).
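For readers unfamiliar with the data structure in the title, a minimal Bloom filter implements exactly the membership test the paper says these heads approximate. The sizes and hash count below are arbitrary illustrative choices.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a probabilistic set supporting add and
    membership queries with no false negatives and rare false positives."""
    def __init__(self, m=1024, k=3):
        self.bits = bytearray(m)  # m-bit array, stored one bit per byte
        self.m, self.k = m, k

    def _positions(self, item):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

# "Has this token appeared before in the context?"
seen = BloomFilter()
for tok in ["the", "cat", "sat"]:
    seen.add(tok)
```

A query returns True only if all k bit positions are set, so previously added tokens are always found, while unseen tokens collide only with small probability.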
- Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats
Pengxiang Zhao, Hui-Ling Zhen, Xing Li, Han Bao, Weizhe Lin · Feb 13, 2026
Automatic Metrics General
As LLMs scale, low-bit floating-point formats like MXFP and NVFP4 offer new opportunities for precision and efficiency.
- SciTS: Scientific Time Series Understanding and Generation with LLMs
Wen Wu, Ziyang Zhang, Liwei Liu, Xuenan Xu, Jimin Zhuang · Sep 26, 2025
Automatic Metrics Coding
To address these gaps, we introduce SciTS, a benchmark spanning 12 scientific domains and 43 tasks, with over 50k instances covering both univariate and multivariate signals whose lengths range from $10^0$ to $10^7$ and whose sampling frequencies reach up to 10 MHz.
- Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise
Zhenkai Zhang, Krista A. Ehinger, Tom Drummond · Oct 26, 2023
Automatic Metrics Math
This paper introduces two key contributions aimed at improving the speed and quality of images generated through inverse diffusion processes.