- EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation
Jiahe Shi, Zhengqi Gao, Ching-Yun Ko, Duane Boning · Nov 15, 2025
Recent advances in large language models (LLMs) have demonstrated significant potential in hardware design automation, particularly in using natural language to synthesize Register-Transfer Level (RTL) code.
- CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation
Crystal Min Hui Poon, Pai Chet Ng, Xiaoxiao Miao, Immanuel Jun Kai Loh, Bowen Zhang · Nov 14, 2025
Instruction-guided text-to-speech (TTS) research has reached a maturity level where excellent speech generation quality is possible on demand, yet two coupled biases persist in reducing perceived quality: accent bias, where models default to…
- Multimodal Peer Review Simulation with Actionable To-Do Recommendations for Community-Aware Manuscript Revisions
Mengze Hong, Di Jiang, Weiwei Zhao, Yawen Li, Yihang Wang · Nov 14, 2025
Experimental results highlight the effectiveness of the proposed system in generating more comprehensive and useful reviews aligned with expert standards, surpassing ablated baselines and advancing transparent, human-centered scholarly assistance.
- Mastering Olympiad-Level Physics with Artificial Intelligence
Dong-Shan Jian, Xiang Li, Chen-Xu Yan, Hui-Wen Zheng, Zhi-Zhang Bian · Nov 13, 2025
Olympiad-level physics problem-solving significantly challenges both humans and artificial intelligence (AI), as it requires integrating appropriate modeling, application of physical principles, and precise calculation within long reasoning…
- Chain of Summaries: Summarization Through Iterative Questioning
William Brach, Kristián Košťál, Lukas Galke Poech · Nov 12, 2025
CoS thus resembles an appealing option for website maintainers to make their content more accessible for LLMs, while retaining possibilities for human oversight.
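The title suggests a refine-by-questioning loop. A minimal sketch of one plausible reading follows, where `ask_llm` is a hypothetical wrapper around any chat model and the prompts are illustrative, not the paper's algorithm:

```python
# Illustrative loop for "summarization through iterative questioning".
# This is a guess at the general shape from the title alone; ask_llm is a
# hypothetical helper, and the prompts are not the paper's.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wrap your chat model of choice here")

def chain_of_summaries(document: str, rounds: int = 3) -> str:
    summary = ask_llm(f"Summarize this document:\n\n{document}")
    for _ in range(rounds):
        questions = ask_llm(
            f"List questions a reader of this document might ask:\n\n{document}"
        )
        unanswered = ask_llm(
            "Which of these questions does the summary NOT answer?\n\n"
            f"Questions:\n{questions}\n\nSummary:\n{summary}"
        )
        if "none" in unanswered.lower():
            break  # summary already covers the probe questions
        summary = ask_llm(
            "Revise the summary so it also answers these questions, "
            f"using the document:\n\nQuestions:\n{unanswered}\n\n"
            f"Document:\n{document}\n\nCurrent summary:\n{summary}"
        )
    return summary
```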
- State of the Art in Text Classification for South Slavic Languages: Fine-Tuning or Prompting?
Taja Kuzman Pungeršek, Peter Rupnik, Ivan Porupski, Vuk Dinić, Nikola Ljubešić · Nov 11, 2025
Until recently, fine-tuned BERT-like models provided state-of-the-art performance on text classification tasks.
- Intelligence per Watt: Measuring Intelligence Efficiency of Local AI
Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya · Nov 11, 2025
Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure.
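One plausible way to operationalize such a metric is benchmark accuracy divided by average power draw during inference; the measurement interface below is a hypothetical sketch, not necessarily the paper's definition:

```python
# A plausible "intelligence per watt" score: benchmark accuracy divided by
# average power draw during the evaluation run. Hypothetical interface;
# the paper's exact metric may differ.
from dataclasses import dataclass

@dataclass
class EvalRun:
    correct: int          # questions answered correctly
    total: int            # questions attempted
    energy_joules: float  # energy consumed over the run
    seconds: float        # wall-clock duration of the run

def intelligence_per_watt(run: EvalRun) -> float:
    accuracy = run.correct / run.total
    avg_watts = run.energy_joules / run.seconds  # W = J / s
    return accuracy / avg_watts

# Example: 70% accuracy at an average draw of 35 W -> 0.02 per watt.
print(intelligence_per_watt(EvalRun(70, 100, 42_000, 1_200)))
```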
- Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces
Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury · Nov 10, 2025
On the Episodic Memory Benchmark (EpBench) [Huet et al., 2025], comprising corpora ranging from 100k to 1M tokens in length, GSW outperforms existing RAG-based baselines by up to 20%.
- Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents
Hanlin Cai, Houtianfu Wang, Haofan Dong, Kai Li, Sai Zou · Nov 10, 2025
Internet of Agents (IoA) envisions a unified, agent-centric paradigm where heterogeneous large language model (LLM) agents can interconnect and collaborate at scale.
- RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation
Haofeng Wang, Yu Zhang · Nov 10, 2025
Large Vision-Language Models (LVLMs) excel in multimodal reasoning and have shown impressive performance on various multimodal benchmarks.
- OckBench: Measuring the Efficiency of LLM Reasoning
Zheng Du, Hao Kang, Song Han, Tushar Krishna, Ligeng Zhu · Nov 7, 2025
Yet current benchmarks emphasize accuracy and output quality, neglecting a critical dimension: efficiency of token usage.
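A simple score in this spirit counts correct answers per thousand generated tokens; the sketch below is a hypothetical illustration, not OckBench's actual metric:

```python
# One simple token-efficiency score in the spirit of "accuracy is not
# enough": correct answers per 1k generated tokens. Hypothetical sketch.
def correct_per_kilotoken(num_correct: int, tokens_generated: int) -> float:
    return 1000.0 * num_correct / tokens_generated

# Two models with equal accuracy but different verbosity separate cleanly:
print(correct_per_kilotoken(80, 40_000))   # terse model   -> 2.0
print(correct_per_kilotoken(80, 160_000))  # verbose model -> 0.5
```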
- Long Grounded Thoughts: Synthesizing Visual Problems and Reasoning Chains at Scale
David Acuna, Chao-Han Huck Yang, Yuntian Deng, Jaehun Jung, Ximing Lu · Nov 7, 2025
We introduce a framework able to synthesize vision-centric problems spanning diverse levels of complexity, and the resulting dataset with over 1M high-quality problems including: reasoning traces, preference data, and instruction prompts…
- Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models
Saurabh Srivastava, Janit Bidhan, Hao Yan, Abhishek Dey, Tanu Kansal · Nov 6, 2025
Across 13 diverse benchmarks with DeepSeek-R1 and OpenAI-o1, batch prompting reduces reasoning tokens by 76% on average (2,950 → 710) while preserving or improving accuracy.
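A minimal sketch of the batch-prompting idea: pack several questions into one prompt so the model amortizes its reasoning across them rather than producing a full chain of thought per question. The OpenAI-style client, model name, and prompt format are assumptions, not the paper's protocol:

```python
# Batch prompting sketch: number the questions, ask for numbered answers,
# and split responses back out per question. Client and model name are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def batch_prompt(questions: list[str], model: str = "gpt-4o") -> str:
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
    prompt = (
        "Answer each question below. Be concise; give only the final "
        "answer for each, prefixed by its number.\n\n" + numbered
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

answers = batch_prompt([
    "What is 17 * 24?",
    "Is 221 prime?",
    "How many bits are in 4 bytes?",
])
print(answers)
```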
- Error-Aware Knowledge Distillation via Targeted Revision for Customer-Service Summarization
Hee-Jin Lee, Zhen Guo, Luchao Jin, Morteza Moazami Goudarzi · Nov 4, 2025
We introduce an Analyze-Revise-Finetune (ARF) pipeline that enables smaller open-source large language models (LLMs) to surpass substantially larger proprietary models in customer service summarization tasks.
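From the pipeline's name alone, one plausible shape for ARF is: analyze generated summaries for errors, revise only the flagged ones, then fine-tune on the cleaned pairs. All helpers below are hypothetical stand-ins, not the paper's concrete steps:

```python
# High-level sketch of an Analyze-Revise-Finetune (ARF) style data loop.
def critique(summary: str, transcript: str) -> list[str]:
    raise NotImplementedError("LLM critic: return a list of factual errors")

def revise(summary: str, errors: list[str], transcript: str) -> str:
    raise NotImplementedError("LLM reviser: fix only the listed errors")

def build_finetune_set(pairs):
    data = []
    for transcript, summary in pairs:
        errors = critique(summary, transcript)
        if errors:  # Analyze: touch only summaries that need fixing
            summary = revise(summary, errors, transcript)  # Revise: targeted edits
        data.append({"input": transcript, "target": summary})
    return data     # Finetune: train the small model on this cleaned set
```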
- A Proof of Learning Rate Transfer under μP
Soufiane Hayou · Nov 3, 2025
We provide the first proof of learning rate transfer with width in a linear multi-layer perceptron (MLP) parameterized with μP, a neural network parameterization designed to "maximize" feature learning in the infinite-width limit.
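For context, μP is usually stated via the abc-parametrization, in which width-dependent exponents govern each layer's multiplier, initialization, and learning rate. A compact sketch, with the specific exponent table omitted:

```latex
% abc-parametrization of a width-n MLP: each layer l carries a
% width-dependent multiplier, init scale, and learning rate scale.
W^{(l)} = n^{-a_l}\, w^{(l)}, \qquad
w^{(l)}_{ij} \sim \mathcal{N}\!\bigl(0,\; n^{-2 b_l}\bigr), \qquad
\eta_l = \eta\, n^{-c_l}.
% muP chooses the exponents (a_l, b_l, c_l) so that every layer's
% features and their per-step updates stay Theta(1) as n -> infinity.
% "Learning rate transfer" is then the claim that the optimal base rate
% eta* does not depend on n: tune it on a narrow model, reuse it wide.
```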