- When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation
Henry Peng Zou, Chunyu Miao, Wei-Chieh Huang, Yankai Chen, Yue Zhou · Apr 1, 2026 · Citations: 0
Critique Edit Simulation Env Long Horizon
As LLM agents transition from short, static problem solving to executing complex, long-horizon tasks in dynamic environments, the ability to handle user interruptions, such as adding requirement or revising goals, during mid-task execution…
- Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai · Mar 30, 2026 · Citations: 0
Critique Edit Long Horizon
We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe.
- How Much LLM Does a Self-Revising Agent Actually Need?
Sungwoo Jung, Seonil Son · Apr 8, 2026 · Citations: 0
Critique Edit Automatic Metrics
Recent LLM-based agents often place world modeling, planning, and reflection inside a single language model loop.
- The Ultimate Tutorial for AI-driven Scale Development in Generative Psychometrics: Releasing AIGENIE from its Bottle
Lara Russell-Lasalandra, Hudson Golino, Luis Eduardo Garrido, Alexander P. Christensen · Mar 30, 2026 · Citations: 0
Critique Edit Tool Use
Psychological scale development has traditionally required extensive expert involvement, iterative revision, and large-scale pilot testing before psychometric evaluation can begin.
- Can Large Language Models Self-Correct in Medical Question Answering? An Exploratory Study
Zaifu Zhan, Mengyuan Cui, Rui Zhang · Mar 31, 2026 · Citations: 0
Critique Edit Automatic Metrics
Large language models (LLMs) have achieved strong performance on medical question answering (medical QA), and chain-of-thought (CoT) prompting has further improved results by eliciting explicit intermediate reasoning; meanwhile,…
- Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
Zhiting Fan, Ruizhe Chen, Tianxiang Hu, Ru Peng, Zenan Huang · Apr 1, 2026 · Citations: 0
Rubric RatingCritique Edit
However, high-quality SFT data in knowledge-intensive domains such as humanities, social sciences, medicine, law, and finance is scarce because expert curation is expensive, privacy constraints are strict, and label consistency is hard to…
- Voice Under Revision: Large Language Models and the Normalization of Personal Narrative
Tom van Nuenen · Apr 24, 2026 · Citations: 0
Critique Edit
This has consequences for digital humanities and computational text analysis, where features such as function words, pronouns, contractions, and punctuation often serve as evidence for style, voice, authorship, and corpus integrity.
- From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection
Hongxu Zhou · Apr 7, 2026 · Citations: 0
Critique Edit
While structured feedback can mitigate this issue, existing approaches often rely on externally trained critics or symbolic tools, reducing agent autonomy.
- The Self Driving Portfolio: Agentic Architecture for Institutional Asset Management
Andrew Ang, Nazym Azimbayev, Andrey Kim · Apr 2, 2026 · Citations: 0
Critique Edit
Agentic AI shifts the investor's role from analytical execution to oversight.
- Revision or Re-Solving? Decomposing Second-Pass Gains in Multi-LLM Pipelines
Jingjie Ning, Xueqi Li, Chengyu Yu · Apr 1, 2026 · Citations: 0
Critique Edit
We evaluate this design across two model pairs on three benchmarks spanning knowledge-intensive MCQ and competitive programming.
- EarlySciRev: A Dataset of Early-Stage Scientific Revisions Extracted from LaTeX Writing Traces
Léane Jourdan, Julien Aubert-Béduchaud, Yannis Chupin, Marah Baccari, Florian Boudin · Mar 30, 2026 · Citations: 0
Critique Edit
This limits empirical study of revision behaviour and evaluation of large language models (LLMs) for scientific writing.
- Understanding Teacher Revisions of Large Language Model-Generated Feedback
Conrad Borchers, Luiz Rodrigues, Newarney Torrezão da Costa, Cleon Xavier, Rafael Ferreira Mello · Mar 29, 2026 · Citations: 0
Critique EditRlaif Or Synthetic Feedback
First, we find that teachers accept AI feedback without modification in about 80% of cases, while edited feedback tends to be significantly longer and subsequently shortened by teachers.