Paper2Code archive

Paper archive page 44

Page 44 of 111. Links match the indexable papers sitemap inventory.

300 canonical paper links on this archive page.

Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
arxiv-2604.07343 Sparse Blocked context only

Apr 8, 2026
A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering
arxiv-2604.07274 Sparse Blocked context only

Apr 8, 2026
Joint Optimization of Reasoning and Dual-Memory for Self-Learning Diagnostic Agent
arxiv-2604.07269 Sparse Blocked context only

Apr 8, 2026
How Much LLM Does a Self-Revising Agent Actually Need?
arxiv-2604.07236 Sparse Blocked context only

Apr 8, 2026
TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories
arxiv-2604.07223 Sparse Blocked context only

Apr 8, 2026
MoZoo: Unleashing Video Diffusion power in animal fur and muscle simulation
arxiv-2605.13857 Sparse Blocked context only

Apr 8, 2026
Agent-Driven Corpus Linguistics: A Framework for Autonomous Linguistic Discovery
arxiv-2604.07189 Sparse Blocked context only

Apr 8, 2026
STRIDE-ED: A Strategy-Grounded Stepwise Reasoning Framework for Empathetic Dialogue Systems
arxiv-2604.07100 Sparse Blocked context only

Apr 8, 2026
Is Cross-Lingual Transfer in Bilingual Models Human-Like? A Study with Overlapping Word Forms in Dutch and English
arxiv-2604.07067 Sparse Blocked context only

Apr 8, 2026
Sell More, Play Less: Benchmarking LLM Realistic Selling Skill
arxiv-2604.07054 Sparse Blocked context only

Apr 8, 2026
ReDAct: Uncertainty-Aware Deferral for LLM Agents
arxiv-2604.07036 Curated Related Blocked context only

Apr 8, 2026
Gemma 4, Phi-4, and Qwen3: Accuracy-Efficiency Tradeoffs in Dense and MoE Reasoning Language Models
arxiv-2604.07035 Sparse Blocked context only

Apr 8, 2026
MARS: Enabling Autoregressive Models Multi-Token Generation
arxiv-2604.07023 Sparse Blocked context only

Apr 8, 2026
DTCRS: Dynamic Tree Construction for Recursive Summarization
arxiv-2604.07012 Sparse Blocked context only

Apr 8, 2026
iTAG: Inverse Design for Natural Text Generation with Accurate Causal Graph Annotations
arxiv-2604.06902 Sparse Blocked context only

Apr 8, 2026
On the Step Length Confounding in LLM Reasoning Data Selection
arxiv-2604.06834 Sparse Blocked context only

Apr 8, 2026
Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
arxiv-2604.06832 Sparse Blocked context only

Apr 8, 2026
WRAP++: Web discoveRy Amplified Pretraining
arxiv-2604.06829 Sparse Blocked context only

Apr 8, 2026
Beyond Accuracy: Diagnosing Algebraic Reasoning Failures in LLMs Across Nine Complexity Dimensions
arxiv-2604.06799 Sparse Blocked context only

Apr 8, 2026
GCoT-Decoding: Unlocking Deep Reasoning Paths for Universal Question Answering
arxiv-2604.06794 Sparse Blocked context only

Apr 8, 2026
From Perception to Autonomous Computational Modeling: A Multi-Agent Approach
arxiv-2604.06788 Sparse Blocked context only

Apr 8, 2026
Geometric Properties of the Voronoi Tessellation in Latent Semantic Manifolds of Large Language Models
arxiv-2604.06767 Sparse Blocked context only

Apr 8, 2026
How Long Reasoning Chains Influence LLMs' Judgment of Answer Factuality
arxiv-2604.06756 Sparse Blocked context only

Apr 8, 2026
Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents
arxiv-2604.06753 Sparse Blocked context only

Apr 8, 2026
StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference
arxiv-2604.06746 Sparse Blocked context only

Apr 8, 2026
WisdomInterrogatory (LuWen): An Open-Source Legal Large Language Model Technical Report
arxiv-2604.06737 Sparse Blocked context only

Apr 8, 2026
Steering the Verifiability of Multimodal AI Hallucinations
arxiv-2604.06714 Sparse Blocked context only

Apr 8, 2026
Adaptive Prompt Structure Factorization: A Framework for Self-Discovering and Optimizing Compositional Prompt Programs
arxiv-2604.06699 Sparse Blocked context only

Apr 8, 2026
ChemVLR: Prioritizing Reasoning in Perception for Chemical Vision-Language Understanding
arxiv-2604.06685 Sparse Blocked context only

Apr 8, 2026
Argus: Reorchestrating Static Analysis via a Multi-Agent Ensemble for Full-Chain Security Vulnerability Detection
arxiv-2604.06633 Sparse Blocked context only

Apr 8, 2026
DiffuMask: Diffusion Language Model for Token-level Prompt Pruning
arxiv-2604.06627 Sparse Blocked context only

Apr 8, 2026
Scientific Knowledge-driven Decoding Constraints Improving the Reliability of LLMs
arxiv-2604.06603 Sparse Blocked context only

Apr 8, 2026
LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources
arxiv-2604.06571 Sparse Blocked context only

Apr 8, 2026
CCD-CBT: Multi-Agent Therapeutic Interaction for CBT Guided by Cognitive Conceptualization Diagram
arxiv-2604.06551 Sparse Blocked context only

Apr 8, 2026
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
arxiv-2605.02913 Sparse Blocked context only

Apr 8, 2026
MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts
arxiv-2604.06505 Sparse Blocked context only

Apr 7, 2026
Closing the Speech-Text Gap with Limited Audio for Effective Domain Adaptation in LLM-Based ASR
arxiv-2604.06487 Sparse Blocked context only

Apr 7, 2026
ValueGround: Evaluating Culture-Conditioned Visual Value Grounding in MLLMs
arxiv-2604.06484 Sparse Blocked context only

Apr 7, 2026
DataSTORM: Deep Research on Large-Scale Databases using Exploratory Data Analysis and Data Storytelling
arxiv-2604.06474 Sparse Blocked context only

Apr 7, 2026
Multi-objective Evolutionary Merging Enables Efficient Reasoning Models
arxiv-2604.06465 Sparse Blocked context only

Apr 7, 2026
Context-Aware Dialectal Arabic Machine Translation with Interactive Region and Register Selection
arxiv-2604.06456 Sparse Blocked context only

Apr 7, 2026
Learning to Interrupt in Language-based Multi-agent Communication
arxiv-2604.06452 Sparse Blocked context only

Apr 7, 2026
The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning
arxiv-2604.06427 Sparse Blocked context only

Apr 7, 2026
State-of-the-Art Arabic Language Modeling with Sparse MoE Fine-Tuning and Chain-of-Thought Distillation
arxiv-2604.06421 Sparse Blocked context only

Apr 7, 2026
Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries
arxiv-2604.06416 Sparse Blocked context only

Apr 7, 2026
Application-Driven Pedagogical Knowledge Optimization of Open-Source LLMs via Reinforcement Learning and Supervised Fine-Tuning
arxiv-2604.06385 Sparse Blocked context only

Apr 7, 2026
STDec: Spatio-Temporal Stability Guided Decoding for dLLMs
arxiv-2604.06330 Sparse Blocked context only

Apr 7, 2026
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
arxiv-2604.06170 Sparse Blocked context only

Apr 7, 2026
Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement
arxiv-2604.06155 Sparse Blocked context only

Apr 7, 2026
Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives
arxiv-2604.06091 Sparse Blocked context only

Apr 7, 2026
Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles
arxiv-2604.06071 Sparse Blocked context only

Apr 7, 2026
Short Data, Long Context: Distilling Positional Knowledge in Transformers
arxiv-2604.06070 Sparse Blocked context only

Apr 7, 2026
From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection
arxiv-2604.06066 Sparse Blocked context only

Apr 7, 2026
BiMind: A Dual-Head Reasoning Model with Attention-Geometry Adapter for Incorrect Information Detection
arxiv-2604.06022 Sparse Blocked context only

Apr 7, 2026
Is CLIP Cross-Eyed? Revealing and Mitigating Center Bias in the CLIP Family
arxiv-2604.05971 Sparse Blocked context only

Apr 7, 2026
FinReporting: An Agentic Workflow for Localized Reporting of Cross-Jurisdiction Financial Disclosures
arxiv-2604.05966 Direct Blocked context only

Apr 7, 2026
"I See What You Did There": Can Large Vision-Language Models Understand Multimodal Puns?
arxiv-2604.05930 Sparse Blocked context only

Apr 7, 2026
Mechanistic Circuit-Based Knowledge Editing in Large Language Models
arxiv-2604.05876 Sparse Blocked context only

Apr 7, 2026
Evaluating Learner Representations for Differentiation Prior to Instructional Outcomes
arxiv-2604.05848 Sparse Blocked context only

Apr 7, 2026
AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning
arxiv-2604.05846 Sparse Blocked context only

Apr 7, 2026
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
arxiv-2604.05818 Sparse Blocked context only

Apr 7, 2026
Measuring What Matters!! Assessing Therapeutic Principles in Mental-Health Conversation
arxiv-2604.05795 Sparse Blocked context only

Apr 7, 2026
What Models Know, How Well They Know It: Knowledge-Weighted Fine-Tuning for Learning When to Say "I Don't Know"
arxiv-2604.05779 Sparse Blocked context only

Apr 7, 2026
Beyond the Beep: Scalable Collision Anticipation and Real-Time Explainability with BADAS-2.0
arxiv-2604.05767 Sparse Blocked context only

Apr 7, 2026
Identifying Influential N-grams in Confidence Calibration via Regression Analysis
arxiv-2604.05757 Sparse Blocked context only

Apr 7, 2026
Controlling Distributional Bias in Multi-Round LLM Generation via KL-Optimized Fine-Tuning
arxiv-2604.05756 Sparse Blocked context only

Apr 7, 2026
Can Large Language Models Reinvent Foundational Algorithms?
arxiv-2604.05716 Sparse Blocked context only

Apr 7, 2026
Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion
arxiv-2604.05688 Sparse Blocked context only

Apr 7, 2026
LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals
arxiv-2604.05655 Sparse Blocked context only

Apr 7, 2026
See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video LLMs
arxiv-2604.05650 Sparse Blocked context only

Apr 7, 2026
Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge
arxiv-2604.05593 Sparse Blocked context only

Apr 7, 2026
Weakly Supervised Distillation of Hallucination Signals into Transformer Representations
arxiv-2604.06277 Sparse Blocked context only

Apr 7, 2026
Spec Kit Agents: Context-Grounded Agentic Workflows
arxiv-2604.05278 Sparse Blocked context only

Apr 7, 2026
Just Pass Twice: Efficient Token Classification with LLMs for Zero-Shot NER
arxiv-2604.05158 Sparse Blocked context only

Apr 6, 2026
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
arxiv-2604.04921 Sparse Blocked context only

Apr 6, 2026
Vero: An Open RL Recipe for General Visual Reasoning
arxiv-2604.04917 Sparse Blocked context only

Apr 6, 2026
HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection
arxiv-2604.04908 Sparse Blocked context only

Apr 6, 2026
Rethinking Exploration in RLVR: From Entropy Regularization to Refinement via Bidirectional Entropy Modulation
arxiv-2604.04894 Sparse Blocked context only

Apr 6, 2026
Synthetic Sandbox for Training Machine Learning Engineering Agents
arxiv-2604.04872 Sparse Blocked context only

Apr 6, 2026
Optimizing LLM Prompt Engineering with DSPy Based Declarative Learning
arxiv-2604.04869 Sparse Blocked context only

Apr 6, 2026
MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents
arxiv-2604.04853 Sparse Blocked context only

Apr 6, 2026
Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
arxiv-2604.04847 Sparse Blocked context only

Apr 6, 2026
InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement
arxiv-2604.04843 Sparse Blocked context only

Apr 6, 2026
Do No Harm: Exposing Hidden Vulnerabilities of LLMs via Persona-based Client Simulation Attack in Psychological Counseling
arxiv-2604.04842 Sparse Blocked context only

Apr 6, 2026
MERIT: Multilingual Expert-Reward Informed Tuning for Chinese-Centric Low-Resource Machine Translation
arxiv-2604.04839 Sparse Blocked context only

Apr 6, 2026
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing
arxiv-2605.02910 Sparse Blocked context only

Apr 6, 2026
Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not
arxiv-2604.04825 Sparse Blocked context only

Apr 6, 2026
ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture
arxiv-2604.04820 Sparse Blocked context only

Apr 6, 2026
LiveFact: A Dynamic, Time-Aware Benchmark for LLM-Driven Fake News Detection
arxiv-2604.04815 Sparse Blocked context only

Apr 6, 2026
SkillX: Automatically Constructing Skill Knowledge Bases for Agents
arxiv-2604.04804 Sparse Blocked context only

Apr 6, 2026
How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling
arxiv-2604.04791 Sparse Blocked context only

Apr 6, 2026
Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems
arxiv-2604.04767 Sparse Blocked context only

Apr 6, 2026
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
arxiv-2604.04759 Sparse Blocked context only

Apr 6, 2026
AI Trust OS -- A Continuous Governance Framework for Autonomous AI Observability and Zero-Trust Compliance in Enterprise Environments
arxiv-2604.04749 Sparse Blocked context only

Apr 6, 2026
Lighting Up or Dimming Down? Exploring Dark Patterns of LLMs in Co-Creativity
arxiv-2604.04735 Sparse Blocked context only

Apr 6, 2026
Discovering Failure Modes in Vision-Language Models using RL
arxiv-2604.04733 Sparse Blocked context only

Apr 6, 2026
Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs
arxiv-2604.04732 Sparse Blocked context only

Apr 6, 2026
Individual and Combined Effects of English as a Second Language and Typos on LLM Performance
arxiv-2604.04723 Sparse Blocked context only

Apr 6, 2026
Is a Picture Worth a Thousand Words? Adaptive Multimodal Fact-Checking with Visual Evidence Necessity
arxiv-2604.04692 Sparse Blocked context only

Apr 6, 2026
ROSClaw: A Hierarchical Semantic-Physical Framework for Heterogeneous Multi-Agent Collaboration
arxiv-2604.04664 Direct Blocked context only

Apr 6, 2026
Search, Do not Guess: Teaching Small Language Models to Be Effective Search Agents
arxiv-2604.04651 Sparse Blocked context only

Apr 6, 2026
SuperLocalMemory V3.3: The Living Brain -- Biologically-Inspired Forgetting, Cognitive Quantization, and Multi-Channel Retrieval for Zero-LLM Agent Memory Systems
arxiv-2604.04514 Direct Blocked context only

Apr 6, 2026
One Model for All: Multi-Objective Controllable Language Models
arxiv-2604.04497 Sparse Blocked context only

Apr 6, 2026
A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models
arxiv-2604.04488 Sparse Blocked context only

Apr 6, 2026
Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models
arxiv-2604.09687 Sparse Blocked context only

Apr 6, 2026
Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning
arxiv-2604.04442 Sparse Blocked context only

Apr 6, 2026
STEER: Structured Event Evidence for Video Reasoning via Multi-Objective Reinforcement Learning
arxiv-2604.04415 Sparse Blocked context only

Apr 6, 2026
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models
arxiv-2604.04385 Sparse Blocked context only

Apr 6, 2026
REAM: Merging Improves Pruning of Experts in LLMs
arxiv-2604.04356 Sparse Blocked context only

Apr 6, 2026
Self-Distilled RLVR
arxiv-2604.03128 Sparse Blocked context only

Apr 3, 2026
SkVM: Compiling Skills for Efficient Execution Everywhere
arxiv-2604.03088 Direct Blocked context only

Apr 3, 2026
JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency
arxiv-2604.03044 Curated Related Blocked context only

Apr 3, 2026
OmniGUI: Benchmarking GUI Agents in Omni-Modal Smartphone Environments
arxiv-2605.18758 Sparse Blocked context only

Apr 3, 2026
Mitigating LLM biases toward spurious social contexts using direct preference optimization
arxiv-2604.02585 Sparse Blocked context only

Apr 2, 2026
ActionParty: Multi-Subject Action Binding in Generative Video Games
arxiv-2604.02330 Sparse Blocked context only

Apr 2, 2026
Steerable Visual Representations
arxiv-2604.02327 Sparse Blocked context only

Apr 2, 2026
Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning
arxiv-2604.02322 Sparse Blocked context only

Apr 2, 2026
Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models
arxiv-2604.02315 Sparse Blocked context only

Apr 2, 2026
VOID: Video Object and Interaction Deletion
arxiv-2604.02296 Sparse Blocked context only

Apr 2, 2026
Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation
arxiv-2604.02289 Sparse Blocked context only

Apr 2, 2026
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
arxiv-2604.02288 Sparse Blocked context only

Apr 2, 2026
Novel Memory Forgetting Techniques for Autonomous AI Agents: Balancing Relevance and Efficiency
arxiv-2604.02280 Sparse Blocked context only

Apr 2, 2026
Crystalite: A Lightweight Transformer for Efficient Crystal Modeling
arxiv-2604.02270 Sparse Blocked context only

Apr 2, 2026
Answering the Wrong Question: Reasoning Trace Inversion for Abstention in LLMs
arxiv-2604.02230 Sparse Blocked context only

Apr 2, 2026
When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning
arxiv-2604.02226 Sparse Blocked context only

Apr 2, 2026
Impact of Multimodal and Conversational AI on Learning Outcomes and Experience
arxiv-2604.02221 Sparse Blocked context only

Apr 2, 2026
Towards Position-Robust Talent Recommendation via Large Language Models
arxiv-2604.02200 Sparse Blocked context only

Apr 2, 2026
Neuro-RIT: Neuron-Guided Instruction Tuning for Robust Retrieval-Augmented Language Model
arxiv-2604.02194 Sparse Blocked context only

Apr 2, 2026
The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level
arxiv-2604.02178 Sparse Blocked context only

Apr 2, 2026
Adam's Law: Textual Frequency Law on Large Language Models
arxiv-2604.02176 Sparse Blocked context only

Apr 2, 2026
Quantifying Self-Preservation Bias in Large Language Models
arxiv-2604.02174 Sparse Blocked context only

Apr 2, 2026
MTI: A Behavior-Based Temperament Profiling System for AI Agents
arxiv-2604.02145 Sparse Blocked context only

Apr 2, 2026
LLM-as-a-Judge for Time Series Explanations
arxiv-2604.02118 Sparse Blocked context only

Apr 2, 2026
Reliable Control-Point Selection for Steering Reasoning in Large Language Models
arxiv-2604.02113 Sparse Blocked context only

Apr 2, 2026
Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning
arxiv-2604.02091 Sparse Blocked context only

Apr 2, 2026
Diff-KD: Diffusion-based Knowledge Distillation for Collaborative Perception under Corruptions
arxiv-2604.02061 Sparse Blocked context only

Apr 2, 2026
ATBench: A Diverse and Realistic Trajectory Benchmark for Long-Horizon Agent Safety
arxiv-2604.02022 Direct Blocked context only

Apr 2, 2026
$k$NNProxy: Efficient Training-Free Proxy Alignment for Black-Box Zero-Shot LLM-Generated Text Detection
arxiv-2604.02008 Sparse Blocked context only

Apr 2, 2026
ProCeedRL: Process Critic with Exploratory Demonstration Reinforcement Learning for LLM Agentic Reasoning
arxiv-2604.02006 Sparse Blocked context only

Apr 2, 2026
SAFE: Stepwise Atomic Feedback for Error correction in Multi-hop Reasoning
arxiv-2604.01993 Sparse Blocked context only

Apr 2, 2026
RuleForge: Automated Generation and Validation for Web Vulnerability Detection at Scale
arxiv-2604.01977 Sparse Blocked context only

Apr 2, 2026
Ego-Grounding for Personalized Question-Answering in Egocentric Videos
arxiv-2604.01966 Sparse Blocked context only

Apr 2, 2026
Lifting Unlabeled Internet-level Data for 3D Scene Understanding
arxiv-2604.01907 Sparse Blocked context only

Apr 2, 2026
PLOT: Enhancing Preference Learning via Optimal Transport
arxiv-2604.01837 Sparse Blocked context only

Apr 2, 2026
DEFT: Distribution-guided Efficient Fine-Tuning for Human Alignment
arxiv-2604.01787 Sparse Blocked context only

Apr 2, 2026
DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning
arxiv-2604.01765 Sparse Blocked context only

Apr 2, 2026
FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models
arxiv-2604.01762 Sparse Blocked context only

Apr 2, 2026
LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches
arxiv-2604.01754 Sparse Blocked context only

Apr 2, 2026
LiteInception: A Lightweight and Interpretable Deep Learning Framework for General Aviation Fault Diagnosis
arxiv-2604.01725 Sparse Blocked context only

Apr 2, 2026
Human-Guided Reasoning with Large Language Models for Vietnamese Speech Emotion Recognition
arxiv-2604.01711 Sparse Blocked context only

Apr 2, 2026
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
arxiv-2604.01707 Sparse Blocked context only

Apr 2, 2026
Fragile Reasoning: A Mechanistic Analysis of LLM Sensitivity to Meaning-Preserving Perturbations
arxiv-2604.01639 Sparse Blocked context only

Apr 2, 2026
OSCAR: Orchestrated Self-verification and Cross-path Refinement
arxiv-2604.01624 Sparse Blocked context only

Apr 2, 2026
Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
arxiv-2604.01622 Sparse Blocked context only

Apr 2, 2026
DeltaMem: Towards Agentic Memory Management via Reinforcement Learning
arxiv-2604.01560 Sparse Blocked context only

Apr 2, 2026
Read More, Think More: Revisiting Observation Reduction for Web Agents
arxiv-2604.01535 Sparse Blocked context only

Apr 2, 2026
Magic, Madness, Heaven, Sin: LLM Output Diversity is Everything, Everywhere, All at Once
arxiv-2604.01504 Sparse Blocked context only

Apr 2, 2026
From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents
arxiv-2604.01496 Sparse Blocked context only

Apr 2, 2026
AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks
arxiv-2604.01487 Sparse Blocked context only

Apr 1, 2026
When Reward Hacking Rebounds: Understanding and Mitigating It with Representation-Level Signals
arxiv-2604.01476 Sparse Blocked context only

Apr 1, 2026
Adaptive Stopping for Multi-Turn LLM Reasoning
arxiv-2604.01413 Sparse Blocked context only

Apr 1, 2026
Procedural Knowledge at Scale Improves Reasoning
arxiv-2604.01348 Sparse Blocked context only

Apr 1, 2026
Preference learning in shades of gray: Interpretable and bias-aware reward modeling for human preferences
arxiv-2604.01312 Sparse Blocked context only

Apr 1, 2026
M2-Verify: A Large-Scale Multidomain Benchmark for Checking Multimodal Claim Consistency
arxiv-2604.01306 Sparse Blocked context only

Apr 1, 2026
Scaling Reasoning Tokens via RL and Parallel Thinking: Evidence From Competitive Programming
arxiv-2604.01302 Sparse Blocked context only

Apr 1, 2026
HippoCamp: Benchmarking Contextual Agents on Personal Computers
arxiv-2604.01221 Sparse Blocked context only

Apr 1, 2026
Universal YOCO for Efficient Depth Scaling
arxiv-2604.01220 Sparse Blocked context only

Apr 1, 2026
$\texttt{YC-Bench}$: Benchmarking AI Agents for Long-Term Planning and Consistent Execution
arxiv-2604.01212 Sparse Blocked context only

Apr 1, 2026
LLM REgression with a Latent Iterative State Head
arxiv-2604.01206 Sparse Blocked context only

Apr 1, 2026
Embarrassingly Simple Self-Distillation Improves Code Generation
arxiv-2604.01193 Sparse Blocked context only

Apr 1, 2026
True (VIS) Lies: Analyzing How Generative AI Recognizes Intentionality, Rhetoric, and Misleadingness in Visualization Lies
arxiv-2604.01181 Sparse Blocked context only

Apr 1, 2026
Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning
arxiv-2604.01170 Sparse Blocked context only

Apr 1, 2026
Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning
arxiv-2604.01152 Sparse Blocked context only

Apr 1, 2026
CARE: Privacy-Compliant Agentic Reasoning with Evidence Discordance
arxiv-2604.01113 Sparse Blocked context only

Apr 1, 2026
Uncertainty-Aware Variational Reward Factorization via Probabilistic Preference Bases for LLM Personalization
arxiv-2604.00997 Sparse Blocked context only

Apr 1, 2026
Multimodal Analysis of State-Funded News Coverage of the Israel-Hamas War on YouTube Shorts
arxiv-2604.00994 Sparse Blocked context only

Apr 1, 2026
Dual Optimal: Make Your LLM Peer-like with Dignity
arxiv-2604.00979 Sparse Blocked context only

Apr 1, 2026
Benchmarking and Mechanistic Analysis of Vision-Language Models for Cross-Depiction Assembly Instruction Alignment
arxiv-2604.00913 Sparse Blocked context only

Apr 1, 2026
Beyond Symbolic Solving: Multi Chain-of-Thought Voting for Geometric Reasoning in Large Language Models
arxiv-2604.00890 Sparse Blocked context only

Apr 1, 2026
PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding
arxiv-2604.00886 Sparse Blocked context only

Apr 1, 2026
KUET at StanceNakba Shared Task: StanceMoE: Mixture-of-Experts Architecture for Stance Detection
arxiv-2604.00878 Sparse Blocked context only

Apr 1, 2026
Agentic Tool Use in Large Language Models
arxiv-2604.00835 Curated Related Blocked context only

Apr 1, 2026
Multimodal Language Models Cannot Spot Spatial Inconsistencies
arxiv-2604.00799 Sparse Blocked context only

Apr 1, 2026
LangMARL: Natural Language Multi-Agent Reinforcement Learning
arxiv-2604.00722 Sparse Blocked context only

Apr 1, 2026
TRIMS: Trajectory-Ranked Instruction Masked Supervision for Diffusion Language Models
arxiv-2604.00666 Sparse Blocked context only

Apr 1, 2026
A Survey of On-Policy Distillation for Large Language Models
arxiv-2604.00626 Sparse Blocked context only

Apr 1, 2026
Speech LLMs are Contextual Reasoning Transcribers
arxiv-2604.00610 Sparse Blocked context only

Apr 1, 2026
More Human, More Efficient: Aligning Annotations with Quantized SLMs
arxiv-2604.00586 Sparse Blocked context only

Apr 1, 2026
Ontology-Constrained Neural Reasoning in Enterprise Agentic Systems: A Neurosymbolic Architecture for Domain-Grounded AI Agents
arxiv-2604.00555 Sparse Blocked context only

Apr 1, 2026
MOON3.0: Reasoning-aware Multimodal Representation Learning for E-commerce Product Understanding
arxiv-2604.00513 Sparse Blocked context only

Apr 1, 2026
Adapting Text LLMs to Speech via Multimodal Depth Up-Scaling
arxiv-2604.00489 Sparse Blocked context only

Apr 1, 2026
Execution-Verified Reinforcement Learning for Optimization Modeling
arxiv-2604.00442 Sparse Blocked context only

Apr 1, 2026
TR-ICRL: Test-Time Rethinking for In-Context Reinforcement Learning
arxiv-2604.00438 Sparse Blocked context only

Apr 1, 2026
Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models
arxiv-2604.00375 Sparse Blocked context only

Apr 1, 2026
Signals: Trajectory Sampling and Triage for Agentic Interactions
arxiv-2604.00356 Sparse Blocked context only

Apr 1, 2026
Agent Q-Mix: Selecting the Right Action for LLM Multi-Agent Systems through Reinforcement Learning
arxiv-2604.00344 Sparse Blocked context only

Apr 1, 2026
Large Language Models in the Abuse Detection Pipeline
arxiv-2604.00323 Sparse Blocked context only

Mar 31, 2026
Can Large Language Models Self-Correct in Medical Question Answering? An Exploratory Study
arxiv-2604.00261 Curated Related Blocked context only

Mar 31, 2026
A Taxonomy of Programming Languages for Code Generation
arxiv-2604.00239 Sparse Blocked context only

Mar 31, 2026
Do LLMs Know What Is Private Internally? Probing and Steering Contextual Privacy Norms in Large Language Model Representations
arxiv-2604.00209 Sparse Blocked context only

Mar 31, 2026
Hierarchical Chain-of-Thought Prompting: Enhancing LLM Reasoning Performance and Efficiency
arxiv-2604.00130 Sparse Blocked context only

Mar 31, 2026
Hierarchical Pre-Training of Vision Encoders with Large Language Models
arxiv-2604.00086 Sparse Blocked context only

Mar 31, 2026
One Panel Does Not Fit All: Case-Adaptive Multi-Agent Deliberation for Clinical Prediction
arxiv-2604.00085 Sparse Blocked context only

Mar 31, 2026
Cognitive Friction: A Decision-Theoretic Framework for Bounded Deliberation in Tool-Using Agents
arxiv-2603.30031 Sparse Blocked context only

Mar 31, 2026
Less Is More? Selective Visual Attention to High-Importance Regions for Multimodal Radiology Summarization
arxiv-2603.29901 Sparse Blocked context only

Mar 31, 2026
Training-Free Dynamic Upcycling of Expert Language Models
arxiv-2603.29765 Sparse Blocked context only

Mar 31, 2026
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
arxiv-2603.29676 Sparse Blocked context only

Mar 31, 2026
Agenda-based Narrative Extraction: Steering Pathfinding Algorithms with Large Language Models
arxiv-2603.29661 Sparse Blocked context only

Mar 31, 2026
Learning Diagnostic Reasoning for Decision Support in Toxicology
arxiv-2603.29608 Sparse Blocked context only

Mar 31, 2026
LLM Probe: Evaluating LLMs for Low-Resource Languages
arxiv-2603.29517 Sparse Blocked context only

Mar 31, 2026
MemFactory: Unified Inference & Training Framework for Agent Memory
arxiv-2603.29493 Sparse Blocked context only

Mar 31, 2026
M-MiniGPT4: Multilingual VLLM Alignment via Translated Data
arxiv-2603.29467 Sparse Blocked context only

Mar 31, 2026
ELT-Bench-Verified: Benchmark Quality Issues Underestimate AI Agent Capabilities
arxiv-2603.29399 Sparse Blocked context only

Mar 31, 2026
Beyond Idealized Patients: Evaluating LLMs under Challenging Patient Behaviors in Medical Consultations
arxiv-2603.29373 Sparse Blocked context only

Mar 31, 2026
Aligning Multimodal Sequential Recommendations via Robust Direct Preference Optimization with Sparse MoE
arxiv-2603.29259 Sparse Blocked context only

Mar 31, 2026
MemRerank: Preference Memory for Personalized Product Reranking
arxiv-2603.29247 Sparse Blocked context only

Mar 31, 2026
Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
arxiv-2603.29232 Sparse Blocked context only

Mar 31, 2026
SyriSign: A Parallel Corpus for Arabic Text to Syrian Arabic Sign Language Translation
arxiv-2603.29219 Sparse Blocked context only

Mar 31, 2026
Xuanwu: Evolving General Multimodal Models into an Industrial-Grade Foundation for Content Ecosystems
arxiv-2603.29211 Sparse Blocked context only

Mar 31, 2026
Dual Perspectives in Emotion Attribution: A Generator-Interpreter Framework for Cross-Cultural Analysis of Emotion in LLMs
arxiv-2603.29077 Sparse Blocked context only

Mar 30, 2026
Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning
arxiv-2603.29038 Sparse Blocked context only

Mar 30, 2026
The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
arxiv-2603.29025 Sparse Blocked context only

Mar 30, 2026
Human-Like Lifelong Memory: A Neuroscience-Grounded Architecture for Infinite Interaction
arxiv-2603.29023 Sparse Blocked context only

Mar 30, 2026
SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning
arxiv-2603.28730 Sparse Blocked context only

Mar 30, 2026
Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning
arxiv-2603.28618 Sparse Blocked context only

Mar 30, 2026
ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning
arxiv-2603.28610 Sparse Blocked context only

Mar 30, 2026
Moving Beyond Review: Applying Language Models to Planning and Translation in Reflection
arxiv-2603.28596 Sparse Blocked context only

Mar 30, 2026
Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification
arxiv-2603.28488 Sparse Blocked context only

Mar 30, 2026
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
arxiv-2603.28342 Sparse Blocked context only

Mar 30, 2026
DongYuan: An LLM-Based Framework for Integrative Chinese and Western Medicine Spleen-Stomach Disorders Diagnosis
arxiv-2603.28191 Sparse Blocked context only

Mar 30, 2026
MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions
arxiv-2603.28086 Sparse Blocked context only

Mar 30, 2026
Efficient Inference of Large Vision Language Models
arxiv-2603.27960 Sparse Blocked context only

Mar 30, 2026
Top-down string-to-dependency Neural Machine Translation
arxiv-2603.27938 Sparse Blocked context only

Mar 30, 2026
ATLAS-RTC: Closing the Loop on LLM Agent Output with Token-Level Runtime Control
arxiv-2603.27905 Sparse Blocked context only

Mar 29, 2026
EffiSkill: Agent Skill Based Automated Code Efficiency Optimization
arxiv-2603.27850 Sparse Blocked context only

Mar 29, 2026
Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3
arxiv-2603.27844 Sparse Blocked context only

Mar 29, 2026
Q-Bridge: Code Translation for Quantum Machine Learning via LLMs
arxiv-2603.27836 Sparse Blocked context only

Mar 29, 2026
Improving Clinical Diagnosis with Counterfactual Multi-Agent Reasoning
arxiv-2603.27820 Sparse Blocked context only

Mar 29, 2026
KVSculpt: KV Cache Compression as Distillation
arxiv-2603.27819 Sparse Blocked context only

Mar 29, 2026
Conversational Agents and the Understanding of Human Language: Reflections on AI, LLMs, and Cognitive Science
arxiv-2603.27809 Sparse Blocked context only

Mar 29, 2026
Understanding Teacher Revisions of Large Language Model-Generated Feedback
arxiv-2603.27806 Sparse Blocked context only

Mar 29, 2026
Emergent Social Intelligence Risks in Generative Multi-Agent Systems
arxiv-2603.27771 Sparse Blocked context only

Mar 29, 2026
KAT-Coder-V2 Technical Report
arxiv-2603.27703 Sparse Blocked context only

Mar 29, 2026
Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?
arxiv-2603.27694 Sparse Blocked context only

Mar 29, 2026
PRBench: End-to-end Paper Reproduction in Physics Research
arxiv-2603.27646 Sparse Blocked context only

Mar 29, 2026
LongCat-Next: Lexicalizing Modalities as Discrete Tokens
arxiv-2603.27538 Direct Blocked context only

Mar 29, 2026
Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models
arxiv-2603.27522 Sparse Blocked context only

Mar 29, 2026
AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents
arxiv-2603.27490 Sparse Blocked context only

Mar 29, 2026
Multi-Agent Dialectical Refinement for Enhanced Argument Classification
arxiv-2603.27451 Sparse Blocked context only

Mar 29, 2026
Improving Attributed Long-form Question Answering with Intent Awareness
arxiv-2603.27435 Sparse Blocked context only

Mar 28, 2026
The Geometry of Harmful Intent: Training-Free Anomaly Detection via Angular Deviation in LLM Residual Streams
arxiv-2603.27412 Sparse Blocked context only

Mar 28, 2026
Heterogeneous Debate Engine: Identity-Grounded Cognitive Architecture for Resilient LLM-Based Ethical Tutoring
arxiv-2603.27404 Sparse Blocked context only

Mar 28, 2026
PubMed Reasoner: Dynamic Reasoning-based Retrieval for Evidence-Grounded Biomedical Question Answering
arxiv-2603.27335 Sparse Blocked context only

Mar 28, 2026
SACRED: A Faithful Annotated Multimedia Multimodal Multilingual Dataset for Classifying Connectedness Types in Online Spirituality
arxiv-2603.27331 Sparse Blocked context only

Mar 28, 2026
Mitigating Hallucination on Hallucination in RAG via Ensemble Voting
arxiv-2603.27253 Sparse Blocked context only

Mar 28, 2026
Rethinking Easy-to-Hard: Limits of Curriculum Learning in Post-Training for Deductive Reasoning
arxiv-2603.27226 Sparse Blocked context only

Mar 28, 2026
daVinci-LLM: Towards the Science of Pretraining
arxiv-2603.27164 Curated Related Blocked context only

Mar 28, 2026
Learning to Predict Future-Aligned Research Proposals with Language Models
arxiv-2603.27146 Sparse Blocked context only

Mar 28, 2026
ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding
arxiv-2603.27064 Direct Blocked context only

Mar 28, 2026
Debiasing Large Language Models toward Social Factors in Online Behavior Analytics through Prompt Knowledge Tuning
arxiv-2603.27057 Sparse Blocked context only

Mar 28, 2026
The Last Fingerprint: How Markdown Training Shapes LLM Prose
arxiv-2603.27006 Sparse Blocked context only

Mar 27, 2026
Learning to Commit: Generating Organic Pull Requests via Online Repository Memory
arxiv-2603.26664 Sparse Blocked context only

Mar 27, 2026
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
arxiv-2603.26653 Sparse Blocked context only

Mar 27, 2026
When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models
arxiv-2603.26556 Sparse Blocked context only

Mar 27, 2026
JAL-Turn: Joint Acoustic-Linguistic Modeling for Real-Time and Robust Turn-Taking Detection in Full-Duplex Spoken Dialogue Systems
arxiv-2603.26515 Sparse Blocked context only

Mar 27, 2026
ClimateCheck 2026: Scientific Fact-Checking and Disinformation Narrative Classification of Climate-related Claims
arxiv-2603.26449 Sparse Blocked context only

Mar 27, 2026
CALRK-Bench: Evaluating Context-Aware Legal Reasoning in Korean Law
arxiv-2603.26332 Sparse Blocked context only

Mar 27, 2026
From Human Cognition to Neural Activations: Probing the Computational Primitives of Spatial Reasoning in LLMs
arxiv-2603.26323 Sparse Blocked context only

Mar 27, 2026
Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation
arxiv-2604.02368 Sparse Blocked context only

Mar 27, 2026
Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents
arxiv-2603.26233 Sparse Blocked context only

Mar 27, 2026
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models
arxiv-2603.26164 Direct Blocked context only

Mar 27, 2026
Vega: Learning to Drive with Natural Language Instructions
arxiv-2603.25741 Sparse Blocked context only

Mar 26, 2026
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
arxiv-2603.25730 Sparse Blocked context only

Mar 26, 2026
R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning
arxiv-2603.25720 Sparse Blocked context only

Mar 26, 2026
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
arxiv-2603.25716 Direct Blocked context only

Mar 26, 2026
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
arxiv-2603.25702 Sparse Blocked context only

Mar 26, 2026
Self-Improvement of Large Language Models: A Technical Overview and Future Outlook
arxiv-2603.25681 Sparse Blocked context only

Mar 26, 2026
Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?
arxiv-2603.25633 Sparse Blocked context only

Mar 26, 2026
PICon: A Multi-Turn Interrogation Framework for Evaluating Persona Agent Consistency
arxiv-2603.25620 Sparse Blocked context only

Mar 26, 2026
Demographic Fairness in Multimodal LLMs: A Benchmark of Gender and Ethnicity Bias in Face Verification
arxiv-2603.25613 Sparse Blocked context only

Mar 26, 2026
Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
arxiv-2603.25562 Sparse Blocked context only

Mar 26, 2026
EcoThink: A Green Adaptive Inference Framework for Sustainable and Accessible Agents
arxiv-2603.25498 Sparse Blocked context only

Mar 26, 2026
Cross-Model Disagreement as a Label-Free Correctness Signal
arxiv-2603.25450 Sparse Blocked context only

Mar 26, 2026
From Manipulation to Mistrust: Explaining Diverse Micro-Video Misinformation for Robust Debunking in the Wild
arxiv-2603.25423 Sparse Blocked context only

Mar 26, 2026
Navigating the Prompt Space: Improving LLM Classification of Social Science Texts Through Prompt Engineering
arxiv-2603.25422 Sparse Blocked context only

Mar 26, 2026
TAPO: Translation Augmented Policy Optimization for Multilingual Mathematical Reasoning
arxiv-2603.25419 Sparse Blocked context only

Mar 26, 2026
Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
arxiv-2603.25412 Sparse Blocked context only

Mar 26, 2026
System Design for Maintaining Internal State Consistency in Long-Horizon Robotic Tabletop Games
arxiv-2603.25405 Sparse Blocked context only

Mar 26, 2026
Large Language Model as Token Compressor and Decompressor
arxiv-2603.25340 Sparse Blocked context only

Mar 26, 2026
AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer's Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study
arxiv-2603.25322 Sparse Blocked context only

Mar 26, 2026
Separate Before You Compress: The WWHO Tokenization Architecture
arxiv-2603.25309 Sparse Blocked context only

Mar 26, 2026
SliderQuant: Accurate Post-Training Quantization for LLMs
arxiv-2603.25284 Sparse Blocked context only

Mar 26, 2026
CRAFT: Grounded Multi-Agent Coordination Under Partial Information
arxiv-2603.25268 Sparse Blocked context only

Mar 26, 2026
MolQuest: A Benchmark for Agentic Evaluation of Abductive Reasoning in Chemical Structure Elucidation
arxiv-2603.25253 Sparse Blocked context only

Mar 26, 2026
Probabilistic Concept Graph Reasoning for Multimodal Misinformation Detection
arxiv-2603.25203 Sparse Blocked context only

Mar 26, 2026
Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
arxiv-2603.25155 Sparse Blocked context only

Mar 26, 2026
OMIND: Framework for Knowledge Grounded Finetuning and Multi-Turn Dialogue Benchmark for Mental Health LLMs
arxiv-2603.25105 Sparse Blocked context only

Mar 26, 2026
An Explainable Ensemble Learning Framework for Crop Classification with Optimized Feature Pyramids and Deep Networks
arxiv-2603.25070 Sparse Blocked context only

Mar 26, 2026
TopoPilot: Reliable Conversational Workflow Automation for Topological Data Analysis and Visualization
arxiv-2603.25063 Sparse Blocked context only

Mar 26, 2026
Closing the Confidence-Faithfulness Gap in Large Language Models
arxiv-2603.25052 Sparse Blocked context only

Mar 26, 2026
