Featured Papers
Popular, high-signal papers, each with a direct link to its full protocol page.
- Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
Apr 9, 2026 · Citations: 0
The advent of agentic multimodal models has empowered systems to actively interact with external environments.
- Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts
Apr 9, 2026 · Citations: 0
Experiments on three multimodal MoE models across six benchmarks demonstrate consistent improvements, with gains of up to 3.17% on complex visual reasoning tasks.
- OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
Apr 9, 2026 · Citations: 0
Extensive evaluations across 18 diverse benchmarks demonstrate its superior performance over strong open-source and leading proprietary frontier models.
- AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
Apr 9, 2026 · Citations: 0
We introduce AVGen-Bench, a task-driven benchmark for T2AV generation featuring high-quality prompts across 11 real-world categories.
- Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models
Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest
Apr 9, 2026 · Citations: 0
Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning.
- What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal
Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- ClawBench: Can AI Agents Complete Everyday Online Tasks?
Apr 9, 2026 · Citations: 0
AI agents may be able to automate your inbox, but can they automate other routine aspects of your life?
- Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- What do Language Models Learn and When? The Implicit Curriculum Hypothesis
Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Differentially Private Language Generation and Identification in the Limit
Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- sciwrite-lint: Verification Infrastructure for the Age of Science Vibe-Writing
Apr 9, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.