- BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
Junxiao Yang, Jinzhe Tu, Haoran Liu, Xiaoce Wang, Chujie Zheng · May 18, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- EVALOOOP: A Self-Consistency-Centered Framework for Assessing Large Language Model Robustness in Programming
Sen Fang, Weiyuan Ding, Mengshi Zhang, Zihao Chen, Bowen Xu · May 18, 2025 · Citations: 0
However, adversarial attacks exhibit fundamental limitations that compromise fair robustness assessment: they demonstrate contradictory evaluation outcomes where different attack strategies tend to favor different models, and more…
- EAMET: Robust Massive Model Editing via Embedding Alignment Optimization
Yanbo Dai, Zhenlan Ji, Zongjie Li, Shuai Wang · May 17, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Visual Planning: Let's Think Only with Images
Yi Xu, Chengzu Li, Han Zhou, Xingchen Wan, Caiqi Zhang · May 16, 2025 · Citations: 0
Web Browsing
In this paradigm, planning is executed via sequences of images that encode step-by-step inference in the visual domain, akin to how humans sketch or visualize future actions.
- CodePDE: An Inference Framework for LLM-driven PDE Solver Generation
Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski · May 13, 2025 · Citations: 0
With CodePDE, we present a thorough evaluation on critical capacities of LLM for PDE solving: reasoning, debugging, self-refinement, and test-time scaling.
- Benchmarking Retrieval-Augmented Generation for Chemistry
Xianrui Zhong, Bowen Jin, Siru Ouyang, Yanzhen Shen, Qiao Jin · May 12, 2025 · Citations: 0
Despite its promise, the application of RAG in the chemistry domain remains underexplored, primarily due to the lack of high-quality, domain-specific corpora and well-curated evaluation benchmarks.
- Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications
Zhanliang Wang, Da Wu, Quan Nguyen, Zhuoran Xu, Kai Wang · May 9, 2025 · Citations: 0
Pairwise Preference
To address this challenge, we introduce MINT (Multimodal Integrated kNowledge Transfer), a framework that aligns unimodal large decoder models with domain-specific decision patterns from multimodal biomedical data through preference…
- Scalable LLM Reasoning Acceleration with Low-rank Distillation
Harry Dong, Bilge Acun, Beidi Chen, Yuejie Chi · May 8, 2025 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
- Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
Ruize Zhang, Sirui Xiang, Zelai Xu, Feng Gao, Shilong Ji · May 7, 2025 · Citations: 0
Demonstrations Long Horizon
The task is turn-based, multi-agent, and physically grounded, posing significant challenges due to its long-horizon dependencies, tight inter-agent coupling, and the underactuated dynamics of quadrotors.
- ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization
Dmitriy Shopkhoev, Ammar Ali, Magauiya Zhussip, Valentin Malykh, Stamatios Lefkimmiatis · May 5, 2025 · Citations: 0
Applied to several large language models (LLMs), ReplaceMe achieves up to 25\% pruning while retaining approximately 90\% of the original model's performance on open benchmarks - without any training or healing steps, resulting in minimal…