Skip to content

MolReasoner: Toward Effective and Interpretable Reasoning for Molecular LLMs

Guojiang Zhao, Zixiang Lu, Yutang Ge, Sihang Li, Zheng Cheng, +11 more

2025-08-04T05:10:11Z

Abstract

Large Language Models (LLMs) have shown impressive performance across various domains, but their ability to perform molecular reasoning remains underexplored. Existing methods mostly rely on general-purpose prompting, which lacks domain-specific molecular semantics, or fine-tuning, which faces challenges in interpretability and reasoning depth, often leading to structural and textual hallucinations. To address these issues, we introduce MolReasoner, a two-stage framework that transitions LLMs from memorization to high-fidelity chemical reasoning. In the Mol-SFT stage, knowledge-enhanced Chain-of-Thought (CoT) data provides a strong foundation, while the Mol-RL stage refines reasoning using a novel, task-adaptive reward system to mitigate hallucinations. Extensive evaluations demonstrate that MolReasoner significantly outperforms a wide range of strong baselines in both molecule generation and captioning tasks. Further analyses highlight the framework's synergistic design and its ability to produce more interpretable outputs. Our work presents a principled and effective new approach for advancing high-fidelity molecular reasoning.

Full analysis loading… Code implementations, benchmark data, and reproduction guides are being assembled. Please check back shortly.

Browse all papers

Need human evaluators for your AI research? Scale annotation with expert AI Trainers.