D-COT: Disciplined Chain-of-Thought Learning for Efficient Reasoning in Small Language Models
Shunsuke Ubukata · Feb 25, 2026
Citations: 0
Automatic Metrics · Long Horizon · General