Skip to content

Researcher Tools

Human Feedback and Eval Paper Explorer

A focused feed for RLHF, preference data, rater protocols, agent evaluation, and LLM-as-judge research. Every paper includes structured metadata for quick triage.

Total papers: 1 Search mode: keyword RSS
BankMathBench: A Benchmark for Numerical Reasoning in Banking Scenarios

Yunseung Lee, Subin Kim, Youngjun Kwak, Jaegul Choo ยท Feb 19, 2026

Citations: 0
Automatic Metrics Long Horizon Math
  • However, such errors have rarely been captured by existing benchmarks.
  • Mathematical datasets focus on fundamental math problems, whereas financial benchmarks primarily target financial documents, leaving everyday banking scenarios underexplored.

Protocol Hubs

Need human evaluators for your AI research? Scale annotation with expert AI Trainers.