Decomposing Physician Disagreement in HealthBench
Satya Borgohain, Roy Mariathas · Feb 26, 2026
Rubric Rating Medicine
- We decompose physician disagreement in the HealthBench medical AI evaluation dataset to understand where variance resides and what observable features can explain it.
- The agreement ceiling in medical AI evaluation is thus largely structural, but the reducible/irreducible dissociation suggests that closing information gaps in evaluation scenarios could lower disagreement where inherent clinical ambiguity…