PrefDisco: Benchmarking Proactive Personalized Reasoning
Shuyue Stella Li, Avinandan Bose, Faeze Brahman, Simon Shaolei Du, Pang Wei Koh, Maryam Fazel, Yulia Tsvetkov · Sep 30, 2025 · Citations: 0
Data freshness
Extraction: FreshCheck recency before relying on this page for active eval decisions. Use stale pages as context and verify against current hub results.
Metadata refreshed
Mar 4, 2026, 9:43 PM
RecentExtraction refreshed
Mar 8, 2026, 2:53 AM
FreshExtraction source
Persisted extraction
Confidence 0.70
Abstract
Current large language model (LLM) development treats task-solving and preference-alignment as separate challenges, optimizing first for objective correctness, then for alignment to aggregated human preferences. This paradigm fails in human-facing applications where solving a problem correctly is insufficient if the response mismatches the user's needs. This challenge intensifies in just-in-time scenarios where no prior user interaction history exists due to cold-start conditions or privacy constraints. LLMs need to proactively identify what they don't know about the user, strategically elicit preference values through questioning, then adapt their reasoning processes and responses accordingly -- a complicated chain of cognitive processes which we term personalized reasoning. We introduce PrefDisco, an evaluation methodology that transforms static benchmarks into interactive personalization tasks using psychologically-grounded personas with sparse, context-dependent preferences, and define PrefAlign as a fine-grained rubric-based metric for measuring preference alignment. PrefDisco builds scenarios where identical questions require different reasoning chains depending on user context, as optimal explanation approaches vary by individual expertise and preferences while maintaining factual accuracy. Evaluation of 21 frontier models across 10 tasks reveals 29.0% of naive personalization attempts produce worse preference alignment than generic responses, yet generic responses also fail to serve individual user needs. These findings suggest personalized reasoning requires dedicated development rather than emerging naturally. PrefDisco provides a foundation for developing systems that can adapt to individual users in education, healthcare, and technical domains where personalization is critical.