Implicit Bias-Like Patterns in Reasoning Models
Messi H. J. Lee, Calvin K. Lai
Abstract
Implicit biases refer to automatic mental processes that shape perceptions, judgments, and behaviors. Previous research on "implicit bias" in LLMs focused primarily on outputs rather than the processes underlying the outputs. We present the Reasoning Model Implicit Association Test (RM-IAT) to study implicit bias-like processing in reasoning models, LLMs that use step-by-step reasoning to solve complex tasks. Using RM-IAT, we find that reasoning models like o3-mini, DeepSeek-R1, gpt-oss-20b, and Qwen-3 8B consistently expend more reasoning tokens on association-incompatible tasks than association-compatible tasks, suggesting greater computational effort when processing counter-stereotypical information. Conversely, Claude 3.7 Sonnet exhibited reversed patterns, which thematic analysis associated with its unique internal focus on reasoning about bias and stereotypes. These findings demonstrate that reasoning models exhibit distinct implicit bias-like patterns and that these patterns vary significantly depending on the models' internal reasoning content.
Full analysis loading… Code implementations, benchmark data, and reproduction guides are being assembled. Please check back shortly.
Need human evaluators for your AI research? Scale annotation with expert AI Trainers.