Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models

Seong Hah Cho, Junyi Li, Anna Leshinskaya · Feb 22, 2026 · Citations: 0

Abstract

Value alignment of Large Language Models (LLMs) requires us to empirically measure these models' actual, acquired representation of value. Among the characteristics of value representation in humans is that they distinguish among value of different kinds. We investigate whether LLMs likewise distinguish three different kinds of good: moral, grammatical, and economic. By probing model behavior, embeddings, and residual stream activations, we report pervasive cases of value entanglement: a conflation between these distinct representations of value. Specifically, both grammatical and economic valuation were found to be overly influenced by moral value, relative to human norms. This conflation was repaired by selective ablation of the activation vectors associated with morality.
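
To make "selective ablation of the activation vectors" more concrete, here is a minimal sketch of one common form of directional ablation: projecting a single direction out of a batch of activations. This is an illustration under stated assumptions, not necessarily the authors' procedure. The function name `ablate_direction`, the random stand-in data, and the `moral_dir` vector are all hypothetical; the abstract does not say how the morality-associated vectors were obtained or at which layers they were removed.

```python
import numpy as np

def ablate_direction(activations, direction):
    """Remove the component of each activation row along `direction`."""
    # Normalize so the projection coefficient is a plain dot product.
    unit = direction / np.linalg.norm(direction)
    # Projection coefficient of every row onto the unit direction.
    coeffs = activations @ unit
    # Subtract the projected component from each row.
    return activations - np.outer(coeffs, unit)

# Hypothetical stand-ins: 4 token positions, 8 hidden dimensions.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8))   # assumed residual-stream activations
moral_dir = rng.normal(size=8)   # assumed "morality" direction

ablated = ablate_direction(acts, moral_dir)
```

After ablation, every row of `ablated` is orthogonal to `moral_dir`, so any downstream readout that depends only on that direction no longer sees it, while the remaining dimensions are left unchanged.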

Human Data Lens

  • Uses human feedback: No
  • Feedback types: None
  • Rater population: Unknown
  • Unit of annotation: Unknown
  • Expertise required: General

Evaluation Lens

  • Evaluation modes: Automatic Metrics
  • Agentic eval: None
  • Quality controls: Not reported
  • Confidence: 0.30
  • Flags: low_signal, possible_false_positive

Research Summary

Contribution Summary

  • Value alignment of Large Language Models (LLMs) requires us to empirically measure these models' actual, acquired representation of value.
  • Among the characteristics of value representation in humans is that they distinguish among value of different kinds.
  • We investigate whether LLMs likewise distinguish three different kinds of good: moral, grammatical, and economic.

Why It Matters For Eval

  • Among the characteristics of value representation in humans is that they distinguish among value of different kinds.
  • Specifically, both grammatical and economic valuation were found to be overly influenced by moral value, relative to human norms.
