Hypothesis-Driven Feature Manifold Analysis in LLMs via Supervised Multi-Dimensional Scaling
Federico Tiblias, Irina Bigoulaeva, Jingcheng Niu, Simone Balloccu, Iryna Gurevych · Oct 1, 2025 · Citations: 0
How to use this paper page
Coverage: Recent
Use this page to decide whether the paper is strong enough to influence an eval design. It summarizes the abstract plus available structured metadata. If the signal is thin, use it as background context and compare it against stronger hub pages before making protocol choices.
Best use: Background context only
Metadata: Recent
Trust level: Low
Signals: Recent
What still needs checking: Extraction flags indicate low-signal or possible false-positive protocol mapping.
Signal confidence: 0.15
Abstract
The linear representation hypothesis states that language models (LMs) encode concepts as directions in their latent space, forming organized, multidimensional manifolds. Prior work has largely focused on identifying specific geometries for individual features, limiting its ability to generalize. We introduce Supervised Multi-Dimensional Scaling (SMDS), a model-agnostic method for evaluating and comparing competing feature manifold hypotheses. We apply SMDS to temporal reasoning as a case study and find that different features instantiate distinct geometric structures, including circles, lines, and clusters. SMDS reveals several consistent characteristics of these structures: they reflect the semantic properties of the concepts they represent, remain stable across model families and sizes, actively support reasoning, and dynamically reshape in response to contextual changes. Together, our findings shed light on the functional role of feature manifolds, supporting a model of entity-based reasoning in which LMs encode and transform structured representations.
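The abstract does not spell out how SMDS compares hypotheses, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that each hypothesis (line or circle) can be expressed as an ideal pairwise distance over feature labels, that classical MDS turns those distances into target coordinates, and that a least-squares linear map from activations to those coordinates is scored by normalized stress on held-out points. All function names, the synthetic "activations", and the scoring rule are hypothetical choices made for illustration.

```python
import numpy as np

def hypothesis_distances(y, kind, period=None):
    """Ideal pairwise distances between labels under a geometric hypothesis (assumed form)."""
    gap = np.abs(y[:, None] - y[None, :]).astype(float)
    if kind == "line":
        return gap
    if kind == "circle":
        wrapped = np.minimum(gap, period - gap)
        # Chord length between two points on a unit circle.
        return 2.0 * np.sin(np.pi * wrapped / period)
    raise ValueError(f"unknown hypothesis: {kind}")

def classical_mds(D, k):
    """Classical (Torgerson) MDS: coordinates whose distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]             # k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def pairwise(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def heldout_stress(X, y, kind, period=None, k=2):
    """Fit a linear map X -> MDS targets on half the data, score stress on the rest."""
    D = hypothesis_distances(y, kind, period)
    Z = classical_mds(D, k)
    idx = np.arange(len(y))
    train, test = idx[::2], idx[1::2]
    mu = X[train].mean(axis=0)
    A_train = np.hstack([X[train] - mu, np.ones((len(train), 1))])
    W, *_ = np.linalg.lstsq(A_train, Z[train], rcond=None)
    A_test = np.hstack([X[test] - mu, np.ones((len(test), 1))])
    D_hat = pairwise(A_test @ W)
    D_test = D[np.ix_(test, test)]
    return float(np.sqrt(((D_hat - D_test) ** 2).sum() / (D_test ** 2).sum()))

# Synthetic stand-in for LM activations: a 12-step cyclic feature (e.g. months)
# embedded linearly in 16 dimensions with a little noise. This is made-up data.
rng = np.random.default_rng(0)
y = rng.permutation(np.tile(np.arange(12), 20))
theta = 2 * np.pi * y / 12
latent = np.stack([np.cos(theta), np.sin(theta)], axis=1)
X = latent @ rng.normal(size=(2, 16)) + 0.05 * rng.normal(size=(len(y), 16))

scores = {
    "line": heldout_stress(X, y, "line"),
    "circle": heldout_stress(X, y, "circle", period=12),
}
print(scores)  # lower stress means the hypothesis fits the geometry better
```

On this synthetic data the circular hypothesis should yield the lower stress, since the data were generated from a circle. With real model activations, `X` would be hidden states and `y` the feature labels; the choice of held-out split and stress normalization here is an illustrative assumption.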