Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks
Jayadev Billa · Feb 17, 2026 · Citations: 0
Abstract
Capability emergence during neural network training remains mechanistically opaque. We track five geometric measures across five model scales (405K--85M parameters), 120 task$\times$level$\times$ model combinations (119 achieving accuracy-based emergence) across eight algorithmic tasks, and three Pythia language models (160M--2.8B). We find: (1) training begins with a universal representation collapse to task-specific floors that are scale-invariant across a 210$\times$ parameter range (e.g., modular arithmetic collapses to RANKME $\,\approx\,$2.0 regardless of model size); (2) collapse propagates top-down through layers (28/32 task$ \times $model consistency), contradicting bottom-up feature-building intuition; (3) a geometric hierarchy in which representation geometry leads emergence (100% precursor rate for hard tasks across all model sizes), while the local learning coefficient is synchronous (0/24 precursor) and Hessian measures lag. We also delineate prediction limits: geometric measures encode coarse task difficulty but not fine-grained timing (within-class concordance ranges from 52% for easy tasks to 69% for hard tasks; when task ordering reverses across scales, prediction fails at 26%). On Pythia, global geometric patterns replicate but per-task precursor signals do not, as the precursor relationship requires task--training alignment that naturalistic pre-training does not provide. Our contribution is the geometric anatomy of emergence and its boundary conditions, not a prediction tool.