Indic-TunedLens: Interpreting Multilingual Models in Indian Languages
Mihir Panchal, Deeksha Varshney, Mamta, Asif Ekbal · Jan 29, 2026 · Citations: 0
How to use this paper page
Coverage: StaleUse this page to decide whether the paper is strong enough to influence an eval design. It summarizes the abstract plus available structured metadata. If the signal is thin, use it as background context and compare it against stronger hub pages before making protocol choices.
Best use
Background context only
Metadata: StaleTrust level
Low
Signals: StaleWhat still needs checking
Extraction flags indicate low-signal or possible false-positive protocol mapping.
Signal confidence: 0.25
Abstract
Multilingual large language models (LLMs) are increasingly deployed in linguistically diverse regions like India, yet most interpretability tools remain tailored to English. Prior work reveals that LLMs often operate in English centric representation spaces, making cross lingual interpretability a pressing concern. We introduce Indic-TunedLens, a novel interpretability framework specifically for Indian languages that learns shared affine transformations. Unlike the standard Logit Lens, which directly decodes intermediate activations, Indic-TunedLens adjusts hidden states for each target language, aligning them with the target output distributions to enable more faithful decoding of model representations. We evaluate our framework on 10 Indian languages using the MMLU benchmark and find that it significantly improves over SOTA interpretability methods, especially for morphologically rich, low resource languages. Our results provide crucial insights into the layer-wise semantic encoding of multilingual transformers. Our model is available at https://huggingface.co/spaces/MihirRajeshPanchal/IndicTunedLens. Our code is available at https://github.com/MihirRajeshPanchal/IndicTunedLens.