Matched via arXiv identifier search
- Stars: 3
- Last push: Mar 9, 2026 (1d ago)
Risk flags
- No tagged releases
- No Docker setup
- Low confidence match
Mihir Panchal, Deeksha Varshney, Mamta, Asif Ekbal
Core AI workload signals detected from paper context and implementation/artifact evidence.
Multilingual large language models (LLMs) are increasingly deployed in linguistically diverse regions like India, yet most interpretability tools remain tailored to English. Prior work reveals that LLMs often operate in English-centric representation spaces, making cross-lingual interpretability a pressing concern. We introduce Indic-TunedLens, a novel interpretability framework specifically for Indian languages that learns shared affine transformations. Unlike the standard Logit Lens, which directly decodes intermediate activations, Indic-TunedLens adjusts hidden states for each target language, aligning them with the target output distributions to enable more faithful decoding of model representations. We evaluate our framework on 10 Indian languages using the MMLU benchmark and find that it significantly improves over SOTA interpretability methods, especially for morphologically rich, low-resource languages. Our results provide crucial insights into the layer-wise semantic encoding of multilingual transformers. Our model is available at https://huggingface.co/spaces/MihirRajeshPanchal/IndicTunedLens. Our code is available at https://github.com/MihirRajeshPanchal/IndicTunedLens.
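The contrast the abstract draws between the Logit Lens and a tuned lens can be illustrated with a toy sketch. This is not the authors' implementation: the function names, dimensions, and random weights are all hypothetical, layer normalization is omitted, and the affine map is set by hand rather than learned. It only shows why applying an affine correction to an intermediate hidden state before unembedding can bring the decoded distribution closer to the final-layer distribution.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def logit_lens(h, W_U):
    # Decode the intermediate hidden state directly with the unembedding.
    return softmax(h @ W_U)

def tuned_lens(h, W_U, A, b):
    # Apply an affine map to the hidden state first, then decode.
    return softmax((h @ A + b) @ W_U)

rng = np.random.default_rng(0)
d_model, vocab = 8, 20                      # toy sizes (assumption)
W_U = rng.normal(size=(d_model, vocab))     # stand-in unembedding matrix
h_mid = rng.normal(size=d_model)            # stand-in intermediate activation

# Toy stand-in for "the remaining layers": a fixed affine map.
# A real transformer is not affine; this keeps the example exact.
M = np.eye(d_model) + 0.5 * rng.normal(size=(d_model, d_model))
c = rng.normal(size=d_model)
h_final = h_mid @ M + c
p_final = softmax(h_final @ W_U)

# In practice A and b would be learned (for example by minimizing this KL
# over many tokens); here they are set to the known map for illustration.
kl_logit = kl_divergence(p_final, logit_lens(h_mid, W_U))
kl_tuned = kl_divergence(p_final, tuned_lens(h_mid, W_U, M, c))

print("KL(final || logit lens):", kl_logit)
print("KL(final || tuned lens):", kl_tuned)
```

In this toy setting the affine map matches the rest of the "network" exactly, so the tuned-lens distribution coincides with the final one, while the direct decoding does not. With a real model the learned map can only approximate the remaining layers.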
Researcher verdict
This page is useful as a benchmark reference and for scoping a cautious reproduction plan, but there is not enough implementation evidence yet to treat it as a trusted build baseline.
Why this page is still worth reading
Benchmark trust
Concrete benchmark findings are present and can be audited against the extracted evidence.
Use this page as
Use this page to audit benchmark claims and scope a cautious reproduction plan.
Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.
| Task | Dataset | Metric | Value | Source | Evidence refs |
|---|---|---|---|---|---|
| Multiple-choice QA evaluation | MMLU Bengali subset | num_rows | 216 | llm-grounded | evidencePack.paperSections[id=paper_caption_5] |
| Multiple-choice QA evaluation | MMLU English subset | num_rows | 277 | llm-grounded | evidencePack.paperSections[id=paper_caption_5] |
| Multiple-choice QA evaluation | MMLU Gujarati subset | num_rows | 243 | llm-grounded | evidencePack.paperSections[id=paper_caption_5] |
| Multiple-choice QA evaluation | MMLU Hindi subset | num_rows | 235 | llm-grounded | evidencePack.paperSections[id=paper_caption_5] |
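One way to audit the row counts above is to compare them with the counts observed after loading the evaluation data yourself. The sketch below assumes a hypothetical helper, `audit_row_counts`, and takes the observed counts as a plain dictionary; how those counts are obtained (which dataset release, which split) is not specified on this page and is left to the reader. Only the four subsets listed in the table are included.

```python
# Row counts as reported in the table above.
REPORTED_ROWS = {
    "Bengali": 216,
    "English": 277,
    "Gujarati": 243,
    "Hindi": 235,
}

def audit_row_counts(reported, observed):
    """Return {language: (reported, observed)} for every mismatch.

    A language missing from `observed` is reported with observed=None.
    """
    mismatches = {}
    for language, expected in reported.items():
        actual = observed.get(language)
        if actual != expected:
            mismatches[language] = (expected, actual)
    return mismatches

# Hypothetical observed counts; replace with counts from your own data load.
observed_rows = {"Bengali": 216, "English": 277, "Gujarati": 240}

total_reported = sum(REPORTED_ROWS.values())
mismatches = audit_row_counts(REPORTED_ROWS, observed_rows)

print("Total reported rows:", total_reported)
for language, (expected, actual) in mismatches.items():
    print(f"{language}: reported {expected}, observed {actual}")
```

An empty result means the loaded subsets match the reported sizes; any entry in the result is worth recording in the experiment log before running evaluations.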
No direct maintained repository implementation was found, but paper-linked Hugging Face artifacts are available.
Hardware Notes
Expect multi-day setup/compute for meaningful reproduction based on current guidance.
LLM evidence refs: paper.abstract, evidencePack.paperSections[id=paper_caption_5], researcherSummary.implementationRecommendation, guidance.riskFlags[0], guidance.riskFlags[1], researcherSummary.reproductionRisks[1], researcherSummary.hardwareNotes[0], researcherSummary.timeToFirstMeaningfulRun, paper.title, summary.hasReliableImplementation
Evidence graph: 2 refs, 1 link.
Utility signals: depth 60/100, grounding 58/100, status medium.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.
AI-generated summary grounded in paper metadata and artifact signals.
Indic-TunedLens is an interpretability framework for Indian languages that learns shared affine transformations to adjust hidden states before decoding intermediate activations. This page includes benchmark evidence for multiple-choice QA evaluation on MMLU language subsets (Bengali, English, Gujarati, and Hindi). Reproduction guidance focuses on implementation viability and concrete risk controls.
Follow this baseline workflow to decide if this paper is worth immediate prototyping.
Use the paper-linked Hugging Face release as the starting artifact, then reconstruct training and evaluation settings from the paper.
Use the paper and benchmark evidence to scope a baseline reproduction plan.
Start from this likely method family: Transformer.
Track assumptions and missing details in an experiment log before coding.
Framework baselines
Modern transformer training baseline.
Reference transformer building block implementation.
No additional verified repositories beyond the primary recommendation.
These repositories had low-confidence matching signals and are hidden by default.
No trustworthy model matches right now.
No trustworthy dataset matches right now.
Tasks
None detected
Methods
Transformer
Domains
Natural Language Processing