Are there pretrained models available for "IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning"?

Not directly. No paper-linked models were found, but 2 related Hugging Face models were. The top result is PKU-Alignment/alpaca-7b-reproduced with 3,333 downloads.

IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning

Q: How reproducible is "IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning"?

Estimated time to first reproduction: a few days. Risk flags: (1) no repository-level reproducibility signals are currently available; (2) the estimate is based on a paper-only reproduction flow. No direct maintained implementation was found, so use the paper PDF and its citation graph to design a baseline reproduction.

Aayush Mishra, Daniel Khashabi, Anqi Liu

Published: Sep 26, 2025

No direct paper-linked artifacts were found; the strongest related artifacts are shown instead.

Evidence: Curated Related

Domain fit: AI-core

Verified repos: 0

Core AI workload signals detected from paper context and implementation/artifact evidence.

Supervised Fine-Tuning (SFT) is used to specialize model behavior by training weights to produce intended target responses for queries. In contrast, In-Context Learning (ICL) adapts models during inference with instructions or demonstrations in the prompt. ICL can offer better generalizability and more calibrated responses compared to SFT in data scarce settings, at the cost of more inference compute. In this work, we ask the question: Can ICL's internal computations be used to improve the qualities of SFT? We first show that ICL and SFT produce distinct activation patterns, indicating that the two methods achieve adaptation through different functional mechanisms. Motivated by this observation and to use ICL's rich functionality, we introduce ICL Activation Alignment (IA2), a self-distillation technique which aims to replicate ICL's activation patterns in SFT models and incentivizes ICL-like internal reasoning. Performing IA2 as a priming step before SFT significantly improves the accuracy and calibration of model outputs, as shown by our extensive empirical results on 12 popular benchmarks and two model families. This finding is not only practically useful, but also offers a conceptual window into the inner mechanics of model adaptation.
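The abstract describes IA2 only at a high level. The following toy sketch illustrates the idea of an activation-alignment priming step under loudly stated assumptions; it is not the authors' implementation. A single linear layer stands in for a model's hidden layer. `icl_acts` stands in for the activations a frozen copy of the model would produce with demonstrations in the prompt. The alignment objective is assumed to be mean squared error. All function names (`alignment_loss`, `ia2_step`, `ia2_prime`) and the toy data are hypothetical. Only the priming stage is shown; ordinary SFT on target responses would follow it.

```python
# Toy sketch of an IA2-style "priming" step. Hypothetical, NOT the authors' code.
# Assumptions (illustrative only):
#   * a "model" is one linear layer h = W x standing in for a hidden layer;
#   * icl_acts[n] stands in for the activation a frozen copy of the model would
#     produce for query n with demonstrations in the prompt (the distillation target);
#   * the alignment objective is mean squared error between activations.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Compute h = W x for a matrix stored as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def alignment_loss(w: Matrix, queries: List[Vector], icl_acts: List[Vector]) -> float:
    """Mean squared error between query-only activations and ICL activations."""
    total, count = 0.0, 0
    for x, a in zip(queries, icl_acts):
        h = matvec(w, x)
        total += sum((hi - ai) ** 2 for hi, ai in zip(h, a))
        count += len(a)
    return total / count


def ia2_step(w: Matrix, queries: List[Vector], icl_acts: List[Vector], lr: float) -> Matrix:
    """One full-batch gradient-descent step on alignment_loss with respect to W."""
    n_out, n_in = len(w), len(w[0])
    count = len(queries) * n_out
    grad = [[0.0] * n_in for _ in range(n_out)]
    for x, a in zip(queries, icl_acts):
        h = matvec(w, x)
        for i in range(n_out):
            # d/dh_i of (h_i - a_i)^2 / count, chained through h_i = sum_j W_ij x_j
            err = 2.0 * (h[i] - a[i]) / count
            for j in range(n_in):
                grad[i][j] += err * x[j]
    return [[w[i][j] - lr * grad[i][j] for j in range(n_in)] for i in range(n_out)]


def ia2_prime(w: Matrix, queries: List[Vector], icl_acts: List[Vector],
              lr: float = 0.1, steps: int = 200) -> Matrix:
    """Repeat the alignment step; ordinary SFT would follow (not shown)."""
    for _ in range(steps):
        w = ia2_step(w, queries, icl_acts, lr)
    return w


# Usage with made-up toy data: the targets are exactly W* x for
# W* = [[0.5, 0.1], [-0.2, 0.3]], so the alignment loss can be driven toward zero.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
icl_acts = [[0.5, -0.2], [0.1, 0.3], [0.6, 0.1]]
w0 = [[0.0, 0.0], [0.0, 0.0]]

before = alignment_loss(w0, queries, icl_acts)
w_primed = ia2_prime(w0, queries, icl_acts)
after = alignment_loss(w_primed, queries, icl_acts)
print(f"alignment loss before: {before:.4f}, after: {after:.6f}")
```

Plain full-batch gradient descent in pure Python keeps the mechanics visible. A real reproduction would presumably use a deep-learning framework, choose specific layers and token positions to align, and then run SFT. This toy specifies none of those choices.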

Technical details

Canonical key: arxiv-2509.22621

Generated at: Apr 18, 2026, 3:03 PM

Artifact coverage: curated_related

LLM model: openai/gpt-5.1-20251113

LLM generated: Apr 16, 2026, 6:17 AM

LLM content type: researcher_benchmark_brief

Implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Direct + Inferred Evidence

Benchmark evidence drill-down

1 finding

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Instruction tuning	GSM8K	Gap w.r.t. ICL	-6.6	paper-derived	No explicit refs

Implementation Evidence Summary

Confidence: low

Evidence is currently too limited to recommend a maintained repository. Use Implementation Status and Reproduction Path for a practical baseline plan.

Reproduction Risks

The estimate is based on a paper-only reproduction flow.

Hardware Notes

Based on current guidance, expect multiple days of setup and compute for a meaningful reproduction.

Evidence disclosure

Evidence graph: 3 refs, 2 links.

Utility signals: depth 95/100, grounding 78/100, status high.

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use the following baseline plan to decide whether to prototype now or defer: design a baseline reproduction from the paper PDF and its citation graph, and track assumptions and missing details in an experiment log before coding.
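One lightweight way to keep the experiment log recommended for this paper-only reproduction is sketched below. It is a hypothetical helper, not a standard tool. The class names (`LogEntry`, `ExperimentLog`), the fields, and the JSON Lines output are illustrative choices. The example entries draw on the abstract (12 benchmarks, two model families) and on GSM8K from the benchmark table. The chosen assumptions, such as an MSE alignment loss, are unverified placeholders.

```python
# Hypothetical experiment-log helper for a paper-only reproduction.
# Field names and the JSON Lines output format are illustrative, not a standard.
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LogEntry:
    topic: str         # what needs deciding, e.g. "alignment loss"
    paper_says: str    # what the paper states ("" if it is silent)
    assumption: str    # the choice made for this reproduction
    resolved: bool = False


@dataclass
class ExperimentLog:
    entries: List[LogEntry] = field(default_factory=list)

    def assume(self, topic: str, paper_says: str, assumption: str) -> None:
        """Record an open assumption before writing code that depends on it."""
        self.entries.append(LogEntry(topic, paper_says, assumption))

    def resolve(self, topic: str) -> None:
        """Mark every entry on this topic as confirmed (e.g. after checking the PDF)."""
        for entry in self.entries:
            if entry.topic == topic:
                entry.resolved = True

    def open_items(self) -> List[str]:
        """Topics whose assumptions are still unconfirmed."""
        return [e.topic for e in self.entries if not e.resolved]

    def to_jsonl(self) -> str:
        """One JSON object per line, convenient for diffing across runs."""
        return "\n".join(json.dumps(asdict(e)) for e in self.entries)


log = ExperimentLog()
log.assume("benchmarks", "12 benchmarks, two model families", "start with GSM8K only")
log.assume("alignment loss", "", "MSE between activations (unverified)")
log.resolve("benchmarks")
print(log.open_items())
print(log.to_jsonl())
```

Keeping the log as plain JSON Lines makes it easy to commit alongside code and to see which assumptions changed between runs.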

Best available artifact (related, not paper-linked): PKU-Alignment/alpaca-7b-reproduced