Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic

Q: What is the best open-source implementation of "Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic"?

The best maintained implementation is hiyouga/llama-factory with 70,769 stars on GitHub. Confidence: high. Reproducibility: Strong.

Q: How reproducible is "Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic"?

Estimated time to first reproduction: a few hours. No risk flags identified. Start with hiyouga/llama-factory and validate the setup instructions in its README.

Q: Are there pretrained models available for "Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic"?

No direct paper-linked models were found, but 3 related Hugging Face models are listed. The top result is PKU-Alignment/beaver-dam-7b with 5,909 downloads.

Q: What framework is used to implement "Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic"?

The primary implementation uses PyTorch.

Published: Feb 1, 2024

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 70,769

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: pytorch

Time to first repro: a few hours

No risk flags

Technical details

Canonical key: arxiv-2402.11746

Cache status: Stale (SWR served)

Generated at: Apr 30, 2026, 10:40 PM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Natural language processing · MATH · PEFT: 46.39 · Source: paper fulltext

Question answering · MATH · PEFT: 46.39 · Source: paper fulltext

Benchmark evidence drill-down

2 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Natural language processing	MATH	PEFT	46.39	paper-derived	No explicit refs
Question answering	MATH	PEFT	46.39	paper-derived	No explicit refs
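
Part of this audit can be mechanized. The following is a minimal Python sketch, not part of the site's tooling: it parses the tab-separated rows of the table above and groups findings that share the same dataset, metric, and value. Applied here, it shows that the two findings are a single number reported under two task labels. The function name `group_findings` and the inline `TABLE` string are illustrative assumptions.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the drill-down table above (tab-separated).
TABLE = (
    "Task\tDataset\tMetric\tValue\tSource\tEvidence refs\n"
    "Natural language processing\tMATH\tPEFT\t46.39\tpaper-derived\tNo explicit refs\n"
    "Question answering\tMATH\tPEFT\t46.39\tpaper-derived\tNo explicit refs\n"
)


def group_findings(table_text):
    """Group task labels by (dataset, metric, value). Hypothetical helper."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        key = (row["Dataset"], row["Metric"], float(row["Value"]))
        groups[key].append(row["Task"])
    return dict(groups)


groups = group_findings(TABLE)
for (dataset, metric, value), tasks in groups.items():
    print(f"{dataset} / {metric} = {value}: reported under {len(tasks)} task label(s)")
```

A group with more than one task label is a candidate duplicate and is worth checking against the paper before treating it as independent evidence.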

Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic is the primary contribution described in this paper.
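
For orientation, "task arithmetic" in the title refers to editing a model by adding or subtracting vectors of weight differences. The sketch below is a toy illustration in plain Python, not the authors' code. It assumes a safety vector computed as the difference between an aligned and an unaligned copy of the same base model, scaled by a hypothetical coefficient `b` and added to the fine-tuned weights. All names and numbers are invented for illustration.

```python
def safety_vector(aligned, unaligned):
    """Element-wise difference aligned - unaligned, per parameter name (assumed definition)."""
    return {
        name: [a - u for a, u in zip(aligned[name], unaligned[name])]
        for name in aligned
    }


def add_vector(weights, vector, b=1.0):
    """Return weights + b * vector, per parameter name. `b` is a hypothetical scale."""
    return {
        name: [w + b * v for w, v in zip(weights[name], vector[name])]
        for name in weights
    }


# Toy "state dicts": one parameter tensor flattened to a list of floats.
aligned = {"layer.weight": [0.5, -1.0, 2.0]}
unaligned = {"layer.weight": [0.25, -1.0, 1.5]}
fine_tuned = {"layer.weight": [1.0, 0.0, 1.0]}

delta = safety_vector(aligned, unaligned)
realigned = add_vector(fine_tuned, delta, b=2.0)
print(delta)
print(realigned)
```

With real checkpoints the same arithmetic would run over tensors of matching shape. How the vector is obtained and how `b` is chosen should be taken from the paper and the declare-lab/resta repository, not from this sketch.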

Use This Implementation Because…

Confidence: high

hiyouga/llama-factory is the strongest maintained implementation based on ranking signals. CI workflows are present. License is declared (Apache-2.0).

Reproduction Risks

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 90/100, grounding 95/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

hiyouga/llama-factory

best maintained

Maintenance: Active

Confidence: High

Reproducibility: Strong

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 70,769
Last push: Apr 29, 2026 (3d ago)

CI · Releases · Dependencies

Risk flags

No Docker setup

declare-lab/resta

historical official

Maintenance: Stale

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 32
Last push: Mar 28, 2024 (765d ago)

Dependencies

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases

declare-lab/red-instruct

alternative

Maintenance: Stale

Confidence: Low

Reproducibility: Moderate

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 111
Last push: Mar 8, 2024 (785d ago)

Dependencies

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases
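
The "Nd ago" figures and the "No push in 12+ months" flags in this comparison follow from simple date arithmetic. Here is a small Python sketch under two assumptions: the reference date is May 2, 2026 (the page does not state it, but it is the only date consistent with all three listed ages), and "12+ months" is approximated as more than 365 days.

```python
from datetime import date

# Assumed reference date: the one date consistent with the three "Nd ago" values.
REFERENCE = date(2026, 5, 2)
STALE_AFTER_DAYS = 365  # assumption: "12+ months" approximated as 365 days

# Last-push dates as listed in the comparison above.
LAST_PUSH = {
    "hiyouga/llama-factory": date(2026, 4, 29),
    "declare-lab/resta": date(2024, 3, 28),
    "declare-lab/red-instruct": date(2024, 3, 8),
}


def days_since(last_push, today=REFERENCE):
    """Whole days elapsed between the last push and the reference date."""
    return (today - last_push).days


ages = {repo: days_since(pushed) for repo, pushed in LAST_PUSH.items()}
stale = {repo: age > STALE_AFTER_DAYS for repo, age in ages.items()}

for repo, age in ages.items():
    label = "stale" if stale[repo] else "active"
    print(f"{repo}: {age}d ago ({label})")
```

The inferred reference date is two days later than the "Generated at" and "Last checked" dates shown elsewhere on the page, so recompute the ages against the current date before relying on them.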

Best implementation now

hiyouga/llama-factory

Confidence: High

Reproducibility: Strong

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Stars: 70,769

Forks: 8,643

Last push: Apr 29, 2026

License: Apache-2.0

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Community adoption signal (70,769 stars)

License ✓

CI ✓

Deps ✓

Docker –

Selected hiyouga/llama-factory as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

declare-lab/resta

Stars: 32

Last push: Mar 28, 2024

Reproduction readiness

Ready to Run

Time to first repro: a few hours

Last checked: Apr 30, 2026

Ready to reproduce

· Clone hiyouga/llama-factory and install dependencies from pyproject.toml.
· CI pipeline detected — automated tests are in place.
· Last updated 3 days ago.

Quick start

git clone https://github.com/hiyouga/llama-factory.git
cd llama-factory
pip install -e .

Additional implementations

No additional verified repositories beyond the primary recommendation and the historical official implementation.

Possible but unverified matches (1)

These repositories had low-confidence matching signals and are hidden by default.

declare-lab/red-instruct

Confidence: Low

Stars: 111

Hugging Face artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.

Models

PKU-Alignment/beaver-dam-7b

Curated Related

Downloads: 5,909

Likes: 17

PKU-Alignment/beaver-7b-v1.0-reward

Curated Related

Downloads: 1,014

Likes: 17

PKU-Alignment/beaver-7b-v1.0-cost

Curated Related

Downloads: 1,824

Likes: 10

Datasets

PKU-Alignment/PKU-SafeRLHF

Curated Related

Downloads: 14,305

Likes: 182

Updated: Oct 18, 2024

PKU-Alignment/BeaverTails

Curated Related

Downloads: 18,537

Likes: 102

Updated: Oct 17, 2023