Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods
Shaina Raza, Rizwan Qureshi, Azib Farooq, Marcelo Lotif, Aman Chadha, Deval Pandya, Christos Emmanouilidis · May 23, 2025 · Citations: 0
How to use this page
High trustUse this as a practical starting point for protocol research, then validate against the original paper.
Best use
Secondary protocol comparison source
What to verify
Validate the evaluation procedure and quality controls in the full paper before operational use.
Evidence quality
High
Derived from extracted protocol signals and abstract evidence.
Abstract
Large language models (LLMs) reproduce misinformation not by memorizing false facts alone, but by learning the linguistic patterns that make falsehoods persuasive, such as hedging, false presuppositions, and fabricated citations. We propose model immunization, a training paradigm based on supervised fine-tuning over curated (false claim, correction) pairs, injected as small vaccine doses (5 to 10% of tokens) alongside truthful data. Unlike post-hoc filtering or preference-based alignment, immunization introduces direct negative supervision on labeled falsehoods. Across four open weight model families, this approach improves TruthfulQA accuracy by 12 points and increases misinformation rejection rates by 30 points, while preserving overall model capability. We further outline key design requirements, including dosage, labeling, quarantine, and diversity and advocate for standardized vaccine corpora and benchmarks to evaluate generalization. These findings position immunization as a practical and scalable component of responsible LLM development.