Are there pretrained models available for "Interpreting Negation in GPT-2: Layer- and Head-Level Causal Analysis"?

Three related Hugging Face models were found, but none is linked directly to the paper. The top-ranked result is EleutherAI/gpt-j-6b, with 173,296 downloads.

What framework is used to implement "Interpreting Negation in GPT-2: Layer- and Head-Level Causal Analysis"?

No maintained implementation was found. The suggested framework baseline is the Hugging Face Transformers training guide.

Interpreting Negation in GPT-2: Layer- and Head-Level Causal Analysis

How reproducible is "Interpreting Negation in GPT-2: Layer- and Head-Level Causal Analysis"?

Estimated time to first reproduction: a few days. Two risk flags apply: no repository-level reproducibility signals are currently available, and the estimate is based on a paper-only reproduction flow. No direct maintained implementation was found, so use the paper PDF and citation graph to design a baseline reproduction.

Abdullah Al Mofael, Lisa M. Kuhn, Ghassan Alkadi, Kuo-Pao Yang

Published: Mar 12, 2026

No direct paper-linked artifacts found; showing strongest related artifacts

Evidence: Curated Related

Domain fit: AI-adjacent

Verified repos: 0

Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.

Framework: Hugging Face Transformers training guide

Time to first repro: a few days

2 risk flags

Negation remains a persistent challenge for modern language models, often causing reversed meanings or factual errors. In this work, we conduct a causal analysis of how GPT-2 Small internally processes such linguistic transformations. We examine its hidden representations at both the layer and head level. Our analysis is based on a self-curated 12,000-pair dataset of matched affirmative and negated sentences, covering multiple linguistic templates and forms of negation. To quantify this behavior, we define a metric, the Negation Effect Score (NES), which measures the model's sensitivity in distinguishing between affirmative statements and their negations. We carried out two key interventions to probe causal structure. In activation patching, internal activations from affirmative sentences were inserted into their negated counterparts to see how meaning shifted. In ablation, specific attention heads were temporarily disabled to observe how logical polarity changed. Together, these steps revealed how negation signals move and evolve through GPT-2's layers. Our findings indicate that this capability is not widespread; instead, it is highly concentrated within a limited number of mid-layer attention heads, primarily within layers 4 to 6. Ablating these specific components directly disrupts the model's negation sensitivity: on our in-domain, ablation increased NES (indicating weaker negation sensitivity), and re-introducing cached affirmative activations (rescue) increased NES further, confirming that these heads carry affirmative signal rather than restoring baseline behavior. On xNot360, ablation slightly decreased NES and rescue restored performance above baseline. This pattern demonstrates that these causal patterns are consistent across various negation forms and remain detectable on the external xNot360 benchmark, though with smaller magnitude.
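The two interventions described in the abstract can be illustrated on a toy model. The sketch below is not GPT-2 and not the authors' code: the head names, the scalar head outputs, and the `polarity_gap` helper are all hypothetical, and `polarity_gap` is only a stand-in for the paper's NES, whose exact definition is not given on this page. It shows zero-ablation (a head's output is set to zero) and activation patching (a head's output in the negated run is replaced by its cached output from the affirmative run).

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy model: each "head" maps tokens to a scalar that is summed
# into one output logit. Real heads write vectors into a residual stream.
Head = Callable[[List[str]], float]

HEADS: Dict[str, Head] = {
    # Polarity-sensitive by construction (an assumption of this toy setup).
    "L4.H1": lambda toks: -2.0 if "not" in toks else 1.0,
    # Looks only at the first token, so it ignores negation.
    "L5.H3": lambda toks: 1.0 if toks[0] == "the" else 0.0,
    # Constant bias.
    "L9.H0": lambda toks: 0.25,
}


def run(
    tokens: List[str],
    ablate: Optional[Set[str]] = None,
    patch: Optional[Dict[str, float]] = None,
    cache: Optional[Dict[str, float]] = None,
) -> float:
    """Sum head outputs into a logit, optionally ablating or patching heads."""
    logit = 0.0
    for name, head in HEADS.items():
        out = head(tokens)
        if cache is not None:
            cache[name] = out  # record the clean activation
        if ablate and name in ablate:
            out = 0.0  # zero-ablation: the head contributes nothing
        if patch and name in patch:
            out = patch[name]  # activation patching: overwrite with a cached value
        logit += out
    return logit


def polarity_gap(
    aff: List[str], neg: List[str], ablate: Optional[Set[str]] = None
) -> float:
    """Affirmative logit minus negated logit (a stand-in metric, not NES)."""
    return run(aff, ablate=ablate) - run(neg, ablate=ablate)


AFF = ["the", "door", "is", "open"]
NEG = ["the", "door", "is", "not", "open"]

aff_cache: Dict[str, float] = {}
aff_logit = run(AFF, cache=aff_cache)  # clean affirmative run, activations cached
neg_logit = run(NEG)                   # clean negated run

for name in HEADS:
    patched = run(NEG, patch={name: aff_cache[name]})
    gap = polarity_gap(AFF, NEG, ablate={name})
    print(f"{name}: patched negated logit={patched:+.2f}, gap after ablation={gap:+.2f}")
```

In this toy setup only the polarity-sensitive head changes the result when it is patched or ablated, which is the kind of localization signal the abstract reports for a small number of mid-layer heads. A real reproduction would apply the same two operations to GPT-2 Small's attention head outputs.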

Technical details

Canonical key: arxiv-2603.12423

Cache status: Stale (SWR served)

Generated at: Jun 16, 2026, 1:05 PM

Artifact coverage: curated_related

HF provider: ok (token)

PWC source used: No

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

Results & Benchmarks

Freshness tier: warm

Direct + Inferred Evidence

No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.

Implementation Evidence Summary

Confidence: low

Recommendation evidence is currently too limited for a maintained-repo choice. Use Implementation Status and Reproduction Path for a practical baseline plan.

Reproduction Risks

Estimate is based on paper-only reproduction flow

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 2 links.

Utility signals: depth 60/100, grounding 68/100, status medium.

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.

Track assumptions and missing details in an experiment log before coding.
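
One way to keep such an experiment log is as a small list of structured entries written out as JSON Lines. The sketch below is only an illustration: the `LogEntry` class, its field names, and the two example entries are hypothetical and are not part of any tool referenced on this page.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class LogEntry:
    """One assumption or missing detail (all field names are hypothetical)."""
    topic: str
    assumption: str
    source: str = "paper-only"  # where the detail came from
    resolved: bool = False
    logged_on: str = field(default_factory=lambda: date.today().isoformat())


def to_jsonl(entries: List[LogEntry]) -> str:
    """Serialize entries as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


def open_items(entries: List[LogEntry]) -> List[str]:
    """Topics that still need to be resolved before coding."""
    return [e.topic for e in entries if not e.resolved]


# Example entries, written for illustration only.
log = [
    LogEntry("metric", "Exact NES formula must be taken from the paper PDF."),
    LogEntry("dataset", "No dataset artifact was found; pairs must be rebuilt."),
]

print(to_jsonl(log))
print("Open items:", open_items(log))
```

Marking an entry as `resolved=True` once the paper or a later artifact settles it keeps the list of open questions short and auditable.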

Best available artifact: EleutherAI/gpt-j-6b

Reproduction readiness

No Repo

Last checked: Jun 16, 2026

No verified implementation available

No maintained repository has been identified for this paper. Check adjacent implementations or HF artifacts below.

No benchmark numbers could be verified. You will not be able to validate reproduction correctness against published numbers.

Framework baselines

Hugging Face Transformers training guide: modern transformer training baseline.
PyTorch nn.Transformer docs: reference transformer building block implementation.

Hugging Face artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.

Models

EleutherAI/gpt-j-6b (Curated Related): 173,296 downloads, 1,526 likes
EleutherAI/gpt-neo-2.7B (Curated Related): 631,611 downloads, 503 likes
EleutherAI/gpt-neox-20b (Curated Related): 517,033 downloads, 584 likes

Datasets

No trustworthy dataset matches right now.

Spaces

No trustworthy demo spaces right now.

Research context

Tasks

Negation, Meaning (existential), Linguistics, Psychology, Polarity (international relations), Causal chain, Cognitive psychology, Key (lock)

Methods

Transformer

Domains

Natural language processing, Computational Theory and Mathematics
