Q: Are there pretrained models available for "Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation"?

Yes, 1 Hugging Face model found. The top result is Eculid/HealthJudge with 31 downloads.

Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation

Q: How reproducible is "Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation"?

Estimated time to first reproduction: a few days. Risk flags: (1) no repository-level reproducibility signals are currently available; (2) the estimate is based on a paper-only reproduction flow. No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.

Jiaying Wu, Zihang Fu, Haonan Wang, Fanxiao Li, Jiafeng Guo, Preslav Nakov, Min-Yen Kan

Published: Oct 13, 2025

No direct paper-linked artifacts found; showing strongest related artifacts

Evidence: Curated Related

Domain fit: AI-adjacent

Verified repos: 0

Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.

Time to first repro: a few days

2 risk flags

arXiv PDF

Community Notes, the crowd-sourced misinformation governance system on X (formerly Twitter), allows users to flag misleading posts, attach contextual notes, and rate the notes' helpfulness. However, our empirical analysis of 30.8K health-related notes reveals substantial latency, with a median delay of 17.6 hours before notes receive a helpfulness status. To improve responsiveness during real-world misinformation surges, we propose CrowdNotes+, a unified LLM-based framework that augments Community Notes for faster and more reliable health misinformation governance. CrowdNotes+ integrates two modes: (1) evidence-grounded note augmentation and (2) utility-guided note automation, supported by a hierarchical three-stage evaluation of relevance, correctness, and helpfulness. We instantiate the framework with HealthNotes, a benchmark of 1.2K health notes annotated for helpfulness, and a fine-tuned helpfulness judge. Our analysis first uncovers a key loophole in current crowd-sourced governance: voters frequently conflate stylistic fluency with factual accuracy. Addressing this via our hierarchical evaluation, experiments across 15 representative LLMs demonstrate that CrowdNotes+ significantly outperforms human contributors in note correctness, helpfulness, and evidence utility.
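The hierarchical three-stage evaluation named in the abstract (relevance, then correctness, then helpfulness) can be pictured as a gated pipeline. The sketch below is only an illustration under stated assumptions: the stop-at-first-failure gating, the `Verdict` type, and the three toy judges are hypothetical stand-ins, not the paper's implementation. A real reproduction would replace the toy judges with LLM calls or the fine-tuned helpfulness judge.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A judge takes (post, note) and returns True if the note passes that stage.
Judge = Callable[[str, str], bool]


@dataclass
class Verdict:
    passed: bool
    failed_stage: Optional[str]  # None when every stage passed


def evaluate_note(post: str, note: str, judges: List[Tuple[str, Judge]]) -> Verdict:
    """Run the stages in order and stop at the first one that fails.

    ASSUMPTION: "hierarchical" is read here as strict gating; the paper
    may combine the stages differently.
    """
    for name, judge in judges:
        if not judge(post, note):
            return Verdict(passed=False, failed_stage=name)
    return Verdict(passed=True, failed_stage=None)


# Toy stand-in judges (hypothetical heuristics, for illustration only).
def toy_relevance(post: str, note: str) -> bool:
    # The note shares at least one whole word with the post.
    return bool(set(post.lower().split()) & set(note.lower().split()))


def toy_correctness(post: str, note: str) -> bool:
    # Crude proxy: the note cites a link that could be checked.
    return "http" in note


def toy_helpfulness(post: str, note: str) -> bool:
    # Crude proxy: the note is long enough to carry context.
    return len(note.split()) >= 5


DEFAULT_JUDGES: List[Tuple[str, Judge]] = [
    ("relevance", toy_relevance),
    ("correctness", toy_correctness),
    ("helpfulness", toy_helpfulness),
]
```

Calling `evaluate_note(post, note, DEFAULT_JUDGES)` returns a `Verdict` whose `failed_stage` names the first stage the note did not clear, which makes it easy to tabulate where notes drop out.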

Technical details

Canonical key: arxiv-2510.11423

Generated at: Jun 16, 2026, 10:47 PM

Artifact coverage: curated_related

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

GPT-4.1 (Natural language processing): Macro-F1 74.28. Source: paper fulltext

Gemini-2.5-flash (Natural language processing): Macro-F1 68.36. Source: paper fulltext

Claude-Sonnet-4 (Natural language processing): Macro-F1 78.14. Source: paper fulltext

Benchmark evidence drill-down

3 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Model	Metric	Value	Source	Evidence refs
Natural language processing	GPT-4.1	Macro-F1	74.28	paper-derived	No explicit refs
Natural language processing	Gemini-2.5-flash	Macro-F1	68.36	paper-derived	No explicit refs
Natural language processing	Claude-Sonnet-4	Macro-F1	78.14	paper-derived	No explicit refs
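For anyone re-deriving the Macro-F1 figures above, the metric is the unweighted mean of per-class F1 scores. The following is a minimal pure-Python sketch; the function name `macro_f1` and the label strings in the usage note are hypothetical, and the result is scaled to 0-100 on the assumption that the reported values use that scale.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Return Macro-F1 on a 0-100 scale (unweighted mean of per-class F1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    # Every label seen in either the gold or the predicted sequence counts as a class.
    labels = sorted(set(y_true) | set(y_pred), key=str)
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return 100.0 * sum(per_class) / len(per_class)
```

With gold labels `["helpful", "helpful", "not_helpful", "not_helpful"]` and predictions `["helpful", "not_helpful", "not_helpful", "not_helpful"]`, the per-class F1 scores are 2/3 and 4/5, so the function returns about 73.33. Because every class carries equal weight, a rare class that is judged poorly pulls the score down as much as a common one.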

Implementation Evidence Summary

Confidence: low

Evidence is currently too limited to recommend a maintained repository. Use Implementation Status and Reproduction Path for a practical baseline plan.

Reproduction Risks

Estimate is based on paper-only reproduction flow

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 2 links.

Utility signals: depth 95/100, grounding 78/100, status high.

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.

No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.
Track assumptions and missing details in an experiment log before coding.
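One lightweight way to keep the experiment log suggested above is an append-only JSON Lines file. This is a sketch under assumptions: the helper names `log_assumption` and `read_log` and the field names are hypothetical, chosen only for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def log_assumption(path: str, component: str, assumption: str, missing_detail: str) -> Dict[str, str]:
    """Append one assumption record to a JSON Lines log and return it.

    All field names are illustrative; adapt them to the reproduction at hand.
    """
    entry = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "component": component,            # part of the pipeline affected
        "assumption": assumption,          # what was assumed in order to proceed
        "missing_detail": missing_detail,  # what could not be confirmed from the paper
    }
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def read_log(path: str) -> List[Dict[str, str]]:
    """Load every record from the log, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Each call appends one line, so the file can be diffed and reviewed alongside the code, and `read_log` gives the full list back when writing up which results depend on which assumptions.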

Best available artifact: Eculid/HealthJudge