HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

Q: What is the best open-source implementation of "HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models"?

The best-maintained implementation is tianyi-lab/hallusionbench, with 337 stars on GitHub. Confidence: high. Reproducibility: limited.

Q: How reproducible is "HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models"?

Estimated time to first reproduction: a few days. Risk flags: no CI workflows detected; dependency manifest is missing. Start with tianyi-lab/hallusionbench and validate the setup instructions in its README.

Q: What framework is used to implement "HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models"?

No framework was detected for the primary implementation.

Published: Oct 1, 2023

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 337

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: none

Time to first repro: a few days

2 risk flags

Technical details

Canonical key: arxiv-2310.14566

Cache status: Fresh

Generated at: May 9, 2026, 10:26 AM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: missing

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models is the primary contribution described in this paper.

Use This Implementation Because…

Confidence: high

tianyi-lab/hallusionbench is the strongest maintained implementation based on ranking signals. License is declared (BSD-3-Clause).

Reproduction Risks

No CI workflows detected
Dependency manifest is missing

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 65/100, grounding 75/100, status medium.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

tianyi-lab/hallusionbench

best maintained

Maintenance: Stale risk

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 337
Last push: Oct 14, 2025 (207d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

codelion/adaptive-classifier

alternative

Maintenance: Stale risk

Confidence: Low

Reproducibility: Strong

Community adoption signal (550 stars)

Stars: 550
Last push: Oct 7, 2025 (214d ago)

CI · Releases · Dependencies

Risk flags

No Docker setup
Low confidence match

FuxiaoLiu/LRV-Instruction

alternative

Maintenance: Stale

Confidence: Low

Reproducibility: Moderate

Matched via arXiv identifier search · Partial overlap with paper title keywords

Stars: 296
Last push: Mar 13, 2024 (787d ago)

Dependencies

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases
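
The "Nd ago" figures in the comparison above can be re-derived from the listed push dates and the page's "Last checked" date (May 9, 2026). The following is a minimal sketch of that arithmetic. The helper name `days_since` and the 365-day cutoff used for the "No push in 12+ months" flag are assumptions made for illustration, not the page's documented rule.

```python
from datetime import date

# Date the page reports as "Last checked".
CHECKED = date(2026, 5, 9)

# Last-push dates copied from the comparison above.
LAST_PUSH = {
    "tianyi-lab/hallusionbench": date(2025, 10, 14),
    "codelion/adaptive-classifier": date(2025, 10, 7),
    "FuxiaoLiu/LRV-Instruction": date(2024, 3, 13),
}

# Assumed cutoff for the "No push in 12+ months" flag (hypothetical).
STALE_AFTER_DAYS = 365


def days_since(last_push: date, checked: date = CHECKED) -> int:
    """Whole days elapsed between the last push and the check date."""
    return (checked - last_push).days


def is_stale(last_push: date, checked: date = CHECKED) -> bool:
    """True when the last push is older than the assumed cutoff."""
    return days_since(last_push, checked) > STALE_AFTER_DAYS


def report() -> None:
    """Print one line per repository with its age and staleness flag."""
    for repo, pushed in LAST_PUSH.items():
        print(f"{repo}: {days_since(pushed)}d ago, stale={is_stale(pushed)}")
```

Calling `report()` prints the age of each repository. Under the assumed cutoff, only FuxiaoLiu/LRV-Instruction is flagged, which matches the risk flags listed above.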

Best implementation now

tianyi-lab/hallusionbench

Confidence: High

Reproducibility: Limited

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Stars: 337

Forks: 9

Last push: Oct 14, 2025

License: BSD-3-Clause

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Partial overlap with paper title keywords

Community adoption signal (337 stars)

License ✓

CI –

Deps –

Docker –

Selected tianyi-lab/hallusionbench as the strongest maintained implementation for new work.
Repository activity is within the last 24 months.

Reproduction readiness

Major Work

Time to first repro: a few days

Last checked: May 9, 2026

No dependency manifest: manual reconstruction required

· tianyi-lab/hallusionbench has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
· You will need to reverse-engineer dependencies from import statements in the source code.
· Last push was 207 days ago.

No benchmark numbers could be verified, so this page provides no reference values against which to validate a reproduction.

Additional implementations

Official

No additional official repositories detected.

Community

caoyunkang/GPT4V-for-Generic-Anomaly-Detection
Confidence: Medium

[CSCWD] Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes the Lead.

Stars: 131

Last push: Mar 4, 2025

License: MIT