
HFEPX Hub

CS.CV + Medicine Papers

Updated from the current HFEPX corpus (Feb 27, 2026). This hub page groups 13 papers. Common evaluation modes: Automatic Metrics, Simulation Env. Most common rater population: Domain Experts. Frequently cited benchmark: Vbvr-Bench. Common metric signal: accuracy. Use this page to compare protocol setup, judge behavior, and labeling-design decisions before running new eval experiments. The newest paper in this set is from Feb 25, 2026.

Papers: 13 · Last published: Feb 25, 2026
Tags: CS.CV, Medicine

Research Narrative

Grounded narrative · Model: deterministic-grounded · Source: persisted

This page tracks 13 papers for CS.CV + Medicine. Dominant protocol signals include automatic metrics and simulation environments, with frequent benchmark focus on Vbvr-Bench and metric focus on accuracy and AUROC. Use the grounded sections below to prioritize reproducible protocol choices, benchmark-matched comparisons, and judge-vs-human evaluation checks.

Why This Matters For Eval Research

Protocol Takeaways

Benchmark Interpretation

  • Vbvr-Bench appears in 7.7% of hub papers (1/13); use this cohort for benchmark-matched comparisons.

Metric Interpretation

  • accuracy is reported in 53.8% of hub papers (7/13); compare with a secondary metric before ranking methods.
  • auroc is reported in 7.7% of hub papers (1/13); compare with a secondary metric before ranking methods.

Researcher Checklist

  • Close the gap on papers with explicit human feedback: coverage is a replication risk (15.4% vs 45% target).
  • Close the gap on papers reporting quality controls: coverage is a replication risk (0% vs 30% target).
  • Close the gap on papers naming benchmarks/datasets: coverage is a replication risk (7.7% vs 35% target).
  • Maintain strength on papers naming evaluation metrics: coverage is strong (61.5% vs 35% target).
  • Maintain strength on papers with known rater population: coverage is strong (38.5% vs 35% target).
  • Close the gap on papers with known annotation unit: coverage is a replication risk (0% vs 35% target).
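The checklist's "strength" vs "replication risk" flags follow from simple coverage arithmetic against each target. A minimal sketch, assuming the hub's counts (13 papers total; per-check counts are back-solved from the percentages quoted above, e.g. 15.4% ≈ 2/13):

```python
# Sketch: reproduce the checklist's coverage-vs-target flags.
# Counts are back-solved from the hub's quoted percentages (13 papers total);
# the check names and targets are taken from the checklist itself.
TOTAL = 13

checks = {
    # name: (papers_covered, target_fraction)
    "explicit human feedback": (2, 0.45),  # 2/13 ≈ 15.4%
    "quality controls":        (0, 0.30),
    "benchmarks/datasets":     (1, 0.35),  # 1/13 ≈ 7.7%
    "evaluation metrics":      (8, 0.35),  # 8/13 ≈ 61.5%
    "known rater population":  (5, 0.35),  # 5/13 ≈ 38.5%
    "known annotation unit":   (0, 0.35),
}

def flag(count: int, target: float, total: int = TOTAL) -> str:
    """Compare observed coverage against the target and label the result."""
    coverage = count / total
    status = "strength" if coverage >= target else "replication risk"
    return f"{coverage:.1%} vs {target:.0%} target -> {status}"

for name, (count, target) in checks.items():
    print(f"{name}: {flag(count, target)}")
```

Under these assumed counts, the printed flags match the checklist above line for line.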


Suggested Reading Order

  1. SurGo-R1: Benchmarking and Modeling Contextual Reasoning for Operative Zone in Surgical Video

    Start here for detailed protocol reporting, including rater and quality-control evidence.

  2. Following the Diagnostic Trace: Visual Cognition-guided Cooperative Network for Chest X-Ray Diagnosis

    Also strong on protocol reporting, with rater and quality-control evidence.

  3. Virtual Biopsy for Intracranial Tumors Diagnosis on MRI

    Also strong on protocol reporting, with rater and quality-control evidence.

  4. Adversarial Robustness of Deep Learning-Based Thyroid Nodule Segmentation in Ultrasound

    Adds automatic metrics for broader coverage within this hub.

  5. FedVG: Gradient-Guided Aggregation for Enhanced Federated Learning

    Adds automatic metrics for broader coverage within this hub.

  6. XMorph: Explainable Brain Tumor Analysis Via LLM-Assisted Hybrid Deep Intelligence

    Adds automatic metrics for broader coverage within this hub.

  7. MIP Candy: A Modular PyTorch Framework for Medical Image Processing

    Adds automatic metrics for broader coverage within this hub.

  8. OrthoDiffusion: A Generalizable Multi-Task Diffusion Foundation Model for Musculoskeletal MRI Interpretation

    Adds automatic metrics for broader coverage within this hub.

Known Limitations

  • No papers (0%) report quality controls; prioritize calibration/adjudication evidence.
  • Annotation unit is under-specified (0% coverage).
  • Narrative synthesis is grounded in metadata and abstracts only; full-paper implementation details are not parsed.

Research Utility Links

automatic_metrics vs simulation_env

both=0, left_only=12, right_only=1

No papers use both Automatic Metrics and Simulation Env: 12 papers use automatic metrics only, and 1 uses a simulation environment only.
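The both/left_only/right_only partition above is a standard set-overlap breakdown. A minimal sketch with hypothetical paper IDs (only the counts mirror the hub; the IDs are illustrative):

```python
# Sketch: the both / left_only / right_only partition of two paper cohorts.
# Paper IDs are hypothetical; only the counts (12 / 1 / 0) mirror the hub.
automatic_metrics = {f"paper_{i}" for i in range(12)}  # 12 papers, left cohort
simulation_env = {"paper_sim"}                         # 1 paper, right cohort

both = automatic_metrics & simulation_env        # in both cohorts
left_only = automatic_metrics - simulation_env   # automatic metrics only
right_only = simulation_env - automatic_metrics  # simulation env only

print(f"both={len(both)}, left_only={len(left_only)}, right_only={len(right_only)}")
# With the disjoint sets above: both=0, left_only=12, right_only=1
```

The three buckets always partition the union, so their sizes sum to the 13 papers in the hub.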

Benchmark Brief

Vbvr-Bench

Coverage: 1 paper (7.7%) mentions Vbvr-Bench.

Example: A Very Big Video Reasoning Suite

Metric Brief

auroc

Coverage: 1 paper (7.7%) mentions auroc.

Example: MedicalPatchNet: A Patch-Based Self-Explainable AI Architecture for Chest X-ray Classification

Metric Brief

coherence

Coverage: 1 paper (7.7%) mentions coherence.

Example: KD-OCT: Efficient Knowledge Distillation for Clinical-Grade Retinal OCT Classification

Top Papers

Related Hubs