HFEPX Archive Slice
HFEPX Daily Papers for 2026-05-16
Daily archive slice for 2026-05-16 from the HFEPX corpus. Updated from current HFEPX corpus (2026-06-07); covers 2 papers from 2026-05-16.
Use this archive page for time-slice monitoring: what changed in evaluation methods, metrics, and protocol quality during this period. Quality band: Developing.
High-Signal Coverage
100.0%
2 / 2 papers are not low-signal flagged.
Benchmark Anchors
0.0%
Papers with benchmark/dataset mentions in extraction output.
Metric Anchors
100.0%
Papers with reported metric mentions in extraction output.
Primary action: Use this slice as early signal only; benchmark/metric anchoring is limited for rigorous period-over-period claims.
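The three coverage figures above are simple fractions over the papers in the slice. The sketch below shows one way they could be computed; the `PaperRecord` type, its field names, and the `coverage` helper are hypothetical illustrations, not the HFEPX extraction schema.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PaperRecord:
    # Hypothetical record shape; field names are assumptions for illustration.
    title: str
    low_signal: bool = False
    benchmarks: List[str] = field(default_factory=list)
    metrics: List[str] = field(default_factory=list)


def coverage(papers: List[PaperRecord], predicate: Callable[[PaperRecord], bool]) -> float:
    """Return the percentage of papers for which predicate is true."""
    if not papers:
        return 0.0
    hits = sum(1 for p in papers if predicate(p))
    return 100.0 * hits / len(papers)


# The two papers in this slice, with values taken from the comparison table.
papers = [
    PaperRecord("Closing the Gap at CRAC 2026", metrics=["F1"]),
    PaperRecord("MixSD", metrics=["Accuracy", "Recall"]),
]

summary = {
    "high_signal": coverage(papers, lambda p: not p.low_signal),
    "benchmark_anchors": coverage(papers, lambda p: bool(p.benchmarks)),
    "metric_anchors": coverage(papers, lambda p: bool(p.metrics)),
}
print(summary)
# {'high_signal': 100.0, 'benchmark_anchors': 0.0, 'metric_anchors': 100.0}
```

With only two papers, each figure can take just three values (0%, 50%, 100%), which is one reason the slice is best read as an early signal.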
Ranked by protocol completeness and evidence density for faster period-over-period review.
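Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution
May 16, 2026 · Citations: 0 · Score: 6.0
Eval: Automatic Metrics · Metrics: F1
MixSD: Mixed Contextual Self-Distillation for Knowledge Injection
May 16, 2026 · Citations: 0 · Score: 5.0
Eval: Automatic Metrics · Metrics: Accuracy, Recall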
Quickly compare method ingredients across this archive slice.
| Paper | Eval Modes | Benchmarks | Metrics | Quality Controls |
|---|---|---|---|---|
| Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution (May 16, 2026) | Automatic Metrics | Not reported | F1 | Calibration |
| MixSD: Mixed Contextual Self-Distillation for Knowledge Injection (May 16, 2026) | Automatic Metrics | Not reported | Accuracy, Recall | Not reported |
Gap: Human feedback
Human feedback is present in 0 of 2 papers.
Strong: Quality controls
Quality controls are present in 1 of 2 papers.
Gap: Benchmarks
Benchmarks are present in 0 of 2 papers.
Strong: Metrics
Metrics are present in 2 of 2 papers.
Strong: Known rater population
Known rater population is present in 1 of 2 papers.
Strong: Known annotation unit
Known annotation unit is present in 1 of 2 papers.
Charts (not reproduced here): Evaluation Modes, Top Metrics, Top Benchmarks, Quality Controls.
Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution
Antoine Bourgois, Olga Seminck, Thierry Poibeau · May 16, 2026 · Citations: 0
Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
MixSD: Mixed Contextual Self-Distillation for Knowledge Injection
Jiarui Liu, Lechen Zhang, Yongjin Yang, Yinghui He, Yingheng Wang · May 16, 2026 · Citations: 0
We argue this forgetting arises because fine-tuning targets from humans or external systems diverge from the model's autoregressive distribution, forcing the optimizer to imitate low-probability token sequences.
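The intuition in that abstract excerpt can be illustrated with a toy calculation. This is a sketch under made-up numbers, not the paper's method: the per-token probabilities below are hypothetical, and `sequence_nll` is an illustrative helper. It shows only that a target sequence the model finds unlikely carries a much larger negative log-likelihood, so fitting it demands a larger change to the model.

```python
import math
from typing import List


def sequence_nll(token_probs: List[float]) -> float:
    """Negative log-likelihood of a sequence, given the probability
    the model assigns to each target token."""
    return -sum(math.log(p) for p in token_probs)


# Hypothetical probabilities the model assigns to each target token.
on_distribution = [0.6, 0.5, 0.7, 0.4]     # target close to what the model would generate
off_distribution = [0.05, 0.02, 0.1, 0.01]  # externally written target the model finds unlikely

nll_on = sequence_nll(on_distribution)
nll_off = sequence_nll(off_distribution)

# The off-distribution target yields the larger loss.
print(f"on-distribution NLL:  {nll_on:.2f}")
print(f"off-distribution NLL: {nll_off:.2f}")
```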