Surgical Post-Training: Cutting Errors, Keeping Knowledge

HFEPX Relevance Assessment

This paper has direct human-feedback and/or evaluation protocol signal and is likely useful for eval pipeline design.

Eval-Fit Score

65/100 • Medium

Useful as a secondary reference; validate protocol details against neighboring papers.

Human Feedback Signal

Detected

Evaluation Signal

Detected

HFEPX Fit

High-confidence candidate

If you are doing eval pipeline work, start here:

Human Eval Hub · LLM-as-Judge Hub · Pairwise Preference Hub · Tool-Use Eval Hub

Protocol And Measurement Signals

Benchmarks / Datasets

No benchmark or dataset names were extracted from the available abstract.

Reported Metrics

accuracy

Research Brief

Deterministic synthesis

While prior research emphasizes the role of on-policy data in mitigating forgetting, we uncover--and validate both theoretically and empirically--an overlooked yet critical mechanism: the implicit regularization inherent in Direct… HFEPX signals include Pairwise Preference and Automatic Metrics, with confidence 0.70. Updated from the current HFEPX corpus.

Generated Mar 5, 2026, 3:24 AM · Grounded in abstract + metadata only

Key Takeaways

While prior research emphasizes the role of on-policy data in mitigating forgetting, we uncover--and validate both theoretically and empirically--an overlooked yet critical…
Empirically, with only 4k rectified math data pairs, SPoT improves Qwen3-8B's accuracy by 6.2% on average across in-domain and OOD tasks, requiring merely 28 minutes of training on…

Researcher Actions

Compare its human-feedback setup against pairwise and rubric hubs.
Identify benchmark choices from full text before operationalizing conclusions.
Validate metric comparability (accuracy).
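
One way to approach the metric-comparability check above: an accuracy improvement quoted as a percentage can mean absolute percentage points or a relative change, and the two diverge when averaged across tasks. The sketch below computes both. The task names and accuracy values are hypothetical placeholders, not figures from the paper, and `accuracy_gains` is an illustrative helper rather than anything from the HFEPX tooling.

```python
from statistics import mean


def accuracy_gains(baseline, tuned):
    """Return (mean absolute gain in points, mean relative gain in percent).

    Both arguments map task name -> accuracy in percent.
    """
    if baseline.keys() != tuned.keys():
        raise ValueError("both runs must cover the same tasks")
    # Absolute gain: difference in percentage points per task.
    abs_gains = [tuned[t] - baseline[t] for t in baseline]
    # Relative gain: difference as a percentage of the baseline accuracy.
    rel_gains = [100.0 * (tuned[t] - baseline[t]) / baseline[t] for t in baseline]
    return mean(abs_gains), mean(rel_gains)


# Hypothetical accuracies (percent) for two made-up tasks.
baseline = {"task_a": 50.0, "task_b": 80.0}
tuned = {"task_a": 55.0, "task_b": 84.0}

abs_gain, rel_gain = accuracy_gains(baseline, tuned)
print(f"mean absolute gain: {abs_gain:.1f} points")
print(f"mean relative gain: {rel_gain:.1f}%")
```

Before comparing the reported figure with numbers from neighboring papers, confirm from the full text which of the two conventions it uses and which tasks enter the average.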

Caveats

Generated from title, abstract, and extracted metadata only; full-paper implementation details are not parsed.
Extraction confidence is probabilistic and should be validated for critical decisions.

Recommended Queries

human-eval protocol design · pairwise preference data quality · inter-rater agreement adjudication

Research Summary

Contribution Summary

While prior research emphasizes the role of on-policy data in mitigating forgetting, we uncover--and validate both theoretically and empirically--an overlooked yet critical mechanism: the implicit regularization inherent in Direct…
Empirically, with only 4k rectified math data pairs, SPoT improves Qwen3-8B's accuracy by 6.2% on average across in-domain and OOD tasks, requiring merely 28 minutes of training on 8x H800 GPUs.

Why It Matters For Eval

While prior research emphasizes the role of on-policy data in mitigating forgetting, we uncover--and validate both theoretically and empirically--an overlooked yet critical mechanism: the implicit regularization inherent in Direct…

Researcher Checklist

Pass: Human feedback protocol is explicit
Detected: Pairwise Preference

Pass: Evaluation mode is explicit
Detected: Automatic Metrics

Gap: Quality control reporting appears
No calibration/adjudication/IAA control explicitly detected.

Gap: Benchmark or dataset anchors are present
No benchmark/dataset anchor extracted from abstract.

Pass: Metric reporting is present
Detected: accuracy

Related Papers

Papers are ranked by protocol overlap, extraction signal alignment, and semantic proximity.

Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences · Protocol Overlap

Citations: 0 · Relevance: 7.70 · Shared tag: Pairwise Preference · Shared tag: Math
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol
- Shared metric mentions

Cold-Start Personalization via Training-Free Priors from Structured World Models · Protocol Overlap

Citations: 0 · Relevance: 6.80 · Shared tag: Pairwise Preference · Shared tag: Math
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

Long Grounded Thoughts: Synthesizing Visual Problems and Reasoning Chains at Scale · Protocol Overlap

Citations: 0 · Relevance: 6.80 · Shared tag: Pairwise Preference · Shared tag: Math
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters · Protocol Overlap

Citations: 0 · Relevance: 6.80 · Shared tag: Pairwise Preference · Shared tag: Math
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models · Protocol Overlap

Citations: 0 · Relevance: 6.80 · Shared tag: Pairwise Preference · Shared tag: Math
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning · Protocol Overlap

Citations: 0 · Relevance: 5.00 · Shared tag: Pairwise Preference
- Shared HFEPX protocol tags
- Aligned human feedback protocol
- Shared metric mentions

CAMEL: Confidence-Gated Reflection for Reward Modeling · Protocol Overlap

Citations: 0 · Relevance: 5.00 · Shared tag: Pairwise Preference
- Shared HFEPX protocol tags
- Aligned human feedback protocol
- Shared metric mentions

DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs · Protocol Overlap

Citations: 0 · Relevance: 5.00 · Shared tag: Pairwise Preference
- Shared HFEPX protocol tags
- Aligned human feedback protocol
- Shared metric mentions

Modeling Distinct Human Interaction in Web Agents · Protocol Overlap

Citations: 0 · Relevance: 5.00 · Shared tag: Pairwise Preference
- Shared HFEPX protocol tags
- Aligned human feedback protocol
- Shared metric mentions

A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding · Protocol Overlap

Citations: 0 · Relevance: 4.10 · Shared tag: Pairwise Preference
- Shared HFEPX protocol tags
- Aligned human feedback protocol

Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment · Protocol Overlap

Citations: 0 · Relevance: 4.10 · Shared tag: Pairwise Preference
- Shared HFEPX protocol tags
- Aligned human feedback protocol

Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment · Protocol Overlap

Citations: 0 · Relevance: 4.10 · Shared tag: Pairwise Preference
- Shared HFEPX protocol tags
- Aligned human feedback protocol