Matched via arXiv identifier search
- Stars
- 3
- Last push
- Apr 14, 2026 (2d ago)
Risk flags
- No tagged releases
- No Docker setup
- Low confidence match
Keunhyeung Park, Seunguk Yu, Youngbin Kim
Core AI workload signals detected from paper context and implementation/artifact evidence.
Standard-to-dialect machine translation remains challenging due to a persistent dialect gap in large language models and evaluation distortions inherent in n-gram metrics, which favor source copying over authentic dialect translation. In this paper, we propose the dialect refinement (DIA-REFINE) framework, which guides LLMs toward faithful target dialect outputs through an iterative loop of translation, verification, ...
and feedback using external dialect classifiers. To address the limitations of n-gram-based metrics, we introduce the dialect fidelity score (DFS) to quantify linguistic shift and the target dialect ratio (TDR) to measure the success of dialect translation. Experiments on Korean dialects across zero-shot and in-context learning baselines demonstrate that DIA-REFINE consistently enhances dialect fidelity. The proposed metrics distinguish between False Success cases, where high n-gram scores obscure failures in dialectal translation, and True Attempt cases, where genuine attempts at dialectal translation yield low n-gram scores. We also observed that models exhibit varying degrees of responsiveness to the framework, and that integrating in-context examples further improves the translation of dialectal expressions. Our work establishes a robust framework for goal-directed, inclusive dialect translation, providing both rigorous evaluation and critical insights into model performance.
Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.
| Task | Dataset | Metric | Value | Source | Evidence refs |
|---|---|---|---|---|---|
| Machine translation | ZS + DIA-REFINE(S) | BLEU | 47.59 | paper-derived | No explicit refs |
| Machine translation | ZS + DIA-REFINE(M) | BLEU | 32.16 | paper-derived | No explicit refs |
Standard-to-dialect machine translation remains challenging due to a persistent dialect gap in large language models and evaluation distortions inherent in n-gram metrics, which favor source copying over authentic dialect translation.
Recommendation evidence is currently too limited for a maintained-repo choice. Use Implementation Status and Reproduction Path for a practical baseline plan.
Hardware Notes
Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Evidence graph: 2 refs, 1 links.
Utility signals: depth 95/100, grounding 68/100, status medium.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
Matched via arXiv identifier search
Risk flags
Matched via arXiv identifier search
Risk flags
There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.
Hardware requirements
No verified implementation available
Framework baselines
Modern transformer training baseline.
Reference transformer building block implementation.
No additional verified repositories beyond the primary recommendation.
These repositories had low-confidence matching signals and are hidden by default.
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches derived from the paper title and method context:
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.
Tasks
Machine translation
Methods
Transformer
Domains
Natural Language Processing
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
Open in HFEPXExplore Similar Papers
Jump to Paper2Code search queries derived from this paper's research context.
Need human evaluators for your AI research? Scale annotation with expert AI Trainers.