Official implementation from Papers with Code · Repository link is mentioned in the paper metadata
- Stars: 89
- Last push: Sep 17, 2024 (639d ago)
Risk flags
- No push in 12+ months
- No CI pipeline detected
- No tagged releases
Core AI workload signals detected from paper context and implementation/artifact evidence.
No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.
The paper "xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval" focuses on machine translation.
ntunlp/execeval is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high. License is declared (MIT).
LLM evidence refs: paper.title, repos[0].fullName, summary.hasReliableImplementation
Evidence graph: 4 refs, 4 links.
Utility signals: depth 60/100, grounding 85/100, status medium.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
Repository link is mentioned in the paper metadata · Community adoption signal (64 stars)
Community adoption signal (70 stars)
A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages.
Preserved for provenance. Not recommended as the default path for new builds.
No dependency manifest — manual reconstruction required
No benchmark numbers could be verified. You will not be able to validate reproduction correctness against published numbers.
No additional verified repositories beyond the primary recommendation.
These repositories had low-confidence matching signals and are hidden by default.
No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.
No trustworthy demo spaces right now.
Tasks
Machine translation, Retrieval / indexing
Methods
Retrieval-augmented generation
Domains
Natural Language Processing, Information Retrieval