OpenTrain AI
Maintained implementation available: none

Evaluating Large Language Models Trained on Code

July 1, 2021 · arXiv: 2107.03374
1 repo · 3,185 stars · est. a few hours to reproduce
arXiv PDF

Abstract

Results & Benchmarks

Task                        | Dataset / model | Metric | Value
Natural language processing | Codex-S-12B     | pass@1 | 32.2
Natural language processing | Codex-D-12B     | pass@1 | 20.3
Natural language processing | HumanEval       | BLEU   | 0.8
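The pass@1 rows use the paper's pass@k metric. For each problem, n >= k samples are generated and the c samples that pass the unit tests are counted. The probability that at least one of k samples passes is then estimated as 1 - C(n-c, k) / C(n, k), and this estimate is averaged over problems. The Python sketch below computes that unbiased estimator as a running product, so it never forms large binomial coefficients. The function names are illustrative and are not taken from openai/human-eval.

```python
import math
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples that passed the unit tests, k: sample budget.
    Uses 1 - C(n-c, k) / C(n, k) = 1 - prod_{i=n-c+1}^{n} (1 - k / i).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("expected 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))


def mean_pass_at_k(counts: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    scores = [pass_at_k(n, c, k) for n, c in counts]
    if not scores:
        raise ValueError("need at least one problem")
    return sum(scores) / len(scores)
```

For k = 1 the estimate reduces to c/n, so pass_at_k(10, 3, 1) is approximately 0.3. Figures like those in the table appear to be percentages, so they would correspond to the mean multiplied by 100.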

Best Implementation

Code for the paper "Evaluating Large Language Models Trained on Code"

3.2k stars · 442 forks · updated Jan 2025 · MIT license
  • Selected openai/human-eval as the strongest maintained implementation for new work.
  • Includes a dependency/environment manifest.
  • The repository has been active within the last 24 months.
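According to the openai/human-eval README, the recommended repository scores model completions from a JSON Lines file in which each record has a `task_id` and a `completion`. That file is then passed to the repository's `evaluate_functional_correctness` command. The repository provides its own helpers for this (`read_problems` and `write_jsonl` in `human_eval.data`). The stdlib-only sketch below only illustrates the record format. The prompt dictionary and the `generate` callable are hypothetical stand-ins for the real problem set and model.

```python
import json
from typing import Callable, Dict, List


def samples_to_jsonl(prompts: Dict[str, str],
                     generate: Callable[[str], str],
                     samples_per_task: int = 1) -> List[str]:
    """Build one JSON line per sample with task_id/completion fields."""
    lines = []
    for task_id, prompt in prompts.items():
        for _ in range(samples_per_task):
            # generate is a hypothetical model call returning the completion text.
            record = {"task_id": task_id, "completion": generate(prompt)}
            lines.append(json.dumps(record))
    return lines


def write_jsonl_file(path: str, lines: List[str]) -> None:
    """Write newline-terminated JSON lines to disk as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

For example, `write_jsonl_file("samples.jsonl", samples_to_jsonl({"HumanEval/0": prompt}, my_model, samples_per_task=10))` produces a file in that format, where `prompt` and `my_model` are placeholders. Generating several samples per task is what makes pass@k estimable for k > 1.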

Reproduction Path

  1. Start with openai/human-eval and validate the setup instructions in its README.

  2. Reproduce the baseline result with the provided defaults before modifying any hyperparameters.

  3. Log exact dependency versions and the runtime environment for reproducibility.
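The third step can be automated. The sketch below uses only the standard library (importlib.metadata and platform) to record the interpreter version, the platform string, and the installed versions of a chosen list of packages. The package names you pass in are your choice. Names such as "human-eval", "numpy", "tqdm", or "fire" are assumptions about a typical environment for this repository, not a list taken from it.

```python
import json
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def environment_snapshot(packages: Iterable[str]) -> Dict[str, object]:
    """Record interpreter, OS, and package versions for a reproduction log."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


def snapshot_json(packages: Iterable[str]) -> str:
    """Serialize the snapshot with stable key order, ready to save next to results."""
    return json.dumps(environment_snapshot(packages), indent=2, sort_keys=True)
```

Saving `snapshot_json([...])` alongside each evaluation run makes it straightforward to compare environments when a reproduced number differs from the reported one.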

Time to first repro: a few hours
No CI workflows detected

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No direct paper-linked artifacts were found.