Maintained implementation available

Evaluating LLM Reasoning in the Operations Research Domain with ORQA

December 1, 2024 | arXiv: 2412.17874
1 repo | 45 stars | a few hours to reproduce

Abstract

[AAAI 2025] ORQA is a new question-answering (QA) benchmark designed to assess the reasoning capabilities of LLMs in the specialized technical domain of Operations Research (OR). The benchmark evaluates whether LLMs can emulate the knowledge and reasoning skills of OR experts when presented with complex optimization modeling tasks.

Best Implementation

45 stars | last updated 2 Jun 2025
  • Selected nl4opt/ORQA as the strongest maintained implementation for new work.
  • Includes a dependency/environment manifest.
  • Repository activity falls within the last 24 months.

Reproduction Path

  1. Start with nl4opt/ORQA and validate the setup instructions in its README.
  2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
  3. Log the exact dependency versions and runtime environment for reproducibility.
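Step 3 of the path above can be partly automated. The following is a minimal sketch, assuming the evaluation runs under Python; it uses only the standard library (`importlib.metadata`, `platform`, `sys`) to capture the interpreter, OS, and installed package versions as JSON. The function name `environment_snapshot` and its `packages` parameter are hypothetical illustrations, not part of nl4opt/ORQA.

```python
"""Hypothetical helper: snapshot the runtime environment of a reproduction run."""
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages=None):
    """Describe the interpreter, OS, and installed package versions.

    If `packages` is None, every installed distribution is reported.
    Otherwise only the named distributions are reported, and any that
    are not installed are recorded as None instead of raising.
    """
    versions = {}
    if packages is None:
        for dist in metadata.distributions():
            name = dist.metadata.get("Name")
            if name:  # skip distributions with broken or missing metadata
                versions[name] = dist.version
    else:
        for name in packages:
            try:
                versions[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                versions[name] = None
    return {
        "python": sys.version,
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        # Sorted so that two snapshots can be diffed line by line.
        "packages": dict(sorted(versions.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON; redirect stdout to keep it with the run's outputs.
    print(json.dumps(environment_snapshot(), indent=2))
```

Saving the printed JSON next to each run's outputs (for example, `python snapshot_env.py > environment.json`, where the script name is hypothetical) records what the baseline was actually run against. `pip freeze` gives a similar package list but omits the interpreter and OS details captured here.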

Time to first repro: a few hours | License metadata missing | No CI workflows detected

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing the strongest curated related artifacts instead.

Research Context