Results & Benchmarks
| Task | Dataset | Metric | Value |
|---|---|---|---|
| Generation | abz07 | Dynamic Gap | 0.46 |
Best Implementation
REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems
36 6 Dec 2025
License –
CI –
Deps ✓
Docker –
- Selected genglongling/realm-bench as the strongest maintained implementation for new work.
- The repository includes a dependency or environment manifest.
- Repository activity is within the last 24 months.
Reproduction Path
1. Start with genglongling/realm-bench and validate the setup instructions in its README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and the runtime environment for reproducibility.
- Time to first repro: a few hours
- License metadata missing
- No CI workflows detected
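Step 3 of the reproduction path (logging dependency versions and the runtime environment) can be sketched in a few lines of standard-library Python. This is a minimal illustration and is not part of genglongling/realm-bench: the function names `capture_environment` and `save_environment` and the output file name `environment_snapshot.json` are assumptions made for this example.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment() -> dict:
    """Collect the interpreter version, OS string, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # Sort case-insensitively so snapshots diff cleanly between runs.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def save_environment(path: str = "environment_snapshot.json") -> dict:
    """Write the snapshot to a JSON file (hypothetical file name) and return it."""
    snapshot = capture_environment()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh, indent=2)
    return snapshot


if __name__ == "__main__":
    snap = save_environment()
    print(f"Recorded {len(snap['packages'])} packages on Python {snap['python']}")
```

Running this once right after the baseline reproduces, and committing the snapshot next to the results, makes later drift in package versions easy to spot with a plain diff.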
Additional Implementations
No additional verified repositories beyond the primary recommendation.
Hugging Face Artifacts
No direct paper-linked artifacts were found. Showing strongest curated related artifacts.
Curated Related