implementation starting point
Benchmarks: missing
Time to repro: a few hours

Results & Benchmarks

Freshness tier: hot
Direct + Inferred Evidence

No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.

We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specific agents through experience.

Use This Implementation Because…

Confidence: medium

Mathews-Tom/armory is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high. CI workflows are present. License is declared (MIT).

Open Mathews-Tom/armory

Reproduction Risks

  • No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 55/100, grounding 75/100, status medium.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

Mathews-Tom/armory
best maintained
Maintenance: Active
Confidence: Medium
Reproducibility: Strong

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars
251
Last push
Jun 5, 2026 (12d ago)
CI · Releases · Dependencies

Risk flags

  • No Docker setup

Maintenance: Active
Confidence: Medium
Reproducibility: Limited

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars
1,492
Last push
Jun 12, 2026 (6d ago)
Releases · Dependencies

Risk flags

  • No CI pipeline detected
  • No Docker setup

Maintenance: Active
Confidence: Low
Reproducibility: Moderate

Matched via arXiv identifier search

Stars
4
Last push
Jun 15, 2026 (3d ago)
Dockerfile · Releases · Dependencies

Risk flags

  • No CI pipeline detected
  • Low confidence match

Best implementation now

Mathews-Tom/armory
Confidence: Medium
Reproducibility: Strong

Curated, production-grade skills for AI coding agents. Battle-tested workflows for developers who use AI seriously.

Stars: 251
Forks: 37
Last push: Jun 5, 2026
License: MIT
Matched via arXiv identifier search
Strong overlap with paper title keywords
Community adoption signal (251 stars)
License ✓
CI ✓
Deps ✓
Docker –
  • Selected Mathews-Tom/armory as the strongest maintained implementation for new work.
  • Includes CI workflow signals.
  • Includes dependency/environment manifest signals.
  • Repository activity is within the last 24 months.

Reproduction readiness

Ready to Run
Time to first repro: hours
Last checked: Jun 16, 2026

Ready to reproduce

  • Clone Mathews-Tom/armory and install dependencies from pyproject.toml.
  • CI pipeline detected; automated tests are in place.
  • Last updated 12 days ago.

Quick start

git clone https://github.com/Mathews-Tom/armory.git
cd armory
pip install -e .
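
Because no benchmark numbers are available to check a reproduction against, it may help to at least pin down the environment and the exact commit used. The following is a minimal sketch, not part of the repository's documented setup: the function name `setup_repro`, the `.venv` directory, and the `REPRO_COMMIT` file are hypothetical, and it assumes the repository root contains an installable pyproject.toml, as the page suggests.

```shell
# Sketch only: helper names and file names here are assumptions, not part of armory.
REPO_URL="https://github.com/Mathews-Tom/armory.git"

# Runs in a subshell so the caller's working directory is left unchanged.
setup_repro() (
  set -e                             # stop at the first failing step
  dest="${1:-armory}"                # target directory (hypothetical default)
  git clone "$REPO_URL" "$dest"
  cd "$dest"
  python3 -m venv .venv              # isolate dependencies from the system Python
  . .venv/bin/activate
  pip install -e .                   # assumes pyproject.toml is at the repo root
  git rev-parse HEAD > REPRO_COMMIT  # record the exact commit that was installed
)
```

Calling `setup_repro` (or `setup_repro some-dir`) performs the clone and install. The virtual environment is only active inside the function, so activate it again afterwards with `. armory/.venv/bin/activate` before running anything from the repository.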

No benchmark numbers could be verified. You will not be able to validate reproduction correctness against published numbers.

Additional implementations

Official

No additional official repositories detected.

Community

These repositories had low-confidence matching signals and are hidden by default.

Hugging Face artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Direct artifact matches are currently sparse. Use targeted Hugging Face searches derived from the paper title and method context to locate candidate models, datasets, and demos.

Tip: start with models, then check datasets or Spaces if you need evaluation data or demos.

Research context

Tasks

Instruction tuning, Agentic tool use

Methods

Reinforcement learning

Domains

Large Language Models, AI Agents

Evaluation & Human Feedback Data

Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.


