What is the best open-source implementation of "Memento-Skills: Let Agents Design Agents"?

The best maintained implementation is Mathews-Tom/armory with 251 stars on GitHub. Confidence: medium. Reproducibility: Strong.

How reproducible is "Memento-Skills: Let Agents Design Agents"?

Estimated time to first reproduction: a few hours. No risk flags identified. Start with Mathews-Tom/armory and validate the setup instructions in its README.

Memento-Skills: Let Agents Design Agents

Huichi Zhou, Siyuan Guo, Anjie Liu, Zhongwei Yu, Ziqin Gong, Bowen Zhao, Zhixun Chen, Menglong Zhang, Yihang Chen, Jinsong Li, Runyu Yang, Qiangbin Liu, Xinlei Yu, Jianmin Zhou, Na Wang, Chunyang Sun, Jun Wang

Published: Mar 19, 2026

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 2

Top repo stars: 251

Core AI workload signals detected from paper context and implementation/artifact evidence.

Time to first repro: a few hours

No risk flags

We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specific agents through experience. The system is built on a memory-based reinforcement learning framework with stateful prompts, where reusable skills (stored as structured markdown files) serve as persistent, evolving memory. These skills encode both behaviour and context, enabling the agent to carry forward knowledge across interactions. Starting from simple elementary skills (like web search and terminal operations), the agent continually improves via the Read-Write Reflective Learning mechanism introduced in Memento 2 [wang2025memento2]. In the read phase, a behaviour-trainable skill router selects the most relevant skill conditioned on the current stateful prompt; in the write phase, the agent updates and expands its skill library based on new experience. This closed-loop design enables continual learning without updating LLM parameters, as all adaptation is realised through the evolution of externalised skills and prompts. Unlike prior approaches that rely on human-designed agents, Memento-Skills enables a generalist agent to design agents end-to-end for new tasks. Through iterative skill generation and refinement, the system progressively improves its own capabilities. Experiments on the General AI Assistants benchmark and Humanity's Last Exam demonstrate sustained gains, achieving 26.2% and 116.2% relative improvements in overall accuracy, respectively. Code is available at https://github.com/Memento-Teams/Memento-Skills.
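The read and write phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Skill`, `SkillLibrary`, `route`, and `reflect` are hypothetical, and the router here is a plain word-overlap heuristic standing in for the behaviour-trainable router the paper describes. No LLM is called; the point is only that adaptation happens by editing markdown files rather than model weights.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Skill:
    name: str
    body: str  # markdown text encoding behaviour and context


class SkillLibrary:
    """Hypothetical on-disk skill store: one markdown file per skill."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, skill: Skill) -> None:
        # Add a new skill or overwrite an existing one with the same name.
        (self.root / f"{skill.name}.md").write_text(skill.body, encoding="utf-8")

    def skills(self) -> List[Skill]:
        return [Skill(p.stem, p.read_text(encoding="utf-8"))
                for p in sorted(self.root.glob("*.md"))]


def route(library: SkillLibrary, prompt: str) -> Optional[Skill]:
    # Read phase (assumed stand-in): pick the skill whose text shares the
    # most words with the prompt. The paper uses a trainable router instead.
    words = set(prompt.lower().split())
    best, best_score = None, 0
    for skill in library.skills():
        score = len(words & set(skill.body.lower().split()))
        if score > best_score:
            best, best_score = skill, score
    return best


def reflect(library: SkillLibrary, prompt: str,
            skill: Optional[Skill], succeeded: bool) -> None:
    # Write phase (assumed policy): create a skill when none matched,
    # and append a note to the chosen skill when the attempt failed.
    if skill is None:
        name = f"skill_{len(library.skills())}"
        library.write(Skill(name, f"# New skill\n\nTask: {prompt}\n"))
    elif not succeeded:
        library.write(Skill(skill.name, skill.body + f"\n- Failed on: {prompt}\n"))


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        lib = SkillLibrary(Path(tmp))
        lib.write(Skill("web_search", "# Web search\n\nsearch the web for a query\n"))
        lib.write(Skill("terminal", "# Terminal\n\nrun a shell command in the terminal\n"))
        prompt = "search the web for arXiv 2603.18743"
        reflect(lib, prompt, route(lib, prompt), succeeded=False)
        reflect(lib, "translate a paragraph", None, succeeded=False)
        return [s.name for s in lib.skills()]


if __name__ == "__main__":
    print(demo())
```

In this sketch the library grows or changes only through `reflect`, which mirrors the closed loop in the abstract: the same router sees a richer set of markdown files on the next task.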

Technical details

Canonical key: arxiv-2603.18743

Cache status: Stale (SWR served)

Generated at: Jun 16, 2026, 4:57 PM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: No

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: missing

Time to repro: a few hours

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.

Use This Implementation Because…

Confidence: medium

Mathews-Tom/armory is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high. CI workflows are present. License is declared (MIT).

Reproduction Risks

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 55/100, grounding 75/100, status medium.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

Mathews-Tom/armory

best maintained

Maintenance: Active

Confidence: Medium

Reproducibility: Strong

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 251
Last push: Jun 5, 2026 (12d ago)

CI · Releases · Dependencies

Risk flags

No Docker setup

Memento-Teams/Memento-Skills

alternative

Maintenance: Active

Confidence: Medium

Reproducibility: Limited

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 1,492
Last push: Jun 12, 2026 (6d ago)

Releases · Dependencies

Risk flags

No CI pipeline detected
No Docker setup

shimo4228/contemplative-agent

alternative

Maintenance: Active

Confidence: Low

Reproducibility: Moderate

Matched via arXiv identifier search

Stars: 4
Last push: Jun 15, 2026 (3d ago)

Dockerfile · Releases · Dependencies

Risk flags

No CI pipeline detected
Low confidence match

Best implementation now

Mathews-Tom/armory

Confidence: Medium

Reproducibility: Strong

Curated, production-grade skills for AI coding agents. Battle-tested workflows for developers who use AI seriously.

Stars: 251

Forks: 37

Last push: Jun 5, 2026

License: MIT

Matched via arXiv identifier search

Strong overlap with paper title keywords

Community adoption signal (251 stars)

License ✓

CI ✓

Deps ✓

Docker –

Selected Mathews-Tom/armory as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Reproduction readiness

Ready to Run

Time to first repro: a few hours

Last checked: Jun 16, 2026

Ready to reproduce

· Clone Mathews-Tom/armory and install dependencies from pyproject.toml.
· CI pipeline detected — automated tests are in place.
· Last updated 12 days ago.

Open Mathews-Tom/armory

Quick start

git clone https://github.com/Mathews-Tom/armory.git
cd armory
pip install -e .

No benchmark numbers could be verified. You will not be able to validate reproduction correctness against published numbers.

Additional implementations

Official

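No additional official repositories detected. Note that the paper's abstract names Memento-Teams/Memento-Skills, listed under Community below, as the authors' code release.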

Community

Memento-Teams/Memento-Skills
Confidence: Medium

Memento-Skills: Let Agents Design Agents

Stars: 1,492

Last push: Jun 12, 2026

Possible but unverified matches (4)

These repositories had low-confidence matching signals and are hidden by default.

shimo4228/contemplative-agent

Confidence: Low

Stars: 4

henrique-simoes/Istara

Confidence: Low

Stars: 13

FluffyAIcode/openclaw-memory-pro-system

Confidence: Low

Stars: 8

krylov48/learn-memento-skills

Confidence: Medium

Stars: 1

Hugging Face artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches derived from the paper title and method context:

Models

arxiv:2603.18743 Memento-Skills Reinforcement learning

Datasets

arxiv:2603.18743 Memento-Skills dataset Reinforcement learning benchmark

Spaces

arxiv:2603.18743 Memento-Skills demo Reinforcement learning gradio

Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
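The three queries above can be turned into browser links with a short helper. This is a sketch under assumptions: the function name `hf_search_urls` is hypothetical, and the `?search=` query parameter on the Hugging Face listing pages is assumed rather than taken from this page, so check the resulting links before relying on them.

```python
from typing import Dict
from urllib.parse import urlencode

# Query strings copied from the Models, Datasets, and Spaces entries above.
QUERIES: Dict[str, str] = {
    "models": "arxiv:2603.18743 Memento-Skills Reinforcement learning",
    "datasets": "arxiv:2603.18743 Memento-Skills dataset Reinforcement learning benchmark",
    "spaces": "arxiv:2603.18743 Memento-Skills demo Reinforcement learning gradio",
}


def hf_search_urls(queries: Dict[str, str]) -> Dict[str, str]:
    # Assumed URL pattern: https://huggingface.co/<kind>?search=<query>
    return {
        kind: f"https://huggingface.co/{kind}?{urlencode({'search': query})}"
        for kind, query in queries.items()
    }


if __name__ == "__main__":
    for kind, url in hf_search_urls(QUERIES).items():
        print(f"{kind}: {url}")
```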

Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.

Research context

Tasks

Instruction tuning, Agentic tool use

Methods

Reinforcement learning

Domains

Large Language Models, AI Agents

Evaluation & Human Feedback Data

Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.

Explore Similar Papers

Jump to Paper2Code search queries derived from this paper's research context.

Instruction tuning, Agentic tool use, Reinforcement learning, Large Language Models, AI Agents