No verified implementation yet

SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang +6 more

April 6, 2026

arXiv: 2604.04804

0 repos

~a few days to reproduce

Abstract

Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation, repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalization. To address this problem, we propose SkillX, a fully automated framework for constructing a plug-and-play ski...

Summary

SkillX is a fully automated framework that constructs a reusable, plug-and-play skill knowledge base for LLM agents by learning from past trajectories and sharing skills across agents and environments. This page includes benchmark evidence that using only the multi-level skills design (Vanilla-Iter1) improves BFC results on GLM-4.6. Reproduction guidance focuses on implementation viability and concrete risk controls.

Key Contributions

SkillX is a fully automated framework that constructs a reusable, plug-and-play skill knowledge base for LLM agents by learning from past trajectories and sharing skills across agents and environments.
SkillX introduces a three-tier multi-level skills design that distills raw agent trajectories into hierarchical strategic plans, functional skills, and atomic skills to structure experience.
The framework performs iterative skills refinement by automatically revising skills based on execution feedback, continuously improving the quality of the skill library.
SkillX includes exploratory skills expansion that proactively generates and validates novel skills beyond the initial training data to broaden coverage of the skill library.
A skill library built with a strong backbone agent such as GLM-4.6 can be plugged into weaker base agents, yielding consistent gains in task success and execution efficiency on long-horizon benchmarks.
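The three-tier design, feedback-driven refinement, and plug-and-play reuse described above can be pictured with a small data-structure sketch. This is not the authors' implementation: the class names (`SkillLevel`, `Skill`, `SkillLibrary`), the keyword-overlap retrieval, and the prompt format are all hypothetical choices made for illustration, and the paper may store and retrieve skills quite differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class SkillLevel(Enum):
    """The three tiers named in the contributions list."""
    STRATEGIC_PLAN = "strategic_plan"
    FUNCTIONAL = "functional_skill"
    ATOMIC = "atomic_skill"


@dataclass
class Skill:
    name: str
    level: SkillLevel
    description: str
    revision: int = 0  # hypothetical counter, bumped on each refinement


@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def refine(self, name: str, new_description: str) -> None:
        """Replace a skill's text after execution feedback (assumed workflow)."""
        skill = self.skills[name]
        skill.description = new_description
        skill.revision += 1

    def retrieve(self, query: str, top_k: int = 3) -> list[Skill]:
        """Naive keyword-overlap retrieval; a stand-in for any real retriever."""
        query_tokens = set(query.lower().split())
        scored = []
        for skill in self.skills.values():
            overlap = len(query_tokens & set(skill.description.lower().split()))
            if overlap > 0:
                scored.append((overlap, skill))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [skill for _, skill in scored[:top_k]]

    def to_prompt(self, query: str, top_k: int = 3) -> str:
        """Render retrieved skills as text that any base agent could prepend."""
        return "\n".join(
            f"[{s.level.value}] {s.name}: {s.description}"
            for s in self.retrieve(query, top_k)
        )
```

Because the library is rendered as plain text, a library built with one backbone could in principle be handed to a different base agent without retraining, which is the sense of "plug-and-play" used above.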

Reproducibility Notes

Estimate is based on paper-only reproduction flow.

Results & Benchmarks

Task	Model / configuration	Benchmark	Value
Agentic tool use	Qwen3-32B	AppWorld	73.33
Agentic tool use	GLM-4.6	AppWorld	83.33
Agentic tool use	Vanilla-Iter1	AppWorld	62.35

Hardware Requirements

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Best Implementation

Maintained implementation evidence is not confirmed for this paper yet.

Use the Implementation Status and Reproduction Path sections below for the current action plan.

Reproduction Path

Follow this baseline workflow to decide if this paper is worth immediate prototyping.

1. Use the paper and benchmark evidence to scope a baseline reproduction plan.
2. Track assumptions and missing details in an experiment log before coding.
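One lightweight way to keep the experiment log from step 2 is a list of structured entries written out as JSON Lines. The sketch below is only a suggestion: the `Assumption` fields and the helper names are hypothetical, and the example entries in the usage note are placeholders rather than details taken from the paper.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Assumption:
    topic: str        # part of the pipeline the entry concerns
    assumption: str   # what is being assumed in place of a missing detail
    source: str       # e.g. "paper" or "guess" (hypothetical labels)
    resolved: bool = False


def dump_log(entries: list[Assumption]) -> str:
    """Serialize the log as JSON Lines, one entry per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


def open_items(entries: list[Assumption]) -> list[Assumption]:
    """Return the entries that still need to be checked."""
    return [e for e in entries if not e.resolved]
```

For example, a log might start with `Assumption("retrieval", "keyword overlap as placeholder retriever", "guess")`, and `open_items(log)` then lists what still has to be confirmed before results are compared with the reported numbers.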

Time to first repro: a few days. Estimate is based on paper-only reproduction flow.

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches:

models: arxiv:2604.04804 SkillX Natural Language Processing
datasets: arxiv:2604.04804 SkillX dataset
spaces: arxiv:2604.04804 SkillX demo

Research Context