OpenTrain AI
No verified implementation yet

SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang +6 more

April 6, 2026 · arXiv: 2604.04804
0 repos · ~a few days to reproduce

Abstract

Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation, repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalization. To address this problem, we propose SkillX, a fully automated framework for constructing a plug-and-play ski...

Summary

SkillX is a fully automated framework that constructs a reusable, plug-and-play skill knowledge base for LLM agents by learning from past trajectories and sharing skills across agents and environments. This page includes benchmark evidence that using only the multi-level skills design (Vanilla-Iter1) on GLM-4.6 improves BFC. The reproduction guidance focuses on implementation viability and concrete risk controls.

Key Contributions

  • SkillX is a fully automated framework that constructs a reusable, plug-and-play skill knowledge base for LLM agents by learning from past trajectories and sharing skills across agents and environments.
  • SkillX introduces a three-tier multi-level skills design that distills raw agent trajectories into hierarchical strategic plans, functional skills, and atomic skills to structure experience.
  • The framework performs iterative skills refinement by automatically revising skills based on execution feedback, continuously improving the quality of the skill library.
  • SkillX includes exploratory skills expansion that proactively generates and validates novel skills beyond the initial training data to broaden coverage of the skill library.
  • A skill library built with a strong backbone agent such as GLM-4.6 can be plugged into weaker base agents, yielding consistent gains in task success and execution efficiency on long-horizon benchmarks.
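The three-tier design (strategic plans, functional skills, atomic skills) and the plug-and-play use of a library can be pictured as a small nested data structure. This is a hypothetical sketch, not SkillX's actual schema: the class names (`StrategicPlan`, `FunctionalSkill`, `AtomicSkill`, `SkillLibrary`), the fields, and the word-overlap retrieval are all assumptions. It also assumes that "plugging in" a library means serializing retrieved skills to text that any base agent can read in its prompt.

```python
from dataclasses import dataclass, field


@dataclass
class AtomicSkill:
    # Hypothetical lowest tier: a single primitive action.
    name: str
    description: str


@dataclass
class FunctionalSkill:
    # Hypothetical middle tier: a reusable routine built from atomic skills.
    name: str
    description: str
    steps: list[AtomicSkill] = field(default_factory=list)


@dataclass
class StrategicPlan:
    # Hypothetical top tier: a high-level plan for a class of tasks.
    goal: str
    skills: list[FunctionalSkill] = field(default_factory=list)


@dataclass
class SkillLibrary:
    plans: list[StrategicPlan] = field(default_factory=list)

    def add(self, plan: StrategicPlan) -> None:
        self.plans.append(plan)

    def retrieve(self, task: str, k: int = 1) -> list[StrategicPlan]:
        # Assumed retrieval: rank plans by word overlap between task and goal.
        # A real system would more likely use embedding similarity.
        words = set(task.lower().split())
        ranked = sorted(
            self.plans,
            key=lambda p: len(words & set(p.goal.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def render_prompt(self, task: str, k: int = 1) -> str:
        # Serialize the retrieved hierarchy as plain text for a base agent's prompt.
        lines = []
        for plan in self.retrieve(task, k):
            lines.append(f"Plan: {plan.goal}")
            for skill in plan.skills:
                lines.append(f"  Skill: {skill.name} - {skill.description}")
                for atom in skill.steps:
                    lines.append(f"    Action: {atom.name} - {atom.description}")
        return "\n".join(lines)


# Toy usage: the skill and action names below are invented for illustration.
library = SkillLibrary()
library.add(StrategicPlan(
    goal="send a payment to a contact",
    skills=[FunctionalSkill("find_contact", "look up a contact by name",
                            [AtomicSkill("search_contacts", "query the contact list")])],
))
library.add(StrategicPlan(goal="download a music playlist"))
prompt = library.render_prompt("send a payment to Alice")
print(prompt)
```

Keeping the library as plain data plus a text renderer is what makes the plug-and-play idea model-agnostic in this sketch: the agent that built the library and the agent that consumes it only need to agree on the prompt text.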
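Iterative skills refinement can be read as a feedback loop: run tasks with a skill, collect the failures, and revise the skill from that feedback. The sketch below assumes this loop shape. `run_task` and `revise` are hypothetical stand-ins for agent execution and an LLM rewrite call (stubbed with toy functions so the code runs), and the stopping rule (stop when a revision no longer improves the success rate, or after `max_rounds`) is an assumption rather than the paper's stated procedure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    name: str
    instructions: str


def refine_skill(
    skill: Skill,
    tasks: list[str],
    run_task: Callable[[Skill, str], tuple[bool, str]],
    revise: Callable[[Skill, list[str]], Skill],
    max_rounds: int = 3,
) -> tuple[Skill, float]:
    """Revise a skill from execution feedback, keeping the best-scoring version."""

    def evaluate(s: Skill) -> tuple[float, list[str]]:
        # Run every task with skill s; return success rate and failure messages.
        feedback = []
        successes = 0
        for task in tasks:
            ok, message = run_task(s, task)
            if ok:
                successes += 1
            else:
                feedback.append(f"{task}: {message}")
        return successes / len(tasks), feedback

    best = skill
    best_rate, feedback = evaluate(skill)
    for _ in range(max_rounds):
        if not feedback:
            break  # every task succeeded; nothing left to fix
        candidate = revise(best, feedback)
        rate, new_feedback = evaluate(candidate)
        if rate <= best_rate:
            break  # the revision did not help; keep the previous version
        best, best_rate, feedback = candidate, rate, new_feedback
    return best, best_rate


# Toy stubs: a task "succeeds" if the skill text mentions it.
def toy_run(skill: Skill, task: str) -> tuple[bool, str]:
    ok = task in skill.instructions
    return ok, ("" if ok else "skill does not cover this case")


def toy_revise(skill: Skill, feedback: list[str]) -> Skill:
    # Append the names of the failing tasks to the instructions.
    missing = [line.split(":")[0] for line in feedback]
    return Skill(skill.name, skill.instructions + " " + " ".join(missing))


refined, rate = refine_skill(Skill("login", "login"), ["login", "logout"], toy_run, toy_revise)
print(refined.instructions, rate)
```

Rejecting a revision that does not raise the success rate guards against the library drifting toward worse skills, which is one plausible risk control for automated rewriting.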
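Exploratory skills expansion can be sketched as propose-then-validate: a generator proposes candidate skills for tasks outside the original training data, and only candidates that pass validation enter the library. Everything here is hypothetical: `propose` and `validate` stand in for LLM generation and trial execution, and the acceptance rule (a score threshold plus skipping names already in the library) is an assumption, not the paper's stated criterion.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Skill:
    name: str
    instructions: str


def expand_library(
    library: dict[str, Skill],
    seed_tasks: list[str],
    propose: Callable[[str], list[Skill]],
    validate: Callable[[Skill], float],
    threshold: float = 0.5,
) -> list[Skill]:
    """Propose skills for new tasks; add only validated, non-duplicate ones."""
    added = []
    for task in seed_tasks:
        for candidate in propose(task):
            if candidate.name in library:
                continue  # never overwrite an existing skill
            score = validate(candidate)  # e.g. success rate in trial runs
            if score >= threshold:
                library[candidate.name] = candidate
                added.append(candidate)
    return added


# Toy stubs standing in for a generator and a validator.
def toy_propose(task: str) -> list[Skill]:
    return [Skill(f"{task}_v1", f"steps for {task}"), Skill("search", "reuse search")]


def toy_validate(skill: Skill) -> float:
    # Pretend that skills for "export" tasks fail validation.
    return 0.0 if skill.name.startswith("export") else 1.0


library = {"search": Skill("search", "query an index")}
new_skills = expand_library(library, ["archive", "export"], toy_propose, toy_validate)
print([s.name for s in new_skills])
```

Validating before insertion is what keeps expansion from polluting the library with untested skills; the threshold value here is arbitrary.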

Reproducibility Notes

  • The time estimate assumes a paper-only reproduction, since no verified implementation is available.

Results & Benchmarks

Task             | Model / setting | Benchmark | Score
Agentic tool use | Qwen3-32B       | AppWorld  | 73.33
Agentic tool use | GLM-4.6         | AppWorld  | 83.33
Agentic tool use | Vanilla-Iter1   | AppWorld  | 62.35

Hardware Requirements

  • Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Best Implementation

No maintained implementation has been confirmed for this paper yet.

Use the Reproduction Path section below for the current action plan.

Reproduction Path

Follow this baseline workflow to decide if this paper is worth immediate prototyping.

  1. Use the paper and benchmark evidence to scope a baseline reproduction plan.

  2. Track assumptions and missing details in an experiment log before coding.
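One lightweight way to keep such an experiment log is one structured record per assumption, stored as JSON Lines so the file can be diffed and reviewed. This is a minimal sketch; the `LogEntry` fields, the status values, and the JSONL format are suggestions, not something the paper or this page prescribes, and the example entries are invented.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class LogEntry:
    topic: str            # area of the reproduction, e.g. "evaluation"
    assumption: str       # what we assume because the paper is silent
    source: str           # where the gap was noticed
    status: str = "open"  # "open" until verified or refuted


def append_entry(path: Path, entry: LogEntry) -> None:
    # Append one JSON object per line so earlier entries are never rewritten.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")


def open_assumptions(path: Path) -> list[LogEntry]:
    # Reload the log and return the entries that still need verification.
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        entries = [LogEntry(**json.loads(line)) for line in f if line.strip()]
    return [e for e in entries if e.status == "open"]


# Example usage in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "repro_log.jsonl"
    append_entry(log, LogEntry("evaluation", "all scores use the same AppWorld split",
                               "Results & Benchmarks table"))
    append_entry(log, LogEntry("backbone", "library built with GLM-4.6",
                               "Key Contributions", status="verified"))
    pending = open_assumptions(log)
    print([e.topic for e in pending])
```

Reviewing the open entries before writing code makes it explicit which parts of a paper-only reproduction rest on guesses.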

Time to first repro: a few days (estimate based on a paper-only reproduction flow).

Additional Implementations

No additional verified repositories are available.

Hugging Face Artifacts

No trustworthy Hugging Face artifacts, either directly linked or curated as related, have been found yet.