Evolutionary System Prompt Learning for Reinforcement Learning in LLMs
Lunjun Zhang, Ryan Chen, Bradly C. Stadie · Feb 16, 2026 · Citations: 0
How to use this page
Coverage: StaleUse this page to decide whether the paper is strong enough to influence an eval design. If the signals below are thin, treat it as background context and compare it against the stronger hub pages before making protocol choices.
Paper metadata checked
Feb 25, 2026, 3:50 AM
StaleProtocol signals checked
Feb 25, 2026, 3:50 AM
StaleSignal strength
Low
Model confidence 0.45
Abstract
Building agentic systems that can autonomously self-improve from experience is a longstanding goal of AI. Large language models (LLMs) today primarily self-improve via two mechanisms: self-reflection for context updates, and reinforcement learning (RL) for weight updates. In this work, we propose Evolutionary System Prompt Learning (E-SPL), a method for jointly improving model contexts and model weights. In each RL iteration, E-SPL samples trajectories under multiple system prompts in parallel, then jointly applies RL updates to LLM weights and evolutionary updates to system prompts. System prompts evolve via mutation and crossover, two genetic operators driven by LLM self-reflection; selection is based on relative performance ratings updated across RL iterations. E-SPL encourages a natural division between declarative knowledge encoded in prompts and procedural knowledge encoded in weights, resulting in improved performance across reasoning and agentic tasks. For instance, in an easy-to-hard (AIME $\rightarrow$ BeyondAIME) generalization setting, E-SPL improves RL success rate from 38.8% $\rightarrow$ 45.1% while also outperforming reflective prompt evolution (40.0%). Overall, our results demonstrate that RL and system prompt evolution are deeply synergistic, and combining the two yields consistent gains in sample efficiency and generalization. Code: https://github.com/LunjunZhang/E-SPL