Matched via arXiv identifier search · Strong overlap with paper title keywords
- Stars
- 361
- Last push
- Jun 16, 2026 (4d ago)
Risk flags
- No CI pipeline detected
- No tagged releases
- No Docker setup
Shangheng Du, Xiangchao Yan, Jinxin Shi, Zongsheng Cao, Shiyang Feng, Zichen Liang, Boyuan Sun, Tianshuo Peng, Yifan Zhou, Xin Li, Jie Zhou, Liang He, Bo Zhang, Lei Bai
Core AI workload signals detected from paper context and implementation/artifact evidence.
Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless search, and lack of hierarchical control, which together hinder long-horizon optimization. We present MLEvolve, an LLM-based ...
self-evolving multi-agent framework for end-to-end machine learning algorithm discovery. By extending tree search to Progressive MCGS, MLEvolve enables cross-branch information flow through graph-based reference edges and gradually shifts the search from broad exploration to focused exploitation with an entropy-inspired progressive schedule. To allow the agent to evolve with accumulated experience, we introduce Retrospective Memory, which combines a cold-start domain knowledge base with a dynamic global memory for task-specific experience retrieval and reuse. For stable long-horizon iteration, we further decouple strategic planning from code generation with adaptive coding modes. Evaluation on MLE-Bench shows that MLEvolve achieves state-of-the-art performance across multiple dimensions including average medal rate and valid submission rate under a 12-hour budget (half the standard runtime). Moreover, MLEvolve also outperforms specialized algorithm discovery methods including AlphaEvolve on mathematical algorithm optimization tasks, demonstrating strong cross-domain generalization. Our code is available at https://github.com/InternScience/MLEvolve.
Some benchmark signal exists in the extracted evidence, but it is not structured strongly enough yet for a confident benchmark decision.
Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability.
InternScience/MLEvolve is the best available implementation candidate based on ranking signals, but recommendation confidence is not yet high.
Open InternScience/MLEvolveHardware Notes
Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Evidence graph: 3 refs, 3 links.
Utility signals: depth 100/100, grounding 85/100, status high.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
Matched via arXiv identifier search · Strong overlap with paper title keywords
Risk flags
MLEvolve is an open-source autonomous system for end-to-end machine learning algorithm design and optimization powered by progressive search and experience-driven memory.
Hardware requirements
No dependency manifest — manual reconstruction required
No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.
No trustworthy model matches right now.
Search models on Hugging FaceTasks
Agentic tool use, Retrieval / indexing
Methods
Transformer, Retrieval-augmented generation
Domains
Natural Language Processing, Large Language Models, AI Agents, Information Retrieval
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
Open in HFEPXExplore Similar Papers
Jump to Paper2Code search queries derived from this paper's research context.
Need human evaluators for your AI research? Scale annotation with expert AI Trainers.