Skip to content
← Back to explorer

MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks

Shiqian Su, Sen Xing, Xuan Dong, Muyan Zhong, Bin Wang, Xizhou Zhu, Yuntao Chen, Wenhai Wang, Yue Deng, Pengxiang Zhu, Ziyuan Liu, Tiantong Li, Jiaheng Yu, Zhe Chen, Lidong Bing, Jifeng Dai · Feb 26, 2026 · Citations: 0

Abstract

Despite the remarkable progress of large language models (LLMs), the capabilities of standalone LLMs have begun to plateau when tackling real-world, complex tasks that require interaction with external tools and dynamic environments. Although recent agent frameworks aim to enhance model autonomy through tool integration and external interaction, they still suffer from naive workflows, unstable performance, limited support across diverse benchmarks and tasks, and heavy reliance on costly commercial APIs. In this work, we propose a high-performance and robust open-source agent framework, termed MiroFlow, which incorporates an agent graph for flexible orchestration, an optional deep reasoning mode to enhance performance, and a robust workflow execution to ensure stable and reproducible performance. Extensive experiments demonstrate that MiroFlow consistently achieves state-of-the-art performance across multiple agent benchmarks, including GAIA, BrowseComp-EN/ZH, HLE, xBench-DeepSearch, and notably FutureX. We hope it could serve as an easily accessible, reproducible, and comparable baseline for the deep research community.

HFEPX Relevance Assessment

This paper appears adjacent to HFEPX scope (human-feedback/eval), but does not show strong direct protocol evidence in metadata/abstract.

Eval-Fit Score

0/100 • Low

Treat as adjacent context, not a core eval-method reference.

Human Feedback Signal

Not explicit in abstract metadata

Evaluation Signal

Detected

HFEPX Fit

Adjacent candidate

Human Data Lens

  • Uses human feedback: No
  • Feedback types: None
  • Rater population: Unknown
  • Unit of annotation: Unknown
  • Expertise required: General
  • Extraction source: Runtime deterministic fallback

Evaluation Lens

  • Evaluation modes:
  • Agentic eval: Tool Use, Multi Agent
  • Quality controls: Not reported
  • Confidence: 0.25
  • Flags: low_signal, possible_false_positive, runtime_fallback_extraction

Protocol And Measurement Signals

Benchmarks / Datasets

GAIABrowseCompHLE

Reported Metrics

No metric terms were extracted from the available abstract.

Research Brief

Deterministic synthesis

Although recent agent frameworks aim to enhance model autonomy through tool integration and external interaction, they still suffer from naive workflows, unstable performance, limited support across diverse benchmarks and tasks, and heavy… HFEPX signals include Tool Use, Multi Agent with confidence 0.25. Updated from current HFEPX corpus.

Generated Mar 2, 2026, 10:05 PM · Grounded in abstract + metadata only

Key Takeaways

  • Although recent agent frameworks aim to enhance model autonomy through tool integration and external interaction, they still suffer from naive workflows, unstable performance,…
  • In this work, we propose a high-performance and robust open-source agent framework, termed MiroFlow, which incorporates an agent graph for flexible orchestration, an optional deep…

Researcher Actions

  • Treat this as method context, then pivot to protocol-specific HFEPX hubs.
  • Cross-check benchmark overlap: GAIA, BrowseComp, HLE.
  • Verify metric definitions before comparing against your eval pipeline.

Caveats

  • Generated from title, abstract, and extracted metadata only; full-paper implementation details are not parsed.
  • Low-signal flag detected: protocol relevance may be indirect.

Research Summary

Contribution Summary

  • Although recent agent frameworks aim to enhance model autonomy through tool integration and external interaction, they still suffer from naive workflows, unstable performance, limited support across diverse benchmarks and tasks, and heavy…
  • In this work, we propose a high-performance and robust open-source agent framework, termed MiroFlow, which incorporates an agent graph for flexible orchestration, an optional deep reasoning mode to enhance performance, and a robust workflow…
  • Extensive experiments demonstrate that MiroFlow consistently achieves state-of-the-art performance across multiple agent benchmarks, including GAIA, BrowseComp-EN/ZH, HLE, xBench-DeepSearch, and notably FutureX.

Why It Matters For Eval

  • Although recent agent frameworks aim to enhance model autonomy through tool integration and external interaction, they still suffer from naive workflows, unstable performance, limited support across diverse benchmarks and tasks, and heavy…
  • In this work, we propose a high-performance and robust open-source agent framework, termed MiroFlow, which incorporates an agent graph for flexible orchestration, an optional deep reasoning mode to enhance performance, and a robust workflow…

Researcher Checklist

  • Gap: Human feedback protocol is explicit

    No explicit human feedback protocol detected.

  • Gap: Evaluation mode is explicit

    No clear evaluation mode extracted.

  • Gap: Quality control reporting appears

    No calibration/adjudication/IAA control explicitly detected.

  • Pass: Benchmark or dataset anchors are present

    Detected: GAIA, BrowseComp, HLE

  • Gap: Metric reporting is present

    No metric terms extracted.

Related Papers

Papers are ranked by protocol overlap, extraction signal alignment, and semantic proximity.

Need human evaluators for your AI research? Scale annotation with expert AI Trainers.