OpenTrain AI
Maintained implementation available · PyTorch · Pretrained models available

Visual Planning: Let's Think Only with Images

Yi Xu, Chengzu Li, Han Zhou, Xingchen Wan, Caiqi Zhang +2 more

May 16, 2025 · arXiv: 2505.11409
2 repos · 321 stars · a few days to reproduce
arXiv PDF

Abstract

Recent advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have substantially enhanced machine reasoning across diverse tasks. However, these models predominantly rely on pure text as the medium for both expressing and structuring reasoning, even when visual information is present. In this work, we argue that language may not always be the most natural or effective modality for reason...

Results & Benchmarks

Task                    Dataset               Metric  Value
Reinforcement learning  xxx - Direct          EM      68.6
Reinforcement learning  xxx - w/ Coordinates  EM      74.4
Reinforcement learning  xxx - w/ ASCII        EM      73.1
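
The listing does not define the EM column. Assuming it denotes exact match (the percentage of instances whose predicted plan is identical to the reference plan), a minimal sketch of the computation could look like the following. The function name and the action-list encoding of a plan are hypothetical and are not taken from the paper or the repository.

```python
def exact_match(predictions, references):
    """Return the percentage of predictions identical to their reference.

    Assumption: each plan is a sequence of discrete actions (hypothetical
    encoding), and a prediction only counts if every step matches.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # A pair is a hit only when the whole sequence agrees, step for step.
    hits = sum(list(p) == list(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


if __name__ == "__main__":
    predicted = [["up", "left"], ["down"]]
    reference = [["up", "left"], ["up"]]
    print(exact_match(predicted, reference))
```

Under this definition a partially correct plan scores zero, so EM is a strict measure.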

Hardware Requirements

  • Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Best Implementation

[ICLR 2026 Oral] Visual Planning: Let's Think Only with Images

Stars: 321 · Forks: 11 · Last push: Feb 2026 · License: MIT
  • Selected yix8/visualplanning as the strongest maintained implementation for new work.
  • Repository activity is within the last 24 months.

Reproduction Path

  1. Start with yix8/visualplanning and validate the setup instructions in its README.

  2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.

  3. Log exact dependency versions and runtime environment for reproducibility.
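
One way to carry out the last step is to write a small environment snapshot next to each run. The sketch below uses only the Python standard library. The function name and the default package list are illustrative assumptions, not part of the repository, and should be replaced with the packages the README actually installs.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages=("torch", "numpy")):
    """Collect interpreter, OS, and installed package versions into a dict.

    The default package names are placeholders (assumption); pass the
    dependencies actually used by the project.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Print as JSON so the snapshot can be saved alongside run outputs.
    print(json.dumps(snapshot_environment(), indent=2, sort_keys=True))
```

Saving this output with every run makes it easier to trace a result that does not reproduce back to a version mismatch.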

Time to first repro: a few days · No CI workflows detected · Dependency manifest is missing

Additional Implementations

Official

No additional official repositories detected.

Community

  • yix8/VisualPlanning · Confidence: low

    [ICLR 2026 Oral] Visual Planning: Let's Think Only with Images

    Stars: 321 · Forks: 11 · Last push: Feb 2026 · License: MIT

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts.