MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation
Iman Ahmadi, Mehrshad Taji, Arad Mahdinezhad Kashani, AmirHossein Jadidi, Saina Kashani +1 more
Abstract
Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine-tuning, or prompt tuning, and often operate in an open-loop manner without robust environmental feedback, which makes them fragile in dynamic settings. MALLVi presents a Multi-Agent Large Language and Vision framework that enables closed-loop, feedback-driven robotic manipulation. Gi...
Summary
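MALLVi introduces a multi-agent LLM/VLM pipeline for robotic manipulation that splits the work across specialized agents responsible for perception, localization, high-level reasoning, and error recovery.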
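To make the staged data flow concrete, here is a minimal, hypothetical Python sketch: an instruction is decomposed into subtasks, each subtask is localized in the scene, turned into atomic actions, and executed. Every agent is a placeholder stub (string splitting and a dictionary lookup stand in for the LLM and VLM calls), and the names `Step`, `decompose`, `localize`, `think`, and `run_pipeline` are invented for illustration. The optional Descriptor agent and the Reflector are omitted. This is not the paper's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One decomposed subtask as it moves through the hypothetical pipeline."""
    subtask: str                                      # from the Decomposer
    target: tuple[float, float] | None = None         # from the Localizer
    actions: list[str] = field(default_factory=list)  # from the Thinker


def decompose(instruction: str) -> list[Step]:
    # Stub Decomposer: split on " then " (stands in for an LLM call).
    return [Step(subtask=part.strip()) for part in instruction.split(" then ")]


def localize(step: Step, scene: dict[str, tuple[float, float]]) -> Step:
    # Stub Localizer: treat the last word as the object name and look up its
    # image coordinates in a toy scene map (stands in for VLM grounding).
    step.target = scene.get(step.subtask.split()[-1])
    return step


def think(step: Step) -> Step:
    # Stub Thinker: turn a located target into a fixed atomic-action sequence.
    if step.target is not None:
        x, y = step.target
        step.actions = [f"move_to({x}, {y})", "grasp()"]
    return step


def run_pipeline(instruction: str,
                 scene: dict[str, tuple[float, float]],
                 execute: Callable[[str], None]) -> list[Step]:
    """Decompose once, then localize, plan, and execute each subtask in order."""
    steps = decompose(instruction)
    for step in steps:
        think(localize(step, scene))
        for action in step.actions:
            execute(action)  # hypothetical executor callback, e.g. a robot API wrapper
    return steps
```

For example, `run_pipeline("pick up the block then open the drawer", {"block": (120.0, 80.0), "drawer": (300.0, 210.0)}, log.append)` with `log = []` records two `move_to`/`grasp` pairs. Passing the executor in as a callback keeps the planning stages testable without a robot.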
Key Contributions
- MALLVi is a multi-agent large language and vision framework for robotic manipulation: it takes a natural language instruction and a scene image, decomposes the task, localizes objects, plans atomic actions, and executes them.
- The MALLVi architecture coordinates specialized agents (Decomposer, Localizer, Thinker, and Reflector, plus an optional Descriptor agent) that divide responsibility for perception, localization, high-level reasoning, and error recovery.
- The Reflector agent performs targeted error detection and recovery: it reactivates only the relevant upstream agents instead of triggering a full task-replanning cycle, which makes recovery more efficient.
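One way to picture selective recovery is to attribute a failure to a single stage and rerun only that stage and the stages that consume its output, reusing cached results from earlier stages. The Python sketch below assumes a fixed linear stage order and a hypothetical failure-to-stage table (`FAILURE_SOURCE`). The stage name `Executor` and the failure labels are illustrative inventions; the paper's actual failure categories and routing rules may differ. Unknown failures fall back to a full replan.

```python
from __future__ import annotations

from typing import Callable

# Assumed linear stage order; each stage consumes the previous stage's output.
# "Executor" is a hypothetical name for the action-execution step.
PIPELINE = ["Decomposer", "Localizer", "Thinker", "Executor"]

# Hypothetical mapping from a detected failure type to the stage blamed for it.
FAILURE_SOURCE = {
    "wrong_subtask": "Decomposer",
    "object_not_found": "Localizer",
    "grasp_missed": "Localizer",
    "infeasible_plan": "Thinker",
    "execution_error": "Executor",
}


def stages_to_rerun(failure: str) -> list[str]:
    """Return the blamed stage plus everything downstream of it.

    A failure with no known source triggers a full replan (all stages)."""
    source = FAILURE_SOURCE.get(failure)
    if source is None:
        return list(PIPELINE)
    return PIPELINE[PIPELINE.index(source):]


def recover(failure: str,
            outputs: dict[str, object],
            stages: dict[str, Callable[[object], object]],
            task_input: object) -> list[str]:
    """Rerun only the affected stages, feeding them the cached upstream output.

    Updates `outputs` in place and returns the names of the stages rerun."""
    rerun = stages_to_rerun(failure)
    first = PIPELINE.index(rerun[0])
    # Start from the original input, or from the last stage that is still trusted.
    prev = task_input if first == 0 else outputs[PIPELINE[first - 1]]
    for name in rerun:
        prev = stages[name](prev)
        outputs[name] = prev
    return rerun
```

With this scheme an `infeasible_plan` failure reruns only the Thinker and the execution step, while the Decomposer and Localizer outputs are reused unchanged.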
Reproducibility Notes
- Effort estimates assume reproduction from the paper alone.
Results & Benchmarks
| Category | Method / Baseline | Task | Value |
|---|---|---|---|
| Instruction tuning | MATH | Put Shape | 65 |
| Instruction tuning | PerAct | Put in Drawer | 68 |
| Instruction tuning | Single-Agent | Put in Drawer | 73 |
Hardware Requirements
- Based on current guidance, expect multiple days of setup and compute for a meaningful reproduction.
Best Implementation
No maintained implementation has been confirmed for this paper yet.
Use the Implementation Status and Reproduction Path sections below for the current action plan.
Reproduction Path
Follow this baseline workflow to decide whether this paper is worth immediate prototyping.
1. Use the paper and benchmark evidence to scope a baseline reproduction plan.
2. Track assumptions and missing details in an experiment log before coding.
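The experiment-log step can be as lightweight as an append-only file. A minimal Python sketch, assuming a JSON Lines file on disk; the `LogEntry` fields (`kind`, `detail`, `source`) and the `ExperimentLog` class are hypothetical choices for illustration, not something the paper prescribes.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class LogEntry:
    kind: str    # e.g. "assumption" or "missing_detail"
    detail: str  # what was assumed, or what the paper leaves unspecified
    source: str  # where the gap was noticed, e.g. a section name


class ExperimentLog:
    """Append-only JSON Lines log of reproduction assumptions and gaps."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def add(self, entry: LogEntry) -> None:
        # One JSON object per line, so the file is easy to diff and grep.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(entry)) + "\n")

    def entries(self, kind: str | None = None) -> list[LogEntry]:
        # Read everything back, optionally filtered by kind.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            items = [LogEntry(**json.loads(line)) for line in f if line.strip()]
        return [e for e in items if kind is None or e.kind == kind]
```

Appending rather than rewriting keeps a chronological record of when each assumption entered the plan, which helps when a later result has to be traced back to a guess.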
Additional Implementations
No additional verified repositories beyond the primary recommendation.
Hugging Face Artifacts
No trustworthy Hugging Face artifacts, whether directly linked or curated as related, have been found yet.
Continue with targeted Hugging Face searches: