No verified implementation yet

MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation

Iman Ahmadi, Mehrshad Taji, Arad Mahdinezhad Kashani, AmirHossein Jadidi, Saina Kashani +1 more

February 18, 2026 | arXiv: 2602.16898

0 repos | ~a few days to reproduce

Abstract

Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine-tuning, or prompt tuning, and often operate in an open-loop manner without robust environmental feedback, making them fragile in dynamic settings. MALLVi presents a Multi-Agent Large Language and Vision framework that enables closed-loop, feedback-driven robotic manipulation. Gi...

Summary

MALLVi introduces a multi-agent LLM/VLM pipeline for robotic manipulation that splits perception, localization, high-level reasoning, and error recovery across specialized agents.

Key Contributions

MALLVi is a multi-agent large language and vision framework for robotic manipulation that takes a natural language instruction and a scene image, decomposes the task, localizes objects, plans atomic actions, and executes them.
The MALLVi architecture coordinates specialized agents—Decomposer, Localizer, Thinker, and Reflector, plus an optional Descriptor agent—to divide perception, localization, high-level reasoning, and error recovery.
The Reflector agent in MALLVi performs targeted error detection and recovery by selectively reactivating only the relevant upstream agents instead of triggering a full task replanning cycle, enabling efficient recovery.
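
The closed-loop flow described in these contributions can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: every name here (State, decomposer, localizer, thinker, run_from, run_closed_loop, demo) is hypothetical, the agents are trivial stand-ins for LLM/VLM calls, the scene image and the optional Descriptor agent are omitted, and the rule that a reactivated agent is followed by the agents after it is an assumption rather than something the summary states.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class State:
    """Shared state passed between agents (hypothetical structure)."""
    instruction: str
    subtasks: List[str] = field(default_factory=list)
    locations: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    actions: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)  # agents run, in order


def decomposer(state: State) -> None:
    # Stand-in for an LLM call: split the instruction on " then ".
    state.subtasks = [s.strip() for s in state.instruction.split(" then ")]


def localizer(state: State) -> None:
    # Stand-in for a VLM call: assign a dummy coordinate to each subtask.
    state.locations = {t: (0.0, 0.0) for t in state.subtasks}


def thinker(state: State) -> None:
    # Stand-in for action planning: one atomic action per subtask.
    state.actions = [f"execute({t})" for t in state.subtasks]


PIPELINE = [("decomposer", decomposer), ("localizer", localizer), ("thinker", thinker)]


def run_from(state: State, start: str) -> None:
    """Run the named agent and every agent after it (assumed rerun rule)."""
    names = [name for name, _ in PIPELINE]
    for name, agent in PIPELINE[names.index(start):]:
        agent(state)
        state.trace.append(name)


def run_closed_loop(
    instruction: str,
    execute: Callable[[List[str]], str],
    reflector: Callable[[State, str], Optional[str]],
    max_retries: int = 3,
) -> State:
    """Execute, reflect, and rerun only from the agent the reflector names."""
    state = State(instruction)
    run_from(state, "decomposer")
    for attempt in range(max_retries + 1):
        feedback = execute(state.actions)
        restart_at = reflector(state, feedback)  # None means success
        if restart_at is None:
            return state
        if attempt < max_retries:
            run_from(state, restart_at)  # partial rerun, not a full replan
    raise RuntimeError("task did not succeed within the retry budget")


def demo() -> State:
    calls = {"n": 0}

    def execute(actions: List[str]) -> str:
        # Fails on the first attempt to simulate a missed grasp.
        calls["n"] += 1
        return "ok" if calls["n"] > 1 else "grasp_missed"

    def reflector(state: State, feedback: str) -> Optional[str]:
        # Blames localization for any failure in this toy example.
        return None if feedback == "ok" else "localizer"

    return run_closed_loop(
        "pick up the red block then put it in the drawer", execute, reflector
    )
```

In this toy run the Decomposer executes once, while the Localizer and Thinker execute twice, which is the practical difference between targeted recovery and replanning the whole task.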

Reproducibility Notes

Estimate is based on paper-only reproduction flow.

Results & Benchmarks

Task	Dataset	Metric	Value
Instruction tuning	MATH	Put Shape	65
Instruction tuning	PerAct	Put in Drawer	68
Instruction tuning	Single-Agent	Put in Drawer	73

Hardware Requirements

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Best Implementation

Maintained implementation evidence is not confirmed for this paper yet.

Use the Implementation Status and Reproduction Path sections below for the current action plan.

Reproduction Path

Follow this baseline workflow to decide if this paper is worth immediate prototyping.

1. Use the paper and benchmark evidence to scope a baseline reproduction plan.
2. Track assumptions and missing details in an experiment log before coding.
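
One lightweight way to keep such an experiment log is a list of structured entries serialized as JSON Lines. The sketch below is only a suggestion: the LogEntry fields, the to_jsonl helper, and the example entries are hypothetical placeholders, not details taken from the paper.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class LogEntry:
    kind: str   # "assumption" or "missing_detail"
    topic: str  # short label, e.g. "prompts" or "models"
    note: str   # free-text description


def to_jsonl(entries: List[LogEntry]) -> str:
    """Serialize entries as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


# Placeholder entries; replace with items found while reading the paper.
entries = [
    LogEntry("missing_detail", "prompts", "Agent prompts not yet located."),
    LogEntry("assumption", "models", "LLM/VLM backends still to be chosen."),
]

log_text = to_jsonl(entries)
```

Writing log_text to a file next to the code keeps assumptions easy to diff and review as the reproduction progresses.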

Time to first repro: a few days. Estimate is based on paper-only reproduction flow.

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches:

models

arxiv:2602.16898 MALLVI Multi-Agent

datasets

arxiv:2602.16898 MALLVI dataset Instruction tuning dataset

spaces

arxiv:2602.16898 MALLVI demo Instruction tuning demo

Research Context