ManiTaskGen: A Comprehensive Task Generator for Benchmarking and Improving Vision-Language Agents on Embodied Decision-Making

Submitted to CVPR 2026

ManiTaskGen is a system that automatically generates a comprehensive, diverse, and logically near-exhaustive set of mobile manipulation tasks for any given scene. The generated tasks serve as a resource for both rigorously evaluating Vision-Language Agents (VLAs) on embodied decision-making and iteratively improving them.

The project website is at https://manitaskgen.github.io/

Recommended citation: Liu Dai*, Haina Wang*, Weikang Wan, and Hao Su. (2025). "ManiTaskGen: A Comprehensive Task Generator for Benchmarking and Improving Vision-Language Agents on Embodied Decision-Making." arXiv preprint arXiv:2505.20726.
