Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Zhi Gao, Bofei Zhang, Pengxiang Li, Xiaojian Ma, Tao Yuan, Yue Fan, Yuwei Wu, Yunde Jia, Song-Chun Zhu, Qing Li. Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.
