IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes

Haochen Zhang, Nader Zantout, Pujith Kachana, Ji Zhang, Wenshan Wang. IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes. In IEEE International Conference on Robotics and Automation, ICRA 2025, Atlanta, GA, USA, May 19-23, 2025. pages 1677-1683, IEEE, 2025.
