VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model

Beichen Wang, Juexiao Zhang, Shuwen Dong, Irving Fang, Chen Feng. VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025, Hangzhou, China, October 19-25, 2025. Pages 17215-17222, IEEE, 2025.

Abstract

Abstract is missing.