Hongkai Wei, Yang Yang, Shijie Sun, Mingtao Feng, Xiangyu Song, Qi Lei, Hongli Hu, Rong Wang, Huansheng Song, Naveed Akhtar, Ajmal Saeed Mian. Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025. pages 13886-13896, Computer Vision Foundation / IEEE, 2025.