Uni-Sight: An E2E Vision-Language-Action System Unifying Multi-View Alignment and Multi-Modal Fusion

Daixun Li, Sibo He, Jiayun Tian, Yusi Zhang, Weiying Xie, Mingxiang Cao, Donglai Liu, Zirui Li, Tianlin Hui, Rui Huang, Yunsong Li. Uni-Sight: An E2E Vision-Language-Action System Unifying Multi-View Alignment and Multi-Modal Fusion. In Cathal Gurrin, Klaus Schoeffmann, Min Zhang, Luca Rossetto, Stevan Rudinac, Duc-Tien Dang-Nguyen, Wen-Huang Cheng, Phoebe Chen, Jenny Benois-Pineau, editors, Proceedings of the 33rd ACM International Conference on Multimedia, MM 2025, Dublin, Ireland, October 27-31, 2025, pages 7142-7151. ACM, 2025.
