Zongyu Lin, Wei Liu, Chen Chen 0005, Jiasen Lu, Wenze Hu, Tsu-Jui Fu, Jesse Allardice, Zhengfeng Lai, Liangchen Song, Bowen Zhang 0002, Cha Chen, Yiran Fei, Lezhi Li, Yinfei Yang, Yizhou Sun, Kai-Wei Chang 0001. STIV: Scalable Text and Image Conditioned Video Generation. In IEEE/CVF International Conference on Computer Vision, ICCV 2025, Honolulu, HI, USA, October 19-25, 2025. pages 16249-16259, IEEE, 2025.