Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis

Tongtong Su, Chengyu Wang, Bingyan Liu, Jun Huang, Dongming Lu. Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025. Pages 18209-18218, Computer Vision Foundation / IEEE, 2025.
