Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen. Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 7151-7161. IEEE, 2024.