Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing

Yating Xu, Conghui Hu, Gim Hee Lee. Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing. In IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2024, Waikoloa, HI, USA, January 3-8, 2024, pages 5603-5612. IEEE, 2024.
