Tanvir Mahmud, Diana Marculescu. AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization. In IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023, Waikoloa, HI, USA, January 2-7, 2023. Pages 5147-5156, IEEE, 2023.