Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions Through Masked Modeling

Shentong Mo, Pedro Morgado. Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions Through Masked Modeling. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 27176-27186. IEEE, 2024.
