Multimodal Transformer Networks with Latent Interaction for Audio-Visual Event Localization

Yixuan He, Xing Xu, Xin Liu, Weihua Ou, Huimin Lu. Multimodal Transformer Networks with Latent Interaction for Audio-Visual Event Localization. In 2021 IEEE International Conference on Multimedia and Expo, ICME 2021, Shenzhen, China, July 5-9, 2021, pages 1-6. IEEE, 2021.
