Multimodal Transformer Networks with Latent Interaction for Audio-Visual Event Localization

Yixuan He, Xing Xu 0001, Xin Liu, Weihua Ou, Huimin Lu. Multimodal Transformer Networks with Latent Interaction for Audio-Visual Event Localization. In 2021 IEEE International Conference on Multimedia and Expo, ICME 2021, Shenzhen, China, July 5-9, 2021, pages 1-6. IEEE, 2021.

Authors

Yixuan He

Xing Xu 0001

Xin Liu

Weihua Ou

Huimin Lu
