Multimodal Transformer Networks with Latent Interaction for Audio-Visual Event Localization

Yixuan He, Xing Xu, Xin Liu, Weihua Ou, Huimin Lu. Multimodal Transformer Networks with Latent Interaction for Audio-Visual Event Localization. In 2021 IEEE International Conference on Multimedia and Expo, ICME 2021, Shenzhen, China, July 5-9, 2021. pages 1-6, IEEE, 2021. doi:10.1109/ICME51207.2021.9428081

@inproceedings{He0LOL21,
  title = {Multimodal Transformer Networks with Latent Interaction for Audio-Visual Event Localization},
  author = {Yixuan He and Xing Xu and Xin Liu and Weihua Ou and Huimin Lu},
  year = {2021},
  doi = {10.1109/ICME51207.2021.9428081},
  url = {https://doi.org/10.1109/ICME51207.2021.9428081},
  researchr = {https://researchr.org/publication/He0LOL21},
  cites = {0},
  citedby = {0},
  pages = {1--6},
  booktitle = {2021 IEEE International Conference on Multimedia and Expo, ICME 2021, Shenzhen, China, July 5-9, 2021},
  publisher = {IEEE},
  isbn = {978-1-6654-3864-3},
}