The following publications are possibly variants of this publication:
- Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization. Hanyu Xuan, Zhenyu Zhang, Shuo Chen, Jian Yang, Yan Yan. AAAI 2020: 279-286
- Temporal and Cross-modal Attention for Audio-Visual Zero-Shot Learning. Otniel-Bogdan Mercea, Thomas Hummel 0001, A. Sophia Koepke, Zeynep Akata. ECCV 2022: 488-505
- Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network. Ruijie Tao, Rohan Kumar Das, Haizhou Li 0001. INTERSPEECH 2020: 2242-2246
- Bi-Directional Modality Fusion Network For Audio-Visual Event Localization. Shuo Liu, Weize Quan, Yuan Liu, Dong-Ming Yan 0001. ICASSP 2022: 4868-4872
- Dense Modality Interaction Network for Audio-Visual Event Localization. Shuo Liu, Weize Quan, Chaoqun Wang, Yuan Liu, Bin Liu 0041, Dong-Ming Yan 0001. IEEE TMM 25: 2734-2748 (2023)