Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition

Liangfa Wei, Jie Zhang, Junfeng Hou, Lirong Dai. Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition. In Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2020, Auckland, New Zealand, December 7-10, 2020. pages 638-643, IEEE, 2020.