A bimodal network based on Audio-Text-Interactional-Attention with ArcFace loss for speech emotion recognition

Yuwu Tang, Ying Hu, Liang He, Hao Huang. A bimodal network based on Audio-Text-Interactional-Attention with ArcFace loss for speech emotion recognition. Speech Communication, 143:21-32, 2022.
