A bimodal network based on Audio-Text-Interactional-Attention with ArcFace loss for speech emotion recognition

Yuwu Tang, Ying Hu, Liang He, Hao Huang. A bimodal network based on Audio-Text-Interactional-Attention with ArcFace loss for speech emotion recognition. Speech Communication, 143:21-32, 2022.

Authors

Yuwu Tang

Ying Hu

Liang He

Hao Huang