Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation

Hung Le, Doyen Sahoo, Nancy F. Chen, Steven C. H. Hoi. Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation. Computer Speech & Language, 63:101095, 2020.