Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances

Thao Le Minh, Nobuyuki Shimizu, Takashi Miyazaki, Koichi Shinoda. Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances. In Jérôme Lang, editor, Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. Pages 1546-1553, ijcai.org, 2018.
