Look, listen, and decode: Multimodal speech recognition with images

Felix Sun, David F. Harwath, James R. Glass. Look, listen, and decode: Multimodal speech recognition with images. In 2016 IEEE Spoken Language Technology Workshop, SLT 2016, San Diego, CA, USA, December 13-16, 2016. pages 573-578, IEEE, 2016.

Abstract

Abstract is missing.