A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions

Liming Wang, Mark Hasegawa-Johnson. A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions. In Helen Meng, Bo Xu 0011, Thomas Fang Zheng, editors, Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020. pages 1456-1460, ISCA, 2020. [doi]

Abstract

Abstract is missing.