Align or attend? Toward More Efficient and Accurate Spoken Word Discovery Using Speech-to-Image Retrieval

Liming Wang, Xinsheng Wang, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak. Align or attend? Toward More Efficient and Accurate Spoken Word Discovery Using Speech-to-Image Retrieval. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, pages 7603-7607. IEEE, 2021.
