A Transformer-Based Audio Captioning Model with Keyword Estimation

Yuma Koizumi, Ryo Masumura, Kyosuke Nishida, Masahiro Yasuda, Shoichiro Saito. A Transformer-Based Audio Captioning Model with Keyword Estimation. In Helen Meng, Bo Xu, Thomas Fang Zheng, editors, Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pages 1977-1981. ISCA, 2020.
