Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos

Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux. Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos. In Naomi Harte, Julie Carson-Berndsen, Gareth Jones, editors, 24th Annual Conference of the International Speech Communication Association, Interspeech 2023, Dublin, Ireland, August 20-24, 2023. pages 4663-4667, ISCA, 2023.
