Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux. Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos. In Naomi Harte, Julie Carson-Berndsen, Gareth Jones, editors, 24th Annual Conference of the International Speech Communication Association, Interspeech 2023, Dublin, Ireland, August 20-24, 2023. pages 4663-4667, ISCA, 2023. [doi]
Abstract is missing.