VisualSpeech: Enhancing Prosody Modeling in TTS Using Video

Shumin Que, Anton Ragni. VisualSpeech: Enhancing Prosody Modeling in TTS Using Video. In Odette Scharenborg, Catharine Oertel, Khiet Truong, editors, 26th Annual Conference of the International Speech Communication Association, Interspeech 2025, Rotterdam, The Netherlands, 17-21 August 2025. ISCA, 2025.
