Weiting Tan, Jiachen Lian, Hirofumi Inaguma, Paden Tomasello, Philipp Koehn, Xutai Ma. Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China, November 4-9, 2025, pages 2600-2617. Association for Computational Linguistics, 2025.