CALM: Constrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis

Yi Meng, Xiang Li, Zhiyong Wu 0001, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng. CALM: Constrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis. In Hanseok Ko, John H. L. Hansen, editors, Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022. pages 5533-5537, ISCA, 2022.

Authors

Yi Meng

Xiang Li

Zhiyong Wu 0001

Tingtian Li

Zixun Sun

Xinyu Xiao

Chi Sun

Hui Zhan

Helen Meng