ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

Huadai Liu, Rongjie Huang, Xuan Lin, Wenqiang Xu, Maozong Zheng, Hong Chen, Jinzheng He, Zhou Zhao. ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, pages 15957-15969. Association for Computational Linguistics, 2023.

Authors

Huadai Liu

Rongjie Huang

Xuan Lin

Wenqiang Xu

Maozong Zheng

Hong Chen

Jinzheng He

Zhou Zhao
