NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Eric Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiangyang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao. NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. Pages 22605-22623, OpenReview.net, 2024.
