Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior

Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu. Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020, pages 6699-6703. IEEE, 2020.

Authors

Guangzhi Sun

Yu Zhang

Ron J. Weiss

Yuan Cao

Heiga Zen

Andrew Rosenberg

Bhuvana Ramabhadran

Yonghui Wu
