VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion

Disong Wang, Shan Yang, Dan Su 0002, Xunying Liu, Dong Yu 0001, Helen Meng. VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, pages 7252-7256. IEEE, 2022.

Authors

Disong Wang

Shan Yang

Dan Su 0002

Xunying Liu

Dong Yu 0001

Helen Meng
