Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura. Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data. In Hanseok Ko and John H. L. Hansen, editors, Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, pages 526-530. ISCA, 2022.
