Chunhui Lu, Xue Wen, Liming Song, Junkwang Oh. Robust Neural Codec Language Modeling with Phoneme Position Prediction for Zero-Shot TTS. In Odette Scharenborg, Catharine Oertel, Khiet Truong, editors, 26th Annual Conference of the International Speech Communication Association, Interspeech 2025, Rotterdam, The Netherlands, 17-21 August 2025. ISCA, 2025.