- Advocating for text input in multi-speaker text-to-speech systems. Gérard Bailly, Martin Lenglet, Olivier Perrotin, Esther Klabbers. 1-7
- Spell4TTS: Acoustically-informed spellings for improving text-to-speech pronunciations. Jason Fong, Hao Tang, Simon King. 8-13
- A Comparative Analysis of Pretrained Language Models for Text-to-Speech. Marcel Granero Moya, Penny Karanasou, Sri Karlapati, Bastian Schnell, Nicole Peinelt, Alexis Moinet, Thomas Drugman. 14-20
- Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection. Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers. 21-26
- Importance of Human Factors in Text-To-Speech Evaluations. Lev Finkelstein, Joshua Camp, Rob Clark. 27-33
- Re-examining the quality dimensions of synthetic speech. Fritz Seebauer, Michael Kuhlmann, Reinhold Haeb-Umbach, Petra Wagner. 34-40
- Stuck in the MOS pit: A critical analysis of MOS test methodology in TTS evaluation. Ambika Kirkland, Shivam Mehta, Harm Lameris, Gustav Eje Henter, Éva Székely, Joakim Gustafson. 41-47
- MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module. Ondrej Plátek, Ondrej Dusek. 48-54
- Cross-lingual transfer using phonological features for resource-scarce text-to-speech. Johannes A. Louw. 55-61
- Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion. Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari. 62-68
- Situating Speech Synthesis: Investigating Contextual Factors in the Evaluation of Conversational TTS. Harm Lameris, Ambika Kirkland, Joakim Gustafson, Éva Székely. 69-74
- Synthesising turn-taking cues using natural conversational data. Johannah O'Mahony, Catherine Lai, Simon King. 75-80
- StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings. Arnab Das, Suhita Ghosh, Tim Polzehl, Ingo Siegert, Sebastian Stober. 81-87
- PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder. Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko. 88-93
- Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion. Ryunosuke Hirai, Yuki Saito, Hiroshi Saruwatari. 94-99
- HiFi-VC: High Quality ASR-based Voice Conversion. Anton Kashkin, Ivan Karpukhin, Svyatoslav Shishkin. 100-105
- EmoSpeech: guiding FastSpeech2 towards Emotional Text to Speech. Daria Diatlova, Vitalii Shutov. 106-112
- Controllable Emphasis with zero data for text-to-speech. Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova. 113-119
- Local Style Tokens: Fine-Grained Prosodic Representations For TTS Expressive Control. Martin Lenglet, Olivier Perrotin, Gérard Bailly. 120-126
- Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody. Sofoklis Kakouros, Juraj Simko, Martti Vainio, Antti Suni. 127-133
- An analysis on the effects of speaker embedding choice in non auto-regressive TTS. Adriana Stan, Johannah O'Mahony. 134-138
- Audiobook synthesis with long-form neural text-to-speech. Weicheng Zhang, Cheng-chieh Yeh, Will Beckman, Tuomo Raitio, Ramya Rasipuram, Ladan Golipour, David Winarsky. 139-143
- Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling. Tuomo Raitio, Javier Latorre, Andrea Davis, Tuuli Morrill, Ladan Golipour. 144-149
- Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis. Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter. 150-156
- Diffusion Transformer for Adaptive Text-to-Speech. Haolin Chen, Philip N. Garner. 157-162
- On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis. Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely. 163-169
- Voice Cloning: Training Speaker Selection with Limited Multi-Speaker Corpus. David Guennec, Lily Wadoux, Aghilas Sini, Nelly Barbot, Damien Lolive. 170-176
- Adaptive Duration Modification of Speech using Masked Convolutional Networks and Open-Loop Time Warping. Ravi Shankar, Archana Venkataraman. 177-183
- Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data. Jarod Duret, Yannick Estève, Titouan Parcollet. 184-190
- Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests. Kishor Kayyar Lakshminarayana, Christian Dittmar, Nicola Pia, Emanuël A. P. Habets. 191-196
- Better Replacement for TTS Naturalness Evaluation. Sajad Shirali-Shahreza, Gerald Penn. 197-203
- The Impact of Pause-Internal Phonetic Particles on Recall in Synthesized Lectures. Mikey Elmers, Éva Székely. 204-210
- SPTK4: An Open-Source Software Toolkit for Speech Signal Processing. Takenori Yoshimura, Takato Fujimoto, Keiichiro Oura, Keiichi Tokuda. 211-217
- FiPPiE: A Computationally Efficient Differentiable method for Estimating Fundamental Frequency From Spectrograms. Lev Finkelstein, Chun-an Chan, Vincent Wan, Heiga Zen, Rob Clark. 218-224
- Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications. Biel Tura Vecino, Adam Gabrys, Daniel Matwicki, Andrzej Pomirski, Tom Iddon, Marius Cotescu, Jaime Lorenzo-Trueba. 225-229
- Data Augmentation Methods on Ultrasound Tongue Images for Articulation-to-Speech Synthesis. Ibrahim Ibrahimov, Gábor Gosztolya, Tamás Gábor Csapó. 230-235
- Universal Approach to Multilingual Multispeaker Child Speech Synthesis. Shaimaa Alwaisi, Mohammed Salah Al-Radhi, Géza Németh. 236-237
- Towards Speaker-Independent Voice Conversion for Improving Dysarthric Speech Intelligibility. Seraphina Fong, Marco Matassoni, Gianluca Esposito, Alessio Brutti. 238-239
- Exploring the multidimensional representation of individual speech acoustic parameters extracted by deep unsupervised models. Maxime Jacquelin, Maeva Garnier, Laurent Girin, Rémy Vincent, Olivier Perrotin. 240-241
- SarcasticSpeech: Speech Synthesis for Sarcasm in Low-Resource Scenarios. Zhu Li, Xiyuan Gao, Shekhar Nayak, Matt Coler. 242-243
- Recovering Discrete Prosody Inputs via Invert-Classify. Nicholas Sanders, Korin Richmond. 244-245
- Using a Large Language Model to Control Speaking Style for Expressive TTS. Atli Thor Sigurgeirsson, Simon King. 246-247
- NaijaTTS: A pitch-controllable TTS model for Nigerian Pidgin. Emmett Strickland, Dana Aubakirova, Dorin Doncenco, Diego Torres, Marc Evrard. 248-249