12th ISCA Speech Synthesis Workshop, SSW 2023, Grenoble, France, August 26-28, 2023

Gérard Bailly, Thomas Hueber, Damien Lolive, Nicolas Obin, Olivier Perrotin, editors, 12th ISCA Speech Synthesis Workshop, SSW 2023, Grenoble, France, August 26-28, 2023. ISCA, 2023.

Conference: ssw2023

Advocating for text input in multi-speaker text-to-speech systems. Gérard Bailly, Martin Lenglet, Olivier Perrotin, Esther Klabbers. 1-7

Spell4TTS: Acoustically-informed spellings for improving text-to-speech pronunciations. Jason Fong, Hao Tang, Simon King. 8-13

A Comparative Analysis of Pretrained Language Models for Text-to-Speech. Marcel Granero Moya, Penny Karanasou, Sri Karlapati, Bastian Schnell, Nicole Peinelt, Alexis Moinet, Thomas Drugman. 14-20

Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection. Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers. 21-26

Importance of Human Factors in Text-To-Speech Evaluations. Lev Finkelstein, Joshua Camp, Rob Clark. 27-33

Re-examining the quality dimensions of synthetic speech. Fritz Seebauer, Michael Kuhlmann, Reinhold Haeb-Umbach, Petra Wagner. 34-40

Stuck in the MOS pit: A critical analysis of MOS test methodology in TTS evaluation. Ambika Kirkland, Shivam Mehta, Harm Lameris, Gustav Eje Henter, Éva Székely, Joakim Gustafson. 41-47

MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module. Ondrej Plátek, Ondrej Dusek. 48-54

Cross-lingual transfer using phonological features for resource-scarce text-to-speech. Johannes A. Louw. 55-61

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion. Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari. 62-68

Situating Speech Synthesis: Investigating Contextual Factors in the Evaluation of Conversational TTS. Harm Lameris, Ambika Kirkland, Joakim Gustafson, Éva Székely. 69-74

Synthesising turn-taking cues using natural conversational data. Johannah O'Mahony, Catherine Lai, Simon King. 75-80

StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings. Arnab Das, Suhita Ghosh, Tim Polzehl, Ingo Siegert, Sebastian Stober. 81-87

PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder. Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko. 88-93

Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion. Ryunosuke Hirai, Yuki Saito, Hiroshi Saruwatari. 94-99

HiFi-VC: High Quality ASR-based Voice Conversion. Anton Kashkin, Ivan Karpukhin, Svyatoslav Shishkin. 100-105

EmoSpeech: guiding FastSpeech2 towards Emotional Text to Speech. Daria Diatlova, Vitalii Shutov. 106-112

Controllable Emphasis with zero data for text-to-speech. Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova. 113-119

Local Style Tokens: Fine-Grained Prosodic Representations For TTS Expressive Control. Martin Lenglet, Olivier Perrotin, Gérard Bailly. 120-126

Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody. Sofoklis Kakouros, Juraj Simko, Martti Vainio, Antti Suni. 127-133

An analysis on the effects of speaker embedding choice in non auto-regressive TTS. Adriana Stan, Johannah O'Mahony. 134-138

Audiobook synthesis with long-form neural text-to-speech. Weicheng Zhang, Cheng-chieh Yeh, Will Beckman, Tuomo Raitio, Ramya Rasipuram, Ladan Golipour, David Winarsky. 139-143

Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling. Tuomo Raitio, Javier Latorre, Andrea Davis, Tuuli Morrill, Ladan Golipour. 144-149

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis. Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter. 150-156

Diffusion Transformer for Adaptive Text-to-Speech. Haolin Chen, Philip N. Garner. 157-162

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis. Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely. 163-169

Voice Cloning: Training Speaker Selection with Limited Multi-Speaker Corpus. David Guennec, Lily Wadoux, Aghilas Sini, Nelly Barbot, Damien Lolive. 170-176

Adaptive Duration Modification of Speech using Masked Convolutional Networks and Open-Loop Time Warping. Ravi Shankar, Archana Venkataraman. 177-183

Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data. Jarod Duret, Yannick Estève, Titouan Parcollet. 184-190

Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests. Kishor Kayyar Lakshminarayana, Christian Dittmar, Nicola Pia, Emanuël A. P. Habets. 191-196

Better Replacement for TTS Naturalness Evaluation. Sajad Shirali-Shahreza, Gerald Penn. 197-203

The Impact of Pause-Internal Phonetic Particles on Recall in Synthesized Lectures. Mikey Elmers, Éva Székely. 204-210

SPTK4: An Open-Source Software Toolkit for Speech Signal Processing. Takenori Yoshimura, Takato Fujimoto, Keiichiro Oura, Keiichi Tokuda. 211-217

FiPPiE: A Computationally Efficient Differentiable method for Estimating Fundamental Frequency From Spectrograms. Lev Finkelstein, Chun-an Chan, Vincent Wan, Heiga Zen, Rob Clark. 218-224

Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications. Biel Tura Vecino, Adam Gabrys, Daniel Matwicki, Andrzej Pomirski, Tom Iddon, Marius Cotescu, Jaime Lorenzo-Trueba. 225-229

Data Augmentation Methods on Ultrasound Tongue Images for Articulation-to-Speech Synthesis. Ibrahim Ibrahimov, Gábor Gosztolya, Tamás Gábor Csapó. 230-235

Universal Approach to Multilingual Multispeaker Child Speech Synthesis. Shaimaa Alwaisi, Mohammed Salah Al-Radhi, Géza Németh. 236-237

Towards Speaker-Independent Voice Conversion for Improving Dysarthric Speech Intelligibility. Seraphina Fong, Marco Matassoni, Gianluca Esposito, Alessio Brutti. 238-239

Exploring the multidimensional representation of individual speech acoustic parameters extracted by deep unsupervised models. Maxime Jacquelin, Maeva Garnier, Laurent Girin, Rémy Vincent, Olivier Perrotin. 240-241

SarcasticSpeech: Speech Synthesis for Sarcasm in Low-Resource Scenarios. Zhu Li, Xiyuan Gao, Shekhar Nayak, Matt Coler. 242-243

Recovering Discrete Prosody Inputs via Invert-Classify. Nicholas Sanders, Korin Richmond. 244-245

Using a Large Language Model to Control Speaking Style for Expressive TTS. Atli Thor Sigurgeirsson, Simon King. 246-247

NaijaTTS: A pitch-controllable TTS model for Nigerian Pidgin. Emmett Strickland, Dana Aubakirova, Dorin Doncenco, Diego Torres, Marc Evrard. 248-249
