11th ISCA Speech Synthesis Workshop, SSW 2021, Budapest, Hungary, August 26-28, 2021

Géza Németh, editor, 11th ISCA Speech Synthesis Workshop, SSW 2021, Budapest, Hungary, August 26-28, 2021. ISCA, 2021.

Conference: ssw2021

Identifying the vocal cues of likeability, friendliness and skilfulness in synthetic speech. Sai Sirisha Rallabandi, Babak Naderi, Sebastian Möller. pp. 1-6.

Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging. Tamás Gábor Csapó. pp. 7-12.

Impact of Segmentation and Annotation in French end-to-end Synthesis. Martin Lenglet, Olivier Perrotin, Gérard Bailly. pp. 13-18.

Pathological voice adaptation with autoencoder-based voice conversion. Marc Illa, Bence Mark Halpern, Rob van Son, Laureano Moro-Velázquez, Odette Scharenborg. pp. 19-24.

Location, Location: Enhancing the Evaluation of Text-to-Speech synthesis using the Rapid Prosody Transcription Paradigm. Elijah Gutierrez, Pilar Oplustil Gallegos, Catherine Lai. pp. 25-30.

Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input. Tamás Gábor Csapó, László Tóth, Gábor Gosztolya, Alexandra Markó. pp. 31-36.

Combining speakers of multiple languages to improve quality of neural voices. Javier Latorre, Charlotte Bailleul, Tuuli Morrill, Alistair Conkie, Yannis Stylianou. pp. 37-42.

Methods of slowing down speech. Christina Tånnander, Jens Edlund. pp. 43-47.

Personality in the mix - investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis. Joakim Gustafson, Jonas Beskow, Éva Székely. pp. 48-53.

Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging. Csaba Zainkó, László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Alexandra Markó, Géza Németh, Tamás Gábor Csapó. pp. 54-59.

Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction. Bastian Schnell, Philip N. Garner. pp. 60-65.

Acquiring conversational speaking style from multi-speaker spontaneous dialog corpus for prosody-controllable sequence-to-sequence speech synthesis. Slava Shechtman, Avrech Ben-David. pp. 66-71.

EmoCat: Language-agnostic Emotional Voice Conversion. Bastian Schnell, Goeric Huybrechts, Bartek Perz, Thomas Drugman, Jaime Lorenzo-Trueba. pp. 72-77.

Enhancing audio quality for expressive Neural Text-to-Speech. Abdelhamid Ezzerg, Adam Gabrys, Bartosz Putrycz, Daniel Korzekwa, Daniel Saez-Trigueros, David McHardy, Kamil Pokora, Jakub Lachowicz, Jaime Lorenzo-Trueba, Viacheslav Klimkov. pp. 78-83.

Are we truly modeling expressiveness? A study on expressive TTS in Brazilian Portuguese for real-life application styles. Lucas H. Ueda, Paula D. P. Costa, Flávio Olmos Simões, Mário Uliani Neto. pp. 84-89.

Vocal tract area function extraction using ultrasound for articulatory speech synthesis. Debasish Ray Mohapatra, Pramit Saha, Yadong Liu, Bryan Gick, Sidney Fels. pp. 90-95.

Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech. Raahil Shah, Kamil Pokora, Abdelhamid Ezzerg, Viacheslav Klimkov, Goeric Huybrechts, Bartosz Putrycz, Daniel Korzekwa, Thomas Merritt. pp. 96-101.

Intelligibility and naturalness of articulatory synthesis with VocalTractLab compared to established speech synthesis technologies. Paul Konstantin Krug, Simon Stone, Peter Birkholz. pp. 102-107.

Perception of smiling voice in spontaneous speech synthesis. Ambika Kirkland, Marcin Wlodarczak, Joakim Gustafson, Éva Székely. pp. 108-112.

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments. Alejandro Mottini, Jaime Lorenzo-Trueba, Sri Vishnu Kumar Karlapati, Thomas Drugman. pp. 113-117.

Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control. Konstantinos Markopoulos, Nikolaos Ellinas, Alexandra Vioni, Myrsini Christidou, Panos Kakoulidis, Georgios Vamvoukakis, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis, Aimilios Chalamandaris, Georgia Maniati. pp. 118-123.

Exploring Disentanglement with Multilingual and Monolingual VQ-VAE. Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi. pp. 124-129.

Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis. Erica Cooper, Xin Wang, Junichi Yamagishi. pp. 130-135.

Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance. Hieu-Thi Luong, Junichi Yamagishi. pp. 136-141.

Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction. Patrick Lumban Tobing, Tomoki Toda. pp. 142-147.

Factors Affecting the Evaluation of Synthetic Speech in Context. Johannah O'Mahony, Pilar Oplustil Gallegos, Catherine Lai, Simon King. pp. 148-153.

Non-native English lexicon creation for bilingual speech synthesis. Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam, Nagaraj Adiga, Sharath Adavanne. pp. 154-159.

Cross-lingual Transfer of Phonological Features for Low-resource Speech Synthesis. Dan Wells, Korin Richmond. pp. 160-165.

Mind your p's and k's - Comparing obstruents across TTS voices of the Blizzard Challenge 2013. Ayushi Pandey, Sébastien Le Maguer, Julie Carson-Berndsen, Naomi Harte. pp. 166-171.

Improving Polyglot Speech Synthesis through Multi-task and Adversarial Learning. Jason Fong, Jilong Wu, Prabhav Agrawal, Andrew Gibiansky, Thilo Köhler, Qing He. pp. 172-176.

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech. Ammar Abbas, Bajibabu Bollepalli, Alexis Moinet, Arnaud Joly, Penny Karanasou, Peter Makarov, Simon Slangen, Sri Karlapati, Thomas Drugman. pp. 177-182.

How do Voices from Past Speech Synthesis Challenges Compare Today? Erica Cooper, Junichi Yamagishi. pp. 183-188.

Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder. Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari. pp. 189-194.

Liaison and Pronunciation Learning in End-to-End Text-to-Speech in French. Jason Taylor, Sébastien Le Maguer, Korin Richmond. pp. 195-199.

FeatherTTS: Robust and Efficient attention based Neural TTS. Qiao Tian, Chao Liu, Zewang Zhang, Heng Lu, Linghui Chen, Bin Wei, Pujiang He, Shan Liu. pp. 200-204.

Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech. Pilar Oplustil Gallegos, Johannah O'Mahony, Simon King. pp. 205-210.

Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings. Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Naoko Tanji, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari. pp. 211-215.

Lipsyncing efforts for transcreating lecture videos in Indian languages. Mano Ranjith Kumar M., Jom Kuriakose, Karthik Pandia D. S., Hema A. Murthy. pp. 216-221.

Homograph disambiguation with contextual word embeddings for TTS systems. Marco Nicolis, Viacheslav Klimkov. pp. 222-226.

Analysing Temporal Sensitivity of VQ-VAE Sub-Phone Codebooks. Jason Fong, Jennifer Williams, Simon King. pp. 227-231.
