- The cognitive status of simple and complex models. Janet B. Pierrehumbert.
- Doing Something we Never could with Spoken Language Technologies - from early days to the era of deep learning. Lin-Shan Lee.
- Brain networks enabling speech perception in everyday settings. Barbara G. Shinn-Cunningham.
- Successes, Challenges and Opportunities for Speech Technology in Conversational Agents. Shehzad Mevawalla.
- On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition. Jinyu Li, Yu Wu 0012, Yashesh Gaur, Chengyi Wang 0002, Rui Zhao, Shujie Liu 0001. 1-5
- SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition. Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin. 6-10
- Contextual RNN-T for Open Domain ASR. Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf. 11-15
- ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition. Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu J. Han, Tao Lei 0001, Tao Ma. 16-20
- Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity. Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo. 21-25
- BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example. Timo Lohrenz, Tim Fingscheidt. 26-30
- Relative Positional Encoding for Speech Recognition and Direct Translation. Ngoc-Quan Pham, Thanh-Le Ha, Tuan Nam Nguyen, Thai Son Nguyen, Elizabeth Salesky, Sebastian Stüker, Jan Niehues, Alex Waibel. 31-35
- Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers. Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka. 36-40
- Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework. Takashi Fukuda, Samuel Thomas 0001. 41-45
- Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition. Jinhwan Park, Wonyong Sung. 46-50
- Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition. Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Zhanlei Yang, Longshuai Xiao. 51-55
- Neural Spatio-Temporal Beamformer for Target Speech Separation. Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Chao Weng, Jianming Liu, Dong Yu 0001. 56-60
- Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis. Li Li 0063, Kazuhito Koishida, Shoji Makino. 61-65
- End-to-End Multi-Look Keyword Spotting. Meng Yu 0003, Xuan Ji, Bo Wu, Dan Su 0002, Dong Yu 0001. 66-70
- Differential Beamforming for Uniform Circular Array with Directional Microphones. Weilong Huang, Jinwei Feng. 71-75
- Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement. Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee. 76-80
- An End-to-End Architecture of Online Multi-Channel Speech Separation. Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie. 81-85
- Mentoring-Reverse Mentoring for Unsupervised Multi-Channel Speech Source Separation. Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi. 86-90
- Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation. Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki. 91-95
- A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge. Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Jia Pan, Chin-Hui Lee. 96-100
- Identifying Causal Relationships Between Behavior and Local Brain Activity During Natural Conversation. Youssef Hmamouche, Laurent Prévot 0001, Magalie Ochs, Thierry Chaminade. 101-105
- Neural Entrainment to Natural Speech Envelope Based on Subject Aligned EEG Signals. Di Zhou, Gaoyan Zhang, Jianwu Dang, Shuang Wu, Zhuo Zhang. 106-110
- Does Lexical Retrieval Deteriorate in Patients with Mild Cognitive Impairment? Analysis of Brain Functional Network Will Tell. Chongyuan Lian, Tianqi Wang, Mingxiao Gu, Manwa L. Ng, Feiqi Zhu, Lan Wang, Nan Yan. 111-115
- Congruent Audiovisual Speech Enhances Cortical Envelope Tracking During Auditory Selective Attention. Zhen Fu, Jing Chen 0019. 116-120
- Contribution of RMS-Level-Based Speech Segments to Target Speech Decoding Under Noisy Conditions. Lei Wang 0074, Ed X. Wu, Fei Chen 0011. 121-124
- Cortical Oscillatory Hierarchy for Natural Sentence Processing. Bin Zhao, Jianwu Dang, Gaoyan Zhang, Masashi Unoki. 125-129
- Comparing EEG Analyses with Different Epoch Alignments in an Auditory Lexical Decision Experiment. Louis ten Bosch, Kimberley Mulder, Lou Boves. 130-134
- Detection of Subclinical Mild Traumatic Brain Injury (mTBI) Through Speech and Gait. Tanya Talkar, Sophia Yuditskaya, James R. Williamson, Adam C. Lammert, Hrishikesh Rao 0002, Daniel J. Hannon, Anne O'Brien, Gloria Vergara-Diaz, Richard DeLaura, Douglas E. Sturim, Gregory Ciccarelli, Ross Zafonte, Jeff Palmer, Paolo Bonato, Thomas F. Quatieri. 135-139
- Towards Learning a Universal Non-Semantic Representation of Speech. Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Félix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv. 140-144
- Poetic Meter Classification Using i-Vector-MTF Fusion. Rajeev Rajan, Aiswarya Vinod Kumar, Ben P. Babu. 145-149
- Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism. Wang Dai, Jinsong Zhang, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin, Yanlu Xie. 150-154
- Automatic Analysis of Speech Prosody in Dutch. Na Hu, Berit Janssen, Judith Hanssen, Carlos Gussenhoven, Aoju Chen. 155-159
- Learning Voice Representation Using Knowledge Distillation for Automatic Voice Casting. Adrien Gresse, Mathias Quillot, Richard Dufour, Jean-François Bonastre. 160-164
- Enhancing Formant Information in Spectrographic Display of Speech. B. Yegnanarayana, Joseph M. Anand, Vishala Pannala. 165-169
- Unsupervised Methods for Evaluating Speech Representations. Michael Gump, Wei-Ning Hsu, James R. Glass. 170-174
- Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments. Dung N. Tran, Uros Batricevic, Kazuhito Koishida. 175-179
- Nonlinear ISA with Auxiliary Variables for Learning Speech Representations. Amrith Setlur, Barnabás Póczos, Alan W. Black. 180-184
- Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals. Hirotoshi Takeuchi, Kunio Kashino, Yasunori Ohishi, Hiroshi Saruwatari. 185-189
- Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders. Yang Ai, Zhen-Hua Ling. 190-194
- FeatherWave: An Efficient High-Fidelity Neural Vocoder with Multi-Band Linear Prediction. Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu. 195-199
- VocGAN: A High-Fidelity Real-Time Vocoder with a Hierarchically-Nested Adversarial Network. Jinhyeok Yang, Junmo Lee, Young Ik Kim, Hoon-Young Cho, Injung Kim. 200-204
- Lightweight LPCNet-Based Neural Vocoder with Tensor Decomposition. Hiroki Kanagawa, Yusuke Ijima. 205-209
- WG-WaveNet: Real-Time High-Fidelity Speech Synthesis Without GPU. Po-Chun Hsu, Hung-yi Lee. 210-214
- What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS. Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber. 215-219
- Fast and Lightweight On-Device TTS with Tacotron2 and LPCNet. Vadim Popov, Stanislav Kamenev, Mikhail Kudinov, Sergey Repyevsky, Tasnima Sadekova, Vitalii Bushaev, Vladimir Kryzhanovskiy, Denis Parkhomenko. 220-224
- Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed. Wei Song, Guanghui Xu, Zhengchen Zhang, Chao Zhang 0031, Xiaodong He 0002, Bowen Zhou. 225-229
- Can Auditory Nerve Models Tell us What's Different About WaveNet Vocoded Speech? Sébastien Le Maguer, Naomi Harte. 230-234
- Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions. Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou. 235-239
- Neural Homomorphic Vocoder. Zhijun Liu, Kuan Chen, Kai Yu. 240-244
- Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech. Roberto Gretter, Marco Matassoni, Daniele Falavigna, Keelan Evanini, Chee Wee Leong. 245-249
- The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge. Tien-Hong Lo, Fu-An Chao, Shi-Yan Weng, Berlin Chen. 250-254
- Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems. Kate M. Knill, Linlin Wang, Yu Wang 0027, Xixin Wu, Mark J. F. Gales. 255-259
- Data Augmentation Using Prosody and False Starts to Recognize Non-Native Children's Speech. Hemant Kumar Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo. 260-264
- UNSW System Description for the Shared Task on Automatic Speech Recognition for Non-Native Children's Speech. Mostafa Ali Shahin, Renée Lu, Julien Epps, Beena Ahmed. 265-268
- End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors. Shota Horiguchi, Yusuke Fujita, Shinji Watanabe 0001, Yawen Xue, Kenji Nagamatsu. 269-273
- Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario. Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Y. Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko. 274-278
- New Advances in Speaker Diarization. Hagai Aronowitz, Weizhong Zhu, Masayuki Suzuki, Gakuto Kurata, Ron Hoory. 279-283
- Self-Attentive Similarity Measurement Strategies in Speaker Diarization. Qingjian Lin, Yu Hou, Ming Li. 284-288
- Speaker Attribution with Voice Profiles by Graph-Based Semi-Supervised Learning. Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno. 289-293
- Deep Self-Supervised Hierarchical Clustering for Speaker Diarization. Prachi Singh, Sriram Ganapathy. 294-298
- Spot the Conversation: Speaker Diarisation in the Wild. Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman. 299-303
- Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition. Wangyou Zhang, Yanmin Qian. 304-308
- Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition. Zhihao Du, Jiqing Han, Xueliang Zhang. 309-313
- Anti-Aliasing Regularization in Stacking Layers. Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar. 314-318
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription. Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov. 319-323
- End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming. Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe 0001, Yanmin Qian. 324-328
- Quaternion Neural Networks for Multi-Channel Distant Speech Recognition. Xinchi Qiu, Titouan Parcollet, Mirco Ravanelli, Nicholas D. Lane, Mohamed Morchid. 329-333
- Improved Guided Source Separation Integrated with a Strong Back-End for the CHiME-6 Dinner Party Scenario. Hangting Chen, Pengyuan Zhang, Qian Shi, Zuozhen Liu. 334-338
- Neural Speech Separation Using Spatially Distributed Microphones. Dongmei Wang, Zhuo Chen, Takuya Yoshioka. 339-343
- Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones. Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu. 344-348
- Simulating Realistically-Spatialised Simultaneous Speech Using Video-Driven Speaker Detection and the CHiME-5 Dataset. Jack Deadman, Jon Barker. 349-353
- Toward Silent Paralinguistics: Speech-to-EMG - Retrieving Articulatory Muscle Activity from Speech. M. Catarina Botelho, Lorenz Diener, Dennis Küster, Kevin Scheck, Shahin Amiriparian, Björn W. Schuller, Tanja Schultz, Alberto Abad, Isabel Trancoso. 354-358
- Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features. Jiaxuan Zhang, Sarah Ita Levitan, Julia Hirschberg. 359-363
- Multi-Modal Attention for Speech Emotion Recognition. Zexu Pan, Zhaojie Luo, Jichen Yang, Haizhou Li 0001. 364-368
- WISE: Word-Level Interaction-Based Multimodal Fusion for Speech Emotion Recognition. Guang Shen, Riwei Lai, Rui Chen, Yu Zhang, Kejia Zhang, Qilong Han, Hongtao Song. 369-373
- A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition. Ming Chen, Xudong Zhao 0004. 374-378
- Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition. Pengfei Liu, Kun Li 0003, Helen Meng. 379-383
- Multi-Modal Embeddings Using Multi-Task Learning for Emotion Recognition. Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram. 384-388
- Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network. Jeng-Lin Li, Chi-Chun Lee. 389-393
- Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition. Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang 0014, Zhanlei Yang, Rongjun Li. 394-398
- ATCSpeech: A Multilingual Pilot-Controller Speech Corpus from Real Air Traffic Control Environment. Bo Yang, Xianlong Tan, Zhengmao Chen, Bing Wang, Min Ruan, Dan Li, Zhongping Yang, Xiping Wu, Yi Lin 0006. 399-403
- Developing an Open-Source Corpus of Yoruba Speech. Alexander Gutkin, Isin Demirsahin, Oddur Kjartansson, Clara Rivera, Kólá Túbosún. 404-408
- ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers. Jung-Woo Ha, Kihyun Nam, Jingu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Hyeji Kim, Eunmi Kim, Soojin Kim, Hyun-Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim 0001. 409-413
- LAIX Corpus of Chinese Learner English: Towards a Benchmark for L2 English ASR. Yanhong Wang, Huan Luan, Jiahong Yuan, Bin Wang, Hui Lin. 414-418
- Design and Development of a Human-Machine Dialog Corpus for the Automated Assessment of Conversational English Proficiency. Vikram Ramanarayanan. 419-423
- CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment. Si Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee, Kathy Yuet-Sheung Lee, Michael Chi-Fai Tong. 424-428
- FinChat: Corpus and Evaluation Setup for Finnish Chat Conversations on Everyday Topics. Katri Leino, Juho Leinonen, Mittul Singh, Sami Virpioja, Mikko Kurimo. 429-433
- DiPCo - Dinner Party Corpus. Maarten Van Segbroeck, Ahmed-Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas. 434-436
- Learning to Detect Bipolar Disorder and Borderline Personality Disorder with Language and Speech in Non-Clinical Interviews. Bo Wang 0034, Yue Wu, Niall Taylor, Terry J. Lyons, Maria Liakata, Alejo J. Nevado-Holgado, Kate E. A. Saunders. 437-441
- FT Speech: Danish Parliament Speech Corpus. Andreas Kirkedal, Marija Stepanovic, Barbara Plank. 442-446
- Metric Learning Loss Functions to Reduce Domain Mismatch in the x-Vector Space for Language Recognition. Raphaël Duroselle, Denis Jouvet, Irina Illina. 447-451
- The XMUSPEECH System for the AP19-OLR Challenge. Zheng Li, Miao Zhao, Jing Li, Yiming Zhi, Lin Li, Qingyang Hong. 452-456
- On the Usage of Multi-Feature Integration for Speaker Verification and Language Identification. Zheng Li, Miao Zhao, Jing Li, Lin Li, Qingyang Hong. 457-461
- What Does an End-to-End Dialect Identification Model Learn About Non-Dialectal Information? Shammur A. Chowdhury, Ahmed M. Ali, Suwon Shon, James R. Glass. 462-466
- Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets. Matias Lindgren, Tommi Jauhiainen, Mikko Kurimo. 467-471
- Learning Intonation Pattern Embeddings for Arabic Dialect Identification. Aitor Arronte Alvarez, Elsayed Sabry Abdelaal Issa. 472-476
- Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages. Badr M. Abdullah, Tania Avgustinova, Bernd Möbius, Dietrich Klakow. 477-481
- ICE-Talk: An Interface for a Controllable Expressive Talking Machine. Noé Tits, Kevin El Haddad, Thierry Dutoit. 482-483
- Kaldi-Web: An Installation-Free, On-Device Speech Recognition System. Mathieu Hu, Laurent Pierron, Emmanuel Vincent 0001, Denis Jouvet. 484-485
- Soapbox Labs Verification Platform for Child Speech. Amelia C. Kelly, Eleni Karamichali, Armin Saeb, Karel Veselý, Nicholas Parslow, Agape Deng, Arnaud Letondor, Robert O'Regan, Qiru Zhou. 486-487
- SoapBox Labs Fluency Assessment Platform for Child Speech. Amelia C. Kelly, Eleni Karamichali, Armin Saeb, Karel Veselý, Nicholas Parslow, Gloria Montoya Gomez, Agape Deng, Arnaud Letondor, Niall Mullally, Adrian Hempel, Robert O'Regan, Qiru Zhou. 488-489
- CATOTRON - A Neural Text-to-Speech System in Catalan. Baybars Külebi, Alp Öktem, Alex Peiró Lilja, Santiago Pascual, Mireia Farrús. 490-491
- Toward Remote Patient Monitoring of Speech, Video, Cognitive and Respiratory Biomarkers Using Multimodal Dialog Technology. Vikram Ramanarayanan, Oliver Roesler, Michael Neumann, David Pautler, Doug Habberstad, Andrew Cornish, Hardik Kothare, Vignesh Murali, Jackson Liscombe, Dirk Schnelle-Walka, Patrick L. Lange, David Suendermann-Oeft. 492-493
- VoiceID on the Fly: A Speaker Recognition System that Learns from Scratch. Baihan Lin, Xinxin Zhang. 494-495
- Enhancing Transferability of Black-Box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models. Zhao Ren, Jing Han 0010, Nicholas Cummins, Björn W. Schuller. 496-500
- End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model. Han Feng, Sei Ueno, Tatsuya Kawahara. 501-505
- Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network. Bo-Hao Su, Chun-Min Chang, Yun-Shao Lin, Chi-Chun Lee. 506-510
- An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition. Adria Mallol-Ragolta, Nicholas Cummins, Björn W. Schuller. 511-515
- Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition. Kusha Sridhar, Carlos Busso. 516-520
- Augmenting Generative Adversarial Networks for Speech Emotion Recognition. Siddique Latif, Muhammad Asim, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller. 521-525
- Speech Emotion Recognition 'in the Wild' Using an Autoencoder. Vipula Dissanayake, Haimo Zhang, Mark Billinghurst, Suranga Nanayakkara. 526-530
- Emotion Profile Refinery for Speech Emotion Classification. Shuiyang Mao, Pak-Chung Ching, Tan Lee. 531-535
- Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation. Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee. 536-540
- Fast and Slow Acoustic Model. Kshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu. 541-545
- Self-Distillation for Improving CTC-Transformer-Based ASR Systems. Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix. 546-550
- Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard. Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury. 551-555
- Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection. Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno. 556-560
- PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR. Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur. 561-565
- CAT: A CTC-CRF Based ASR Toolkit Bridging the Hybrid and the End-to-End Approaches Towards Data Efficiency and Low Latency. Keyu An, Hongyu Xiang, Zhijian Ou. 566-570
- CTC-Synchronous Training for Monotonic Attention Model. Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara. 571-575
- Continual Learning for Multi-Dialect Acoustic Models. Brady Houston, Katrin Kirchhoff. 576-580
- SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition. Xingcheng Song, Zhiyong Wu, Yiheng Huang, Dan Su 0002, Helen Meng. 581-585
- RECOApy: Data Recording, Pre-Processing and Phonetic Transcription for End-to-End Speech-Based Applications. Adriana Stan. 586-590
- Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer. Yuan Shangguan, Kate Knister, Yanzhang He, Ian McGraw, Françoise Beaufays. 591-595
- Statistical Testing on ASR Performance via Blockwise Bootstrap. Zhe Liu, Fuchun Peng. 596-600
- Sentence Level Estimation of Psycholinguistic Norms Using Joint Multidimensional Annotations. Anil Ramakrishna, Shrikanth Narayanan. 601-605
- Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System. Kai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhijie Yan. 606-610
- Confidence Measures in Encoder-Decoder Models for Speech Recognition. Alejandro Woodward, Clara Bonnín, Issey Masuda, David Varas, Elisenda Bou-Balust, Juan Carlos Riveiro. 611-615
- Word Error Rate Estimation Without ASR Output: e-WER2. Ahmed Ali 0002, Steve Renals. 616-620
- An Evaluation of Manual and Semi-Automatic Laughter Annotation. Bogdan Ludusan, Petra Wagner. 621-625
- Understanding Racial Disparities in Automatic Speech Recognition: The Case of Habitual "be". Joshua L. Martin, Kevin Tang. 626-630
- Secondary Phonetic Cues in the Production of the Nasal Short-a System in California English. Georgia Zellou, Rebecca Scarborough, Renee Kemp. 631-635
- Acoustic Properties of Strident Fricatives at the Edges: Implications for Consonant Discrimination. Louis-Marie Lorin, Lorenzo Maselli, Léo Varnet, Maria Giavazzi. 636-640
- 2 Context: Phonology and Phonetics. Mingqiong Luo. 641-645
- Voicing Distinction of Obstruents in the Hangzhou Wu Chinese Dialect. Yang Yue, Fang Hu. 646-650
- The Phonology and Phonetics of Kaifeng Mandarin Vowels. Lei Wang. 651-655
- Microprosodic Variability in Plosives in German and Austrian German. Margaret Zellers, Barbara Schuppler. 656-660
- Er-Suffixation in Southwestern Mandarin: An EMA and Ultrasound Study. Jing Huang, Feng-fan Hsieh, Yueh-Chin Chang. 661-665
- Electroglottographic-Phonetic Study on Korean Phonation Induced by Tripartite Plosives in Yanbian Korean. Yinghao Li, Jinghua Zhang. 666-670
- Modeling Global Body Configurations in American Sign Language. Nicholas Wilkins, Max Cordes Galbraith, Ifeoma Nwogu. 671-675
- Augmenting Turn-Taking Prediction with Wearable Eye Activity During Conversation. Hang Li, Siyuan Chen, Julien Epps. 676-680
- CAM: Uninteresting Speech Detector. Weiyi Lu, Yi Xu, Peng Yang, Belinda Zeng. 681-685
- Mixed Case Contextual ASR Using Capitalization Masks. Diamantino Caseiro, Pat Rondon, Quoc-Nam Le-The, Petar Aleksic. 686-690
- Speech Recognition and Multi-Speaker Diarization of Long Conversations. Huanru Henry Mao, Shuyang Li, Julian J. McAuley, Garrison W. Cottrell. 691-695
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition. Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng. 696-700
- A Real-Time Robot-Based Auxiliary System for Risk Evaluation of COVID-19 Infection. Wenqi Wei, Jianzong Wang, Jiteng Ma, Ning Cheng, Jing Xiao. 701-705
- An Utterance Verification System for Word Naming Therapy in Aphasia. David S. Barbera, Mark A. Huckvale, Victoria Fleming, Emily Upton, Henry Coley-Fisher, Ian Shaw, William Latham, Alexander P. Leff, Jenny Crinion. 706-710
- Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition. Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng. 711-715
- Joint Prediction of Punctuation and Disfluency in Speech Transcripts. Binghuai Lin, Liyuan Wang. 716-720
- Focal Loss for Punctuation Prediction. Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Ye Bai, Cunhang Fan. 721-725
- Improving X-Vector and PLDA for Text-Dependent Speaker Verification. Zhuxin Chen, Yue Lin. 726-730
- SdSV Challenge 2020: Large-Scale Evaluation of Short-Duration Speaker Verification. Hossein Zeinali, Kong-Aik Lee, Jahangir Alam, Lukás Burget. 731-735
- The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020. Tao Jiang, Miao Zhao, Lin Li, Qingyang Hong. 736-740
- Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020. Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim. 741-745
- The TalTech Systems for the Short-Duration Speaker Verification Challenge 2020. Tanel Alumäe, Jörgen Valk. 746-750
- Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020. Peng Shen, Xugang Lu, Hisashi Kawai. 751-755
- Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization. Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck. 756-760
- BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020. Alicia Lozano-Diez, Anna Silnova, Bhargav Pulugundla, Johan Rohdin, Karel Veselý, Lukás Burget, Oldrich Plchot, Ondrej Glembek, Ondrej Novotný, Pavel Matejka. 761-765
- Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification. Vijay Ravi, Ruchao Fan, Amber Afshan, Huanhua Lu, Abeer Alwan. 766-770
- Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning. Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai 0001. 771-775
- Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition. Shaojin Ding, Guanlong Zhao, Ricardo Gutierrez-Osuna. 776-780
- Non-Parallel Many-to-Many Voice Conversion with PSR-StarGAN. Yanping Li, Dongxiang Xu, Yan Zhang, Yang Wang, Binbin Chen. 781-785
- TTS Skins: Speaker Conversion via ASR. Adam Polyak, Lior Wolf, Yaniv Taigman. 786-790
- GAZEV: GAN-Based Zero-Shot Voice Conversion Over Non-Parallel Speech Corpus. Zining Zhang, Bingsheng He, Zhenjie Zhang. 791-795
- Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation. Tao Wang, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Rongxiu Zhong. 796-800
- Unsupervised Cross-Domain Singing Voice Conversion. Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman. 801-805
- Attention-Based Speaker Embeddings for One-Shot Voice Conversion. Tatsuma Ishihara, Daisuke Saito. 806-810
- Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training. Jian Cong, Shan Yang, Lei Xie 0001, Guoqiao Yu, Guanglu Wan. 811-815
- Gated Multi-Head Attention Pooling for Weakly Labelled Audio Tagging. Sixin Hong, Yuexian Zou, Wenwu Wang. 816-820
- Environmental Sound Classification with Parallel Temporal-Spectral Attention. Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang. 821-825
- Contrastive Predictive Coding of Audio with an Adversary. Luyu Wang, Kazuya Kawakami, Aäron Van Den Oord. 826-830
- Memory Controlled Sequential Self Attention for Sound Recognition. Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos. 831-835
- Dual Stage Learning Based Dynamic Time-Frequency Mask Generation for Audio Event Classification. Donghyeon Kim, Jaihyun Park, David K. Han, Hanseok Ko. 836-840
- An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection. Xu Zheng, Yan Song 0001, Jie Yan, Li-Rong Dai 0001, Ian McLoughlin, Lin Liu. 841-845
- A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling. Chieh-Chi Kao, Bowen Shi, Ming Sun, Chao Wang. 846-850
- Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging. Chun-Chieh Chang, Chieh-Chi Kao, Ming Sun, Chao Wang. 851-855
- Two-Stage Polyphonic Sound Event Detection Based on Faster R-CNN-LSTM with Multi-Token Connectionist Temporal Classification. In Young Park, Hong Kook Kim. 856-860
- SpeechMix - Augmenting Deep Sound Recognition Using Hidden Space Interpolations. Amit Jindal, Narayanan Elavathur Ranganatha, Aniket Didolkar, Arijit Ghosh Chowdhury, Di Jin, Ramit Sawhney, Rajiv Ratn Shah. 861-865
- End-to-End Neural Transformer Based Spoken Language Understanding. Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann. 866-870
- Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding. Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen 0002, Kai Yu 0004. 871-875
- Speech to Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces. Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui, Ariya Rastrow. 876-880
- Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning. Pavel Denisov, Ngoc Thang Vu. 881-885
- Context Dependent RNNLM for Automatic Transcription of Conversations. Srikanth Raj Chetupalli, Sriram Ganapathy. 886-890
- Improving End-to-End Speech-to-Intent Classification with Reptile. Yusheng Tian, Philip John Gorinski. 891-895
- Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation. Won-Ik Cho, Donghyun Kwak, Ji Won Yoon, Nam Soo Kim. 896-900
- Towards an ASR Error Robust Spoken Language Understanding System. Weitong Ruan, Yaroslav Nechaev, Luoxin Chen, Chengwei Su, Imre Kiss. 901-905
- End-to-End Spoken Language Understanding Without Full Transcripts. Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras. 906-910
- Are Neural Open-Domain Dialog Systems Robust to Speech Recognition Errors in the Dialog History? An Empirical Study. Karthik Gopalakrishnan 0001, Behnam Hedayatnia, Longshaokan Wang, Yang Liu, Dilek Hakkani-Tür. 911-915
- AutoSpeech: Neural Architecture Search for Speaker Recognition. Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha, Zhangyang Wang. 916-920 [doi]
- Densely Connected Time Delay Neural Network for Speaker Verification. Ya-Qi Yu, Wu-Jun Li. 921-925 [doi]
- Phonetically-Aware Coupled Network For Short Duration Text-Independent Speaker Verification. Siqi Zheng, Yun Lei, Hongbin Suo. 926-930 [doi]
- Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification Using CTC-Based Soft VAD and Global Query Attention. Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim. 931-935 [doi]
- Vector-Based Attentive Pooling for Text-Independent Speaker Verification. Yanfeng Wu, Chenkai Guo, Hongcan Gao, Xiaolei Hou, Jing Xu 0008. 936-940 [doi]
- Self-Attention Encoding and Pooling for Speaker Recognition. Pooyan Safari, Miquel India, Javier Hernando. 941-945 [doi]
- ARET: Aggregated Residual Extended Time-Delay Neural Networks for Speaker Verification. Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Longbiao Wang, Meng Liu, Lin Zhang, Jiayu Jin, Junhai Xu. 946-950 [doi]
- Adversarial Separation Network for Speaker Recognition. Hanyi Zhang, Longbiao Wang, Yunchun Zhang, Meng Liu, Kong-Aik Lee, Jianguo Wei. 951-955 [doi]
- Text-Independent Speaker Verification with Dual Attention Network. Jingyu Li, Tan Lee. 956-960 [doi]
- Evolutionary Algorithm Enhanced Neural Architecture Search for Text-Independent Speaker Verification. Xiaoyang Qu, Jianzong Wang, Jing Xiao. 961-965 [doi]
- Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition. Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu. 966-970 [doi]
- Semantic Mask for Transformer Based End-to-End Speech Recognition. Chengyi Wang 0002, Yu Wu 0012, Yujiao Du, Jinyu Li, Shujie Liu 0001, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou 0001. 971-975 [doi]
- Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces. Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig. 976-980 [doi]
- A Federated Approach in Training Acoustic Models. Dimitrios Dimitriadis, Ken'ichi Kumatani, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez. 981-985 [doi]
- On Semi-Supervised LF-MMI Training of Acoustic Models with Limited Data. Imran A. Sheikh, Emmanuel Vincent 0001, Irina Illina. 986-990 [doi]
- On Front-End Gain Invariant Modeling for Wake Word Spotting. Yixin Gao, Noah D. Stein, Chieh-Chi Kao, Yunliang Cai, Ming Sun, Tao Zhang, Shiv Naga Prasad Vitaladevuni. 991-995 [doi]
- Unsupervised Regularization-Based Adaptive Training for Speech Recognition. Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du. 996-1000 [doi]
- On the Robustness and Training Dynamics of Raw Waveform Models. Erfan Loweimi, Peter Bell 0001, Steve Renals. 1001-1005 [doi]
- Iterative Pseudo-Labeling for Speech Recognition. Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Hannun, Gabriel Synnaeve, Ronan Collobert. 1006-1010 [doi]
- Smart Tube: A Biofeedback System for Vocal Training and Therapy Through Tube Phonation. Naoko Kawamura, Tatsuya Kitamura, Kenta Hamada. 1011-1012 [doi]
- VCTUBE : A Library for Automatic Speech Data Annotation. Seong Choi, Seunghoon Jeong, Jeewoo Yoon, Migyeong Yang, Minsam Ko, Eunil Park, Jinyoung Han, Munyoung Lee, Seonghee Lee. 1013-1014 [doi]
- A Mandarin L2 Learning APP with Mispronunciation Detection and Feedback. Yanlu Xie, Xiaoli Feng, Boxue Li, Jinsong Zhang, Yujia Jin. 1015-1016 [doi]
- Rapid Enhancement of NLP Systems by Acquisition of Data in Correlated Domains. Tejas Udayakumar, Kinnera Saranu, Mayuresh Sanjay Oak, Ajit Ashok Saunshikhar, Sandip Shriram Bapat. 1017-1018 [doi]
- Computer-Assisted Language Learning System: Automatic Speech Evaluation for Children Learning Malay and Tamil. Ke Shi, Kye Min Tan, Richeng Duan, Siti Umairah Md. Salleh, Nur Farah Ain Suhaimi, Rajan Vellu, Ngoc Thuy Huong Helen Thai, Nancy F. Chen. 1019-1020 [doi]
- Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU. Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari. 1021-1022 [doi]
- A Dynamic 3D Pronunciation Teaching Model Based on Pronunciation Attributes and Anatomy. Xiaoli Feng, Yanlu Xie, Yayue Deng, Boxue Li. 1023-1024 [doi]
- End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge. Naoki Kimura, Zixiong Su, Takaaki Saeki. 1025-1026 [doi]
- Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous? Jialu Li, Mark Hasegawa-Johnson. 1027-1031 [doi]
- Development of Multilingual ASR Using GlobalPhone for Less-Resourced Languages: The Case of Ethiopian Languages. Martha Yifiru Tachbelie, Solomon Teferra Abate, Tanja Schultz. 1032-1036 [doi]
- Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning. Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi, Takahiro Shinozaki. 1037-1041 [doi]
- Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition. Xinyuan Zhou, Emre Yilmaz, Yanhua Long, Yijie Li, Haizhou Li 0001. 1042-1046 [doi]
- Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages. Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz. 1047-1051 [doi]
- Multilingual Jointly Trained Acoustic and Written Word Embeddings. Yushi Hu, Shane Settle, Karen Livescu. 1052-1056 [doi]
- Improving Code-Switching Language Modeling with Artificially Generated Texts Using Cycle-Consistent Adversarial Networks. Chia-Yu Li, Ngoc Thang Vu. 1057-1061 [doi]
- Data Augmentation for Code-Switch Language Modeling by Fusing Multiple Text Generation Methods. Xinhui Hu, Qi Zhang, Lei Yang, Binbin Gu, Xinkang Xu. 1062-1066 [doi]
- A 43 Language Multilingual Punctuation Prediction Neural Network Model. Xinxing Li, Edward Lin. 1067-1071 [doi]
- Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition. Jisung Wang, Jihwan Kim, Sangki Kim, Yeha Lee. 1072-1075 [doi]
- Multi-Task Siamese Neural Network for Improving Replay Attack Detection. Patrick von Platen, Fei Tao, Gökhan Tür. 1076-1080 [doi]
- POCO: A Voice Spoofing and Liveness Detection Corpus Based on Pop Noise. Kosuke Akimoto, Seng Pei Liew, Sakiko Mishima, Ryo Mizushima, Kong-Aik Lee. 1081-1085 [doi]
- Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection. Hongji Wang, Heinrich Dinkel, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004. 1086-1090 [doi]
- Self-Supervised Pre-Training with Acoustic Configurations for Replay Spoofing Detection. Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung, Ha-Jin Yu. 1091-1095 [doi]
- Competency Evaluation in Voice Mimicking Using Acoustic Cues. Abhijith Girish, Adharsh Sabu, Akshay Prasannan Latha, Rajeev Rajan. 1096-1100 [doi]
- Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks. Zhenzong Wu, Rohan Kumar Das, Jichen Yang, Haizhou Li 0001. 1101-1105 [doi]
- Spoofing Attack Detection Using the Non-Linear Fusion of Sub-Band Classifiers. Hemlata Tak, Jose Patino 0001, Andreas Nautsch, Nicholas W. D. Evans, Massimiliano Todisco. 1106-1110 [doi]
- Investigating Light-ResNet Architecture for Spoofing Detection Under Mismatched Conditions. Prasanth Parasu, Julien Epps, Kaavya Sriskandaraja, Gajan Suthokumar. 1111-1115 [doi]
- Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection. Zhenchun Lei, Yingen Yang, Changhong Liu, Jihua Ye. 1116-1120 [doi]
- Lightweight Online Noise Reduction on Embedded Devices Using Hierarchical Recurrent Neural Networks. Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Pascal Zobel, Andreas Maier 0001. 1121-1125 [doi]
- SEANet: A Multi-Modal Speech Enhancement Network. Marco Tagliasacchi, Yunpeng Li, Karolis Misiunas, Dominik Roblek. 1126-1130 [doi]
- Lite Audio-Visual Speech Enhancement. Shang-Yi Chuang, Yu Tsao 0001, Chen-Chou Lo, Hsin-Min Wang. 1131-1135 [doi]
- ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication. Christian Bergler, Manuel Schmitt, Andreas Maier 0001, Simeon Smeele, Volker Barth, Elmar Nöth. 1136-1140 [doi]
- A Deep Learning Approach to Active Noise Control. Hao Zhang, DeLiang Wang. 1141-1145 [doi]
- Improving Speech Intelligibility Through Speaker Dependent and Independent Spectral Style Conversion. Tuan Dinh, Alexander Kain, Kris Tjaden. 1146-1150 [doi]
- End-to-End Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks. Mathias Bach Pedersen, Morten Kolbæk, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen 0001. 1151-1155 [doi]
- Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System. Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino. 1156-1160 [doi]
- Automatic Estimation of Intelligibility Measure for Consonants in Speech. Ali Abavisani, Mark Hasegawa-Johnson. 1161-1165 [doi]
- Large Scale Evaluation of Importance Maps in Automatic Speech Recognition. Viet Anh Trinh, Michael I. Mandel. 1166-1170 [doi]
- Neural Architecture Search on Acoustic Scene Classification. Jixiang Li, Chuming Liang, Bo Zhang 0046, Zhao Wang, Fei Xiang, Xiangxiang Chu. 1171-1175 [doi]
- Acoustic Scene Classification Using Audio Tagging. Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu. 1176-1180 [doi]
- ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification. Liwen Zhang, Jiqing Han, Ziqiang Shi. 1181-1185 [doi]
- Environment Sound Classification Using Multiple Feature Channels and Attention Based Deep Convolutional Neural Network. Jivitesh Sharma, Ole-Christoffer Granmo, Morten Goodwin. 1186-1190 [doi]
- Acoustic Scene Analysis with Multi-Head Attention Networks. Weimin Wang, Weiran Wang, Ming Sun, Chao Wang. 1191-1195 [doi]
- Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification. Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee. 1196-1200 [doi]
- An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances. Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee. 1201-1205 [doi]
- Attention-Driven Projections for Soundscape Classification. Dhanunjaya Varma Devalraju, H. Muralikrishna, Padmanabhan Rajan, Dileep Aroor Dinesh. 1206-1210 [doi]
- Computer Audition for Continuous Rainforest Occupancy Monitoring: The Case of Bornean Gibbons' Call Detection. Panagiotis Tzirakis, Alexander Shiarella, Robert Ewers, Björn W. Schuller. 1211-1215 [doi]
- Deep Learning Based Open Set Acoustic Scene Classification. Zuzanna Kwiatkowska, Beniamin Kalinowski, Michal Kosmider, Krzysztof Rykaczewski. 1216-1220 [doi]
- Singing Synthesis: With a Little Help from my Attention. Orazio Angelini, Alexis Moinet, Kayoko Yanagisawa, Thomas Drugman. 1221-1225 [doi]
- Peking Opera Synthesis via Duration Informed Attention Network. Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu. 1226-1230 [doi]
- DurIAN-SC: Duration Informed Attention Network Based Singing Voice Conversion System. Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Chunlei Zhang, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu. 1231-1235 [doi]
- Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music. Yuanbo Hou, Frank K. Soong, Jian Luan, Shengchen Li. 1236-1240 [doi]
- Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music. Haohe Liu, Lei Xie, Jian Wu, Geng Yang. 1241-1245 [doi]
- Continual Learning in Automatic Speech Recognition. Samik Sadhu, Hynek Hermansky. 1246-1250 [doi]
- Speaker Adaptive Training for Speech Recognition Based on Attention-Over-Attention Mechanism. Genshun Wan, Jia Pan, Qingran Wang, Jianqing Gao, Zhongfu Ye. 1251-1255 [doi]
- Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator. Yan Huang 0028, Jinyu Li, Lei He, Wenning Wei, William Gale, Yifan Gong. 1256-1260 [doi]
- Speech Transformer with Speaker Aware Persistent Memory. Yingzhu Zhao, Chongjia Ni, Cheung Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma. 1261-1265 [doi]
- Adaptive Speaker Normalization for CTC-Based Speech Recognition. Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du. 1266-1270 [doi]
- Unsupervised Domain Adaptation Under Label Space Mismatch for Speech Classification. Akhil Mathur, Nadia Berthouze, Nicholas D. Lane. 1271-1275 [doi]
- Learning Fast Adaptation on Cross-Accented Speech Recognition. Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, Pascale Fung. 1276-1280 [doi]
- Black-Box Adaptation of ASR for Accented Speech. Kartik Khandelwal, Preethi Jyothi, Abhijeet Awasthi, Sunita Sarawagi. 1281-1285 [doi]
- Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation. M. A. Tugtekin Turan, Emmanuel Vincent 0001, Denis Jouvet. 1286-1290 [doi]
- Frame-Wise Online Unsupervised Adaptation of DNN-HMM Acoustic Model from Perspective of Robust Adaptive Filtering. Ryu Takeda, Kazunori Komatani. 1291-1295 [doi]
- Adversarially Trained Multi-Singer Sequence-to-Sequence Singing Synthesizer. Jie Wu, Jian Luan. 1296-1300 [doi]
- Prediction of Head Motion from Speech Waveforms with a Canonical-Correlation-Constrained Autoencoder. JinHong Lu, Hiroshi Shimodaira. 1301-1305 [doi]
- XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System. Peiling Lu, Jie Wu, Jian Luan, Xu Tan 0003, Li Zhou. 1306-1310 [doi]
- Stochastic Talking Face Generation Using Latent Distribution Matching. Ravindra Yadav, Ashish Sardana, Vinay P. Namboodiri, Rajesh M. Hegde. 1311-1315 [doi]
- Speech-to-Singing Conversion Based on Boundary Equilibrium GAN. Da-Yi Wu, Yi-Hsuan Yang. 1316-1320 [doi]
- Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image. Shunsuke Goto, Kotaro Onishi, Yuki Saito, Kentaro Tachibana, Koichiro Mori. 1321-1325 [doi]
- Speech Driven Talking Head Generation via Attentional Landmarks Based Representation. Wentao Wang, Yan Wang, Jianqing Sun, Qingsong Liu, Jiaen Liang, Teng Li. 1326-1330 [doi]
- Optimization and Evaluation of an Intelligibility-Improving Signal Processing Approach (IISPA) for the Hurricane Challenge 2.0 with FADE. Marc René Schädler. 1331-1335 [doi]
- iMetricGAN: Intelligibility Enhancement for Speech-in-Noise Using Generative Adversarial Network-Based Metric Learning. Haoyu Li, Szu-Wei Fu, Yu Tsao 0001, Junichi Yamagishi. 1336-1340 [doi]
- Intelligibility-Enhancing Speech Modifications - The Hurricane Challenge 2.0. Jan Rennies, Henning F. Schepker, Cassia Valentini-Botinhao, Martin Cooke. 1341-1345 [doi]
- Exploring Listeners' Speech Rate Preferences. Olympia Simantiraki, Martin Cooke. 1346-1350 [doi]
- Adaptive Compressive Onset-Enhancement for Improved Speech Intelligibility in Noise and Reverberation. Felicitas Bederna, Henning F. Schepker, Christian Rollwage, Simon Doclo, Arne Pusch, Jörg Bitzer, Jan Rennies. 1351-1355 [doi]
- A Sound Engineering Approach to Near End Listening Enhancement. Carol Chermaz, Simon King. 1356-1360 [doi]
- Enhancing Speech Intelligibility in Text-To-Speech Synthesis Using Speaking Style Conversion. Dipjyoti Paul, P. V. Muhammed Shifas, Yannis Pantazis, Yannis Stylianou. 1361-1365 [doi]
- Two Different Mechanisms of Movable Mandible for Vocal-Tract Model with Flexible Tongue. Takayuki Arai. 1366-1370 [doi]
- Improving the Performance of Acoustic-to-Articulatory Inversion by Removing the Training Loss of Noncritical Portions of Articulatory Channels Dynamically. Qiang Fang. 1371-1375 [doi]
- Speaker Conditioned Acoustic-to-Articulatory Inversion Using x-Vectors. Aravind Illa, Prasanta Kumar Ghosh. 1376-1380 [doi]
- Coarticulation as Synchronised Sequential Target Approximation: An EMA Study. Zirui Liu, Yi Xu, Feng-fan Hsieh. 1381-1385 [doi]
- Improved Model for Vocal Folds with a Polyp with Potential Application. Jônatas Santos, Jugurta Montalvão, Israel Santos. 1386-1390 [doi]
- Regional Resonance of the Lower Vocal Tract and its Contribution to Speaker Characteristics. Lin Zhang, Kiyoshi Honda, Jianguo Wei, Seiji Adachi. 1391-1395 [doi]
- Air-Tissue Boundary Segmentation in Real Time Magnetic Resonance Imaging Video Using 3-D Convolutional Neural Network. Renuka Mannem, Navaneetha Gaddam, Prasanta Kumar Ghosh. 1396-1400 [doi]
- An Investigation of the Virtual Lip Trajectories During the Production of Bilabial Stops and Nasal at Different Speaking Rates. Tilak Purohit, Prasanta Kumar Ghosh. 1401-1405 [doi]
- SpEx+: A Complete Time Domain Speaker Extraction Network. Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li 0001. 1406-1410 [doi]
- Atss-Net: Target Speaker Separation via Attention-Based Neural Network. Tingle Li, Qingjian Lin, Yuanyuan Bao, Ming Li. 1411-1415 [doi]
- Multimodal Target Speech Separation with Voice and Face References. Leyuan Qu, Cornelius Weber, Stefan Wermter. 1416-1420 [doi]
- X-TaSNet: Robust and Accurate Time-Domain Speaker Extraction Network. Zining Zhang, Bingsheng He, Zhenjie Zhang. 1421-1425 [doi]
- Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation. Chenda Li, Yanmin Qian. 1426-1430 [doi]
- A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments. Yunzhe Hao, Jiaming Xu, Jing Shi 0003, Peng Zhang, Lei Qin, Bo Xu 0002. 1431-1435 [doi]
- Time-Domain Target-Speaker Speech Separation with Waveform-Based Speaker Embedding. Jianshu Zhao, Shengzhou Gao, Takahiro Shinozaki. 1436-1440 [doi]
- Listen to What You Want: Neural Network-Based Universal Sound Selector. Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki. 1441-1445 [doi]
- Crossmodal Sound Retrieval Based on Specific Target Co-Occurrence Denoted with Weak Labels. Masahiro Yasuda, Yasunori Ohishi, Yuma Koizumi, Noboru Harada. 1446-1450 [doi]
- Speaker-Aware Monaural Speech Separation. Jiahao Xu, Kun Hu, Chang Xu, Duc Chung Tran, Zhiyong Wang 0001. 1451-1455 [doi]
- A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions. Liming Wang, Mark Hasegawa-Johnson. 1456-1460 [doi]
- Efficient Wait-k Models for Simultaneous Machine Translation. Maha Elbayad, Laurent Besacier, Jakob Verbeek. 1461-1465 [doi]
- Investigating Self-Supervised Pre-Training for End-to-End Speech Translation. Ha Nguyen, Fethi Bougares, Natalia A. Tomashenko, Yannick Estève, Laurent Besacier. 1466-1470 [doi]
- Contextualized Translation of Automatically Segmented Speech. Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi. 1471-1475 [doi]
- Self-Training for End-to-End Speech Translation. Juan Pino, Qiantong Xu, Xutai Ma, Mohammad Javad Dousti, Yun Tang. 1476-1480 [doi]
- Evaluating and Optimizing Prosodic Alignment for Automatic Dubbing. Marcello Federico, Yogesh Virkar, Robert Enyedi, Roberto Barra-Chicote. 1481-1485 [doi]
- Pair Expansion for Learning Multilingual Semantic Embeddings Using Disjoint Visually-Grounded Speech Audio Datasets. Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass. 1486-1490 [doi]
- Self-Supervised Representations Improve End-to-End Speech Translation. Anne Wu, Changhan Wang, Juan Pino, Jiatao Gu. 1491-1495 [doi]
- Improved RawNet with Feature Map Scaling for Text-Independent Speaker Verification Using Raw Waveforms. Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu. 1496-1500 [doi]
- Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances. Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim. 1501-1505 [doi]
- An Adaptive X-Vector Model for Text-Independent Speaker Verification. Bin Gu, Wu Guo, Fenglin Ding, Zhen-Hua Ling, Jun Du. 1506-1510 [doi]
- Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions. Santi Prieto, Alfonso Ortega Giménez, Iván López-Espejo, Eduardo Lleida. 1511-1515 [doi]
- Sum-Product Networks for Robust Automatic Speaker Identification. Aaron Nicolson, Kuldip K. Paliwal. 1516-1520 [doi]
- Segment Aggregation for Short Utterances Speaker Verification Using Raw Waveforms. Seung-bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu. 1521-1525 [doi]
- Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition. Shai Rozenberg, Hagai Aronowitz, Ron Hoory. 1526-1529 [doi]
- Speaker Re-Identification with Speaker Dependent Speech Enhancement. Yanpei Shi, Qiang Huang 0008, Thomas Hain. 1530-1534 [doi]
- Blind Speech Signal Quality Estimation for Speaker Verification Systems. Galina Lavrentyeva, Marina Volkova, Anastasia Avdeeva, Sergey Novoselov, Artem Gorlanov, Tseren Andzhukaev, Artem Ivanov, Alexander Kozlov. 1535-1539 [doi]
- Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification. Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng. 1540-1544 [doi]
- Modeling ASR Ambiguity for Neural Dialogue State Tracking. Vaishali Pal, Fabien Guillot, Manish Shrivastava 0001, Jean-Michel Renders, Laurent Besacier. 1545-1549 [doi]
- ASR Error Correction with Augmented Transformer for Entity Retrieval. Haoyu Wang, Shuyan Dong, Yue Liu, James Logan, Ashish Kumar Agrawal, Yang Liu. 1550-1554 [doi]
- Large-Scale Transfer Learning for Low-Resource Spoken Language Understanding. Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao. 1555-1559 [doi]
- Data Balancing for Boosting Performance of Low-Frequency Classes in Spoken Language Understanding. Judith Gaspers, Quynh Ngoc Thi Do, Fabian Triefenbach. 1560-1564 [doi]
- An Interactive Adversarial Reward Learning-Based Spoken Language Understanding System. Yu Wang, Yilin Shen, Hongxia Jin. 1565-1569 [doi]
- Style Attuned Pre-Training and Parameter Efficient Fine-Tuning for Spoken Language Understanding. Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-wen Li. 1570-1574 [doi]
- Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training. Shota Orihashi, Mana Ihori, Tomohiro Tanaka, Ryo Masumura. 1575-1579 [doi]
- Deep F-Measure Maximization for End-to-End Speech Understanding. Leda Sari, Mark Hasegawa-Johnson. 1580-1584 [doi]
- An Effective Domain Adaptive Post-Training Method for BERT in Response Selection. Taesun Whang, Dongyub Lee, Chanhee Lee, Kisu Yang, Dongsuk Oh, HeuiSeok Lim. 1585-1589 [doi]
- Confidence Measure for Speech-to-Concept End-to-End Spoken Language Understanding. Antoine Caubrière, Yannick Estève, Antoine Laurent, Emmanuel Morin. 1590-1594 [doi]
- Attention to Indexical Information Improves Voice Recall. Grant L. McGuire, Molly Babel. 1595-1599 [doi]
- Categorization of Whistled Consonants by French Speakers. Anaïs Tran Ngoc, Julien Meyer, Fanny Meunier. 1600-1604 [doi]
- Whistled Vowel Identification by French Listeners. Anaïs Tran Ngoc, Julien Meyer, Fanny Meunier. 1605-1609 [doi]
- F0 Slope and Mean: Cues to Speech Segmentation in French. Maria del Mar Cordero, Fanny Meunier, Nicolas Grimault, Stéphane Pota, Elsa Spinelli. 1610-1614 [doi]
- Does French Listeners' Ability to Use Accentual Information at the Word Level Depend on the Ear of Presentation? Amandine Michelas, Sophie Dufour. 1615-1619 [doi]
- A Perceptual Study of the Five Level Tones in Hmu (Xinzhai Variety). Wen Liu. 1620-1623 [doi]
- Mandarin and English Adults' Cue-Weighting of Lexical Stress. Zhen Zeng, Karen Mattock, Liquan Liu, Varghese Peter, Alba Tuninetti, Feng-Ming Tsao. 1624-1628 [doi]
- Age-Related Differences of Tone Perception in Mandarin-Speaking Seniors. Yan Feng, Gang Peng, William Shi-Yuan Wang. 1629-1633 [doi]
- Social and Functional Pressures in Vocal Alignment: Differences for Human and Voice-AI Interlocutors. Georgia Zellou, Michelle Cohn. 1634-1638 [doi]
- Identifying Important Time-Frequency Locations in Continuous Speech Utterances. Hassan Salami Kavaki, Michael I. Mandel. 1639-1643 [doi]
- Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling. Erfan Loweimi, Peter Bell 0001, Steve Renals. 1644-1648 [doi]
- Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations. Purvi Agrawal, Sriram Ganapathy. 1649-1653 [doi]
- A Deep 2D Convolutional Network for Waveform-Based Speech Recognition. Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals. 1654-1658 [doi]
- Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions. Ludwig Kürzinger, Nicolas Lindae, Palle Klewitz, Gerhard Rigoll. 1659-1663 [doi]
- An Alternative to MFCCs for ASR. Pegah Ghahramani, Hossein Hadian, Daniel Povey, Hynek Hermansky, Sanjeev Khudanpur. 1664-1667 [doi]
- Phase Based Spectro-Temporal Features for Building a Robust ASR System. Anirban Dutta, Ashishkumar Prabhakar Gudmalwar, Ch. V. Rama Rao. 1668-1672 [doi]
- Deep Scattering Power Spectrum Features for Robust Speech Recognition. Neethu M. Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals. 1673-1677 [doi]
- FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition. Titouan Parcollet, Xinchi Qiu, Nicholas D. Lane. 1678-1682 [doi]
- Bandpass Noise Generation and Augmentation for Unified ASR. Kshitiz Kumar, Bo Ren, Yifan Gong, Jian Wu. 1683-1687 [doi]
- Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition. Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy. 1688-1692 [doi]
- Introducing the VoicePrivacy Initiative. Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco. 1693-1697 [doi]
- The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment. Andreas Nautsch, Jose Patino 0001, Natalia A. Tomashenko, Junichi Yamagishi, Paul-Gauthier Noé, Jean-François Bonastre, Massimiliano Todisco, Nicholas W. D. Evans. 1698-1702 [doi]
- X-Vector Singular Value Modification and Statistical-Based Decomposition with Ensemble Regression Modeling for Speaker Anonymization System. Candy Olivia Mawalim, Kasorn Galajit, Jessada Karnjana, Masashi Unoki. 1703-1707 [doi]
- A Comparative Study of Speech Anonymization Metrics. Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001. 1708-1712 [doi]
- Design Choices for X-Vector Based Speaker Anonymization. Brij Mohan Lal Srivastava, Natalia A. Tomashenko, Xin Wang 0037, Emmanuel Vincent 0001, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi. 1713-1717 [doi]
- Speech Pseudonymisation Assessment Using Voice Similarity Matrices. Paul-Gauthier Noé, Jean-François Bonastre, Driss Matrouf, Natalia A. Tomashenko, Andreas Nautsch, Nicholas W. D. Evans. 1718-1722 [doi]
- g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset. Kyubyong Park, Seanie Lee. 1723-1727 [doi]
- A Mask-Based Model for Mandarin Chinese Polyphone Disambiguation. Haiteng Zhang, Huashan Pan, Xiulin Li. 1728-1732 [doi]
- Perception of Concatenative vs. Neural Text-To-Speech (TTS): Differences in Intelligibility in Noise and Language Attitudes. Michelle Cohn, Georgia Zellou. 1733-1737 [doi]
- Enhancing Sequence-to-Sequence Text-to-Speech with Morphology. Jason Taylor, Korin Richmond. 1738-1742 [doi]
- Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling. Yeunju Choi, Youngmoon Jung, Hoirin Kim. 1743-1747 [doi]
- Deep Learning Based Assessment of Synthetic Speech Naturalness. Gabriel Mittag, Sebastian Möller 0001. 1748-1752 [doi]
- Distant Supervision for Polyphone Disambiguation in Mandarin Chinese. Jiawen Zhang, Yuanyuan Zhao, Jiaqi Zhu, Jinba Xiao. 1753-1757 [doi]
- An Unsupervised Method to Select a Speaker Subset from Large Multi-Speaker Speech Synthesis Datasets. Pilar Oplustil Gallegos, Jennifer Williams, Joanna Rownicka, Simon King. 1758-1762 [doi]
- Understanding the Effect of Voice Quality and Accent on Talker Similarity. Anurag Das, Guanlong Zhao, John Levis, Evgeny Chukharev-Hudilainen, Ricardo Gutierrez-Osuna. 1763-1767 [doi]
- Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition Without Length Bias. Wei Zhou, Ralf Schlüter, Hermann Ney. 1768-1772 [doi]
- Transformer with Bidirectional Decoder for Speech Recognition. Xi Chen, Songyang Zhang, Dandan Song, Peng Ouyang, Shouyi Yin. 1773-1777 [doi]
- An Investigation of Phone-Based Subword Units for End-to-End Speech Recognition. Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, Richard Socher. 1778-1782 [doi]
- Combination of End-to-End and Hybrid Models for Speech Recognition. Jeremy H. M. Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, Jinyu Li, Yifan Gong. 1783-1787 [doi]
- Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition. Jihwan Kim, Jisung Wang, Sangki Kim, Yeha Lee. 1788-1792 [doi]
- Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition. Abhinav Garg, Ashutosh Gupta, Dhananjaya Gowda, Shatrughan Singh, Chanwoo Kim. 1793-1797 [doi]
- LVCSR with Transformer Language Models. Eugen Beck, Ralf Schlüter, Hermann Ney. 1798-1802 [doi]
- DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation. Yi-Chen Chen, Jui-Yang Hsu, Cheng-Kuang Lee, Hung-yi Lee. 1803-1807 [doi]
- Uncertainty-Aware Machine Support for Paper Reviewing on the Interspeech 2019 Submission Corpus. Lukas Stappen, Georgios Rizos, Madina Hasan, Thomas Hain, Björn W. Schuller. 1808-1812 [doi]
- Individual Variation in Language Attitudes Toward Voice-AI: The Role of Listeners' Autistic-Like Traits. Michelle Cohn, Melina Sarian, Kristin Predeck, Georgia Zellou. 1813-1817 [doi]
- Differences in Gradient Emotion Perception: Human vs. Alexa Voices. Michelle Cohn, Eran Raveh, Kristin Predeck, Iona Gessinger, Bernd Möbius, Georgia Zellou. 1818-1822 [doi]
- The MSP-Conversation Corpus. Luz Martinez-Lucas, Mohammed Abdelwahab 0001, Carlos Busso. 1823-1827 [doi]
- Spotting the Traces of Depression in Read Speech: An Approach Based on Computational Paralinguistics and Social Signal Processing. Fuxiang Tao, Anna Esposito, Alessandro Vinciarelli. 1828-1832 [doi]
- Speech Sentiment and Customer Satisfaction Estimation in Socialbot Conversations. Yelin Kim, Joshua Levy, Yang Liu. 1833-1837 [doi]
- Pardon the Interruption: An Analysis of Gender and Turn-Taking in U.S. Supreme Court Oral Arguments. Haley Lepp, Gina-Anne Levow. 1838-1842 [doi]
- Are Germans Better Haters Than Danes? Language-Specific Implicit Prosodies of Types of Hate Speech and How They Relate to Perceived Severity and Societal Rules. Jana Neitsch, Oliver Niebuhr. 1843-1847 [doi]
- An Objective Voice Gender Scoring System and Identification of the Salient Acoustic Measures. Fuling Chen, Roberto Togneri, Murray Maybery, Diana Tan. 1848-1852 [doi]
- How Ordinal Are Your Data? Sadari Jayawardena, Julien Epps, Zhaocheng Huang. 1853-1857 [doi]
- Correlating Cepstra with Formant Frequencies: Implications for Phonetically-Informed Forensic Voice Comparison. Vincent Hughes, Frantz Clermont, Philip Harrison. 1858-1862 [doi]
- Prosody and Breathing: A Comparison Between Rhetorical and Information-Seeking Questions in German and Brazilian Portuguese. Jana Neitsch, Plínio A. Barbosa, Oliver Niebuhr. 1863-1867 [doi]
- Scaling Processes of Clause Chains in Pitjantjatjara. Rebecca Defina, Catalina Torres, Hywel Stoakes. 1868-1872 [doi]
- Neutralization of Voicing Distinction of Stops in Tohoku Dialects of Japanese: Field Work and Acoustic Measurements. Ai Mizoguchi, Ayako Hashimoto, Sanae Matsui, Setsuko Imatomi, Ryunosuke Kobayashi, Mafuyu Kitahara. 1873-1877 [doi]
- Correlation Between Prosody and Pragmatics: Case Study of Discourse Markers in French and English. Lou Lee, Denis Jouvet, Katarina Bartkova, Yvon Keromnes, Mathilde Dargnat. 1878-1882 [doi]
- An Analysis of Prosodic Prominence Cues to Information Structure in Egyptian Arabic. Dina El Zarka, Anneliese Kelterer, Barbara Schuppler. 1883-1887 [doi]
- Lexical Stress in Urdu. Benazir Mumtaz, Tina Bögel, Miriam Butt. 1888-1892 [doi]
- Vocal Markers from Sustained Phonation in Huntington's Disease. Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Jennifer Hamet Bagnou, Xuan-Nga Cao, Emmanuel Dupoux, Anne-Catherine Bachoud-Lévi. 1893-1897 [doi]
- How Rhythm and Timbre Encode Mooré Language in Bendré Drummed Speech. Laure Dentel, Julien Meyer. 1898-1902 [doi]
- Interaction of Tone and Voicing in Mizo. Wendy Lalhminghlui, Priyankoo Sarmah. 1903-1907 [doi]
- Mandarin Lexical Tones: A Corpus-Based Study of Word Length, Syllable Position and Prosodic Position on Duration. Yaru Wu, Martine Adda-Decker, Lori Lamel. 1908-1912 [doi]
- An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech. Yingming Gao, Xinyu Zhang, Yi Xu, Jinsong Zhang, Peter Birkholz. 1913-1917 [doi]
- Integrating the Application and Realization of Mandarin 3rd Tone Sandhi in the Resolution of Sentence Ambiguity. Wei Lai, Aini Li. 1918-1922 [doi]
- Neutral Tone in Changde Mandarin. Zhenrui Zhang, Fang Hu. 1923-1927 [doi]
- Pitch Declination and Final Lowering in Northeastern Mandarin. Ping Cui, Jianjing Kuang. 1928-1932 [doi]
- Variation in Spectral Slope and Interharmonic Noise in Cantonese Tones. Phil Rose. 1933-1937 [doi]
- The Acoustic Realization of Mandarin Tones in Fast Speech. Ping Tang, Shanpeng Li. 1938-1941 [doi]
- Do Face Masks Introduce Bias in Speech Technologies? The Case of Automated Scoring of Speaking ProficiencyAnastassia Loukina, Keelan Evanini, Matthew Mulholland, Ian Blood, Klaus Zechner. 1942-1946 [doi]
- A Low Latency ASR-Free End to End Spoken Language Understanding SystemMohamed Mhiri 0002, Samuel Myer, Vikrant Singh Tomar. 1947-1951 [doi]
- An Audio-Based Wakeword-Independent Verification SystemJoe Wang, Rajath Kumar, Mike Rodehorst, Brian Kulis, Shiv Naga Prasad Vitaladevuni. 1952-1956 [doi]
- Learnable Spectro-Temporal Receptive Fields for Robust Voice Type DiscriminationTyler Vuong, Yangyang Xia, Richard M. Stern. 1957-1961 [doi]
- Low Latency Speech Recognition Using End-to-End PrefetchingShuo-Yiin Chang, Bo Li 0028, David Rybach, Yanzhang He, Wei Li, Tara N. Sainath, Trevor Strohman. 1962-1966 [doi]
- AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech ClassificationJingsong Wang, Tom Ko, Zhen Xu, Xiawei Guo, Souxiang Liu, Wei-Wei Tu, Lei Xie. 1967-1971 [doi]
- Building a Robust Word-Level Wakeword Verification NetworkRajath Kumar, Mike Rodehorst, Joe Wang, Jiacheng Gu, Brian Kulis. 1972-1976 [doi]
- A Transformer-Based Audio Captioning Model with Keyword EstimationYuma Koizumi, Ryo Masumura, Kyosuke Nishida, Masahiro Yasuda, Shoichiro Saito. 1977-1981 [doi]
- Neural Architecture Search for Keyword SpottingTong Mo, Yakun Yu, Mohammad Salameh, Di Niu, Shangling Jui. 1982-1986 [doi]
- Small-Footprint Keyword Spotting with Multi-Scale Temporal ConvolutionXimin Li, Xiaodong Wei, Xiaowei Qin. 1987-1991 [doi]
- Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform ModelXin Wang, Junichi Yamagishi. 1992-1996 [doi]
- Unconditional Audio Generation with Generative Adversarial Networks and Cycle RegularizationJen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, Yi-Hsuan Yang. 1997-2001 [doi]
- Complex-Valued Variational Autoencoder: A Novel Deep Generative Model for Direct Representation of Complex SpectraToru Nakashika. 2002-2006 [doi]
- Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length EmbeddingSeungwoo Choi, Seungju Han, Dongyoung Kim, Sungjoo Ha. 2007-2011 [doi]
- Reformer-TTS: Neural Speech Synthesis with Reformer NetworkHyeong Rae Ihm, Joun Yeop Lee, Byoung Jin Choi, Sung Jun Cheon, Nam Soo Kim. 2012-2016 [doi]
- CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram ConversionTakuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo. 2017-2021 [doi]
- High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent LatencyNikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Aimilios Chalamandaris, Georgia Maniati, Panos Kakoulidis, Spyros Raptis, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis. 2022-2026 [doi]
- DurIAN: Duration Informed Attention Network for Speech SynthesisChengzhu Yu, Heng Lu, Na Hu, Meng Yu 0003, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su 0002, Dong Yu 0001. 2027-2031 [doi]
- Multi-Speaker Text-to-Speech Synthesis Using Deep Gaussian ProcessesKentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari. 2032-2036 [doi]
- A Hybrid HMM-Waveglow Based Text-to-Speech Synthesizer Using Histogram Equalization for Low Resource Indian LanguagesMano Ranjith Kumar M., Sudhanshu Srivastava, Anusha Prakash, Hema A. Murthy. 2037-2041 [doi]
- The INTERSPEECH 2020 Computational Paralinguistics Challenge: Elderly Emotion, Breathing & MasksBjörn W. Schuller, Anton Batliner, Christian Bergler, Eva-Maria Messner, Antonia Hamilton, Shahin Amiriparian, Alice Baird, Georgios Rizos, Maximilian Schmitt, Lukas Stappen, Harald Baumeister, Alexis Deighton MacIntyre, Simone Hantke. 2042-2046 [doi]
- Learning Higher Representations from Pre-Trained Deep Models with Data Augmentation for the COMPARE 2020 Challenge Mask TaskTomoya Koike, Kun Qian 0003, Björn W. Schuller, Yoshiharu Yamamoto. 2047-2051 [doi]
- Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on SpectrogramsSteffen Illium, Robert Müller, Andreas Sedlmeier, Claudia Linnhoff-Popien. 2052-2056 [doi]
- Surgical Mask Detection with Deep Recurrent Phonetic ModelsPhilipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Florian Hönig, Elmar Nöth, Juan Rafael Orozco-Arroyave. 2057-2061 [doi]
- Phonetic, Frame Clustering and Intelligibility Analyses for the INTERSPEECH 2020 ComParE ChallengeClaude Montacié, Marie-José Caraty. 2062-2066 [doi]
- Exploring Text and Audio Embeddings for Multi-Dimension Elderly Emotion RecognitionMariana Julião, Alberto Abad, Helena Moniz. 2067-2071 [doi]
- Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-ChallengesMaxim Markitantov, Denis Dresvyanskiy, Danila Mamontov, Heysem Kaya, Wolfgang Minker, Alexey Karpov 0001. 2072-2076 [doi]
- Analyzing Breath Signals for the Interspeech 2020 ComParE ChallengeJohn Mendonça, Francisco Teixeira, Isabel Trancoso, Alberto Abad. 2077-2081 [doi]
- Deep Attentive End-to-End Continuous Breath Sensing from SpeechAlexis Deighton MacIntyre, Georgios Rizos, Anton Batliner, Alice Baird, Shahin Amiriparian, Antonia Hamilton, Björn W. Schuller. 2082-2086 [doi]
- Paralinguistic Classification of Mask Wearing by Image Classifiers and FusionJeno Szep, Salim Hariri. 2087-2091 [doi]
- Exploration of Acoustic and Lexical Cues for the INTERSPEECH 2020 Computational Paralinguistic ChallengeZiqing Yang, Zifan An, Zehao Fan, Chengye Jing, Houwei Cao. 2092-2096 [doi]
- Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion RecognitionGizem Sogancioglu, Oxana Verkholyak, Heysem Kaya, Dmitrii Fedotov, Tobias Cadèe, Albert Ali Salah, Alexey Karpov 0001. 2097-2101 [doi]
- Are you Wearing a Mask? Improving Mask Detection from Speech Using Augmentation by Cycle-Consistent GANsNicolae-Catalin Ristea, Radu-Tudor Ionescu. 2102-2106 [doi]
- 1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTMKshitiz Kumar, Chaojun Liu, Yifan Gong, Jian Wu. 2107-2111 [doi]
- Low Latency End-to-End Streaming Speech Recognition with a Scout NetworkChengyi Wang 0002, Yu Wu 0012, Liang Lu, Shujie Liu 0001, Jinyu Li, Guoli Ye, Ming Zhou. 2112-2116 [doi]
- Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech RecognitionGakuto Kurata, George Saon. 2117-2121 [doi]
- Parallel Rescoring with Transformer for Streaming On-Device Speech RecognitionWei Li, James Qin, Chung-Cheng Chiu, Ruoming Pang, Yanzhang He. 2122-2126 [doi]
- Improved Hybrid Streaming ASR with Transformer Language ModelsPau Baquero-Arnal, Javier Jorge, Adrià Giménez, Joan Albert Silvestre-Cerdà, Javier Iranzo-Sánchez, Albert Sanchís, Jorge Civera, Alfons Juan. 2127-2131 [doi]
- Streaming Transformer-Based Acoustic Models Using Self-Attention with Augmented MemoryChunyang Wu, Yongqiang Wang, Yangyang Shi, Ching-feng Yeh, Frank Zhang. 2132-2136 [doi]
- Enhancing Monotonic Multihead Attention for Streaming ASRHirofumi Inaguma, Masato Mimura, Tatsuya Kawahara. 2137-2141 [doi]
- Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech RecognitionShiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie. 2142-2146 [doi]
- High Performance Sequence-to-Sequence Model for Streaming Speech RecognitionThai Son Nguyen, Ngoc-Quan Pham, Sebastian Stüker, Alex Waibel. 2147-2151 [doi]
- Transfer Learning Approaches for Streaming End-to-End Speech Recognition SystemVikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li. 2152-2156 [doi]
- Tackling the ADReSS Challenge: A Multimodal Approach to the Automated Recognition of Alzheimer's DementiaMatej Martinc, Senja Pollak. 2157-2161 [doi]
- Disfluencies and Fine-Tuning Pre-Trained Language Models for Detection of Alzheimer's DiseaseJiahong Yuan, Yuchen Bian, Xingyu Cai, Jiaji Huang, Zheng Ye, Kenneth Church 0001. 2162-2166 [doi]
- To BERT or not to BERT: Comparing Speech and Language-Based Approaches for Alzheimer's Disease DetectionAparna Balagopalan, Benjamin Eyre, Frank Rudzicz, Jekaterina Novikova. 2167-2171 [doi]
- Alzheimer's Dementia Recognition Through Spontaneous Speech: The ADReSS ChallengeSaturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney. 2172-2176 [doi]
- Using State of the Art Speaker Recognition and Natural Language Processing Technologies to Detect Alzheimer's Disease and Assess its SeverityRaghavendra Pappagari, Jaejin Cho, Laureano Moro-Velázquez, Najim Dehak. 2177-2181 [doi]
- A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia RecognitionNicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä. 2182-2186 [doi]
- Multi-Modal Fusion with Gating Using Audio, Lexical and Disfluency Features for Alzheimer's Dementia Recognition from Spontaneous SpeechMorteza Rohanian, Julian Hough, Matthew Purver. 2187-2191 [doi]
- Comparing Natural Language Processing Techniques for Alzheimer's Dementia Prediction in Spontaneous SpeechThomas Searle, Zina M. Ibrahim, Richard J. B. Dobson. 2192-2196 [doi]
- Multiscale System for Alzheimer's Dementia Recognition Through Spontaneous SpeechErik Edwards, Charles Dognin, Bajibabu Bollepalli, Maneesh Kumar Singh 0001. 2197-2201 [doi]
- The INESC-ID Multi-Modal System for the ADReSS 2020 ChallengeAnna Pompili, Thomas Rolland, Alberto Abad. 2202-2206 [doi]
- Exploring MMSE Score Prediction Using Verbal and Non-Verbal CuesShahla Farzana, Natalie Parde. 2207-2211 [doi]
- Multimodal Inductive Transfer Learning for Detection of Alzheimer's Dementia and its SeverityUtkarsh Sarawgi, Wazeer Zulfikar, Nouran Soliman, Pattie Maes. 2212-2216 [doi]
- Exploiting Multi-Modal Features from Pre-Trained Networks for Alzheimer's Dementia RecognitionJunghyun Koo, Jie Hwan Lee, Jaewoo Pyo, Yujin Jo, Kyogu Lee. 2217-2221 [doi]
- Automated Screening for Alzheimer's Dementia Through Spontaneous SpeechMuhammad Shehram Shah Syed, Zafi Sherhan Syed, Margaret Lech, Elena Pirogova. 2222-2226 [doi]
- NEC-TT Speaker Verification System for SRE'19 CTS ChallengeKong-Aik Lee, Koji Okabe, Hitoshi Yamamoto, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Keisuke Ishikawa, Koichi Shinoda. 2227-2231 [doi]
- THUEE System for NIST SRE19 CTS ChallengeRuyun Li, Tianyu Liang, Dandan Song, Yi Liu 0049, Yangcheng Wu, Can Xu, Peng Ouyang, XianWei Zhang, Xianhong Chen, Weiqiang Zhang 0001, Shouyi Yin, Liang He. 2232-2236 [doi]
- Automatic Quality Assessment for Audio-Visual Verification Systems. The LOVe Submission to NIST SRE Challenge 2019Grigory Antipov, Nicolas Gengembre, Olivier Le Blouch, Gaël Le Lan. 2237-2241 [doi]
- Audio-Visual Speaker Recognition with a Cross-Modal Discriminative NetworkRuijie Tao, Rohan Kumar Das, Haizhou Li 0001. 2242-2246 [doi]
- Multimodal Association for Speaker VerificationSuwon Shon, James R. Glass. 2247-2251 [doi]
- Multi-Modality Matters: A Performance Leap on VoxCelebZhengyang Chen, Shuai Wang, Yanmin Qian. 2252-2256 [doi]
- Cross-Domain Adaptation with Discrepancy Minimization for Text-Independent Forensic Speaker VerificationZhenyu Wang, Wei Xia, John H. L. Hansen. 2257-2261 [doi]
- Open-Set Short Utterance Forensic Speaker Verification Using Teacher-Student Network with Explicit Inductive BiasMufan Sang, Wei Xia, John H. L. Hansen. 2262-2266 [doi]
- JukeBox: A Multilingual Singer Recognition DatasetAnurag Chowdhury, Austin Cozzo, Arun Ross. 2267-2271 [doi]
- Speaker Identification for Household Scenarios with Self-Attention and Adversarial TrainingRuirui Li, Jyun-Yu Jiang, Xian Wu, Chu-Cheng Hsieh, Andreas Stolcke. 2272-2276 [doi]
- Streaming Keyword Spotting on Mobile DevicesOleg Rybakov, Natasha Kononenko, Niranjan Subrahmanya, Mirkó Visontai, Stella Laurenzo. 2277-2281 [doi]
- Metadata-Aware End-to-End Keyword SpottingHongyi Liu 0004, Apurva Abhyankar, Yuriy Mishchenko, Thibaud Sénéchal, Gengshen Fu, Brian Kulis, Noah D. Stein, Anish Shah, Shiv Naga Prasad Vitaladevuni. 2282-2286 [doi]
- Adversarial Audio: A New Information Hiding MethodYehao Kong, Jiliang Zhang 0002. 2287-2291 [doi]
- S2IGAN: Speech-to-Image Generation via Adversarial LearningXinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg. 2292-2296 [doi]
- Automatic Speech Recognition Benchmark for Air-Traffic CommunicationsJuan Zuluaga-Gomez, Petr Motlícek, Qingran Zhan, Karel Veselý, Rudolf Braun. 2297-2301 [doi]
- Whisper Augmented End-to-End/Hybrid Speech Recognition System - CycleGAN ApproachPrithvi R. R. Gudepu, Gowtham P. Vadisetti, Abhishek Niranjan, Kinnera Saranu, Raghava Sarma, M. Ali Basha Shaik, Periyasamy Paramasivam. 2302-2306 [doi]
- Risk Forecasting from Earnings Calls Acoustics and Network CorrelationsRamit Sawhney, Arshiya Aggarwal, Piyush Khanna, Puneet Mathur, Taru Jain, Rajiv Ratn Shah. 2307-2311 [doi]
- SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition SystemsHuili Chen, Bita Darvish Rouhani, Farinaz Koushanfar. 2312-2316 [doi]
- Evaluating Automatically Generated Phoneme Captions for ImagesJustin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg. 2317-2321 [doi]
- An Efficient Temporal Modeling Approach for Speech Emotion Recognition by Mapping Varied Duration Sentences into Fixed Number of ChunksWei-Cheng Lin, Carlos Busso. 2322-2326 [doi]
- Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-Corpus Setting for Speech Emotion RecognitionSiddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller. 2327-2331 [doi]
- Meta-Learning for Speech Emotion Recognition Considering Ambiguity of Emotion LabelsTakuya Fujioka, Takeshi Homma, Kenji Nagamatsu. 2332-2336 [doi]
- Temporal Attention Convolutional Network for Speech Emotion Recognition with Latent RepresentationJiaxing Liu, Zhilei Liu, Longbiao Wang, Yuan Gao, Lili Guo, Jianwu Dang. 2337-2341 [doi]
- Reconciliation of Multiple Corpora for Speech Emotion Recognition by Multiple Classifiers with an Adversarial Corpus DiscriminatorZhi Zhu, Yoshinao Sato. 2342-2346 [doi]
- Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural NetworksZheng Lian, Jianhua Tao, Bin Liu, Jian Huang 0014, Zhanlei Yang, Rongjun Li. 2347-2351 [doi]
- EigenEmo: Spectral Utterance Representation Using Dynamic Mode Decomposition for Speech Emotion ClassificationShuiyang Mao, P. C. Ching, Tan Lee. 2352-2356 [doi]
- Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion RecognitionShuiyang Mao, P. C. Ching, C. C. Jay Kuo, Tan Lee. 2357-2361 [doi]
- The Effect of Language Proficiency on the Perception of Segmental Foreign AccentRubén Pérez Ramón, María Luisa García Lecumberri, Martin Cooke. 2362-2366 [doi]
- The Effect of Language Dominance on the Selective Attention of Segments and Tones in Urdu-Cantonese SpeakersYi Liu, Jinghong Ning. 2367-2371 [doi]
- The Effect of Input on the Production of English Tense and Lax Vowels by Chinese Learners: Evidence from an Elementary School in ChinaMengrou Li, Ying Chen, Jie Cui. 2372-2376 [doi]
- Exploring the Use of an Artificial Accent of English to Assess Phonetic Learning in Monolingual and Bilingual SpeakersLaura Spinu, Jiwon Hwang, Nadya Pincus, Mariana Vasilita. 2377-2381 [doi]
- Effects of Dialectal Code-Switching on Speech Modules: A Study Using Egyptian Arabic Broadcast SpeechShammur A. Chowdhury, Younes Samih, Mohamed Eldesouki, Ahmed Ali. 2382-2386 [doi]
- Bilingual Acoustic Voice Variation is Similarly Structured Across LanguagesKhia A. Johnson, Molly Babel, Robert A. Fuhrman. 2387-2391 [doi]
- Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-Switching Speech RecognitionHaobo Zhang, Haihua Xu, Van Tung Pham, Hao Huang, Eng Siong Chng. 2392-2396 [doi]
- Perception and Production of Mandarin Initial Stops by Native Urdu SpeakersDan Du, Xianjin Zhu, Zhu Li, Jinsong Zhang. 2397-2401 [doi]
- Now You're Speaking My Language: Visual Language IdentificationTriantafyllos Afouras, Joon Son Chung, Andrew Zisserman. 2402-2406 [doi]
- The Different Enhancement Roles of Covarying Cues in Thai and Mandarin TonesNari Rhee, Jianjing Kuang. 2407-2411 [doi]
- Singing Voice Extraction with Attention-Based Spectrograms FusionHao Shi, Longbiao Wang, Sheng Li 0010, Chenchen Ding, Meng Ge, Nan Li, Jianwu Dang, Hiroshi Seki. 2412-2416 [doi]
- Incorporating Broad Phonetic Information for Speech EnhancementYen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-Weih Hung, Yu Tsao 0001. 2417-2421 [doi]
- A Recursive Network with Dynamic Attention for Monaural Speech EnhancementAndong Li, Chengshi Zheng, Cunhang Fan, Renhua Peng, Xiaodong Li. 2422-2426 [doi]
- Constrained Ratio Mask for Speech Enhancement Using DNNHongjiang Yu, Wei-Ping Zhu, Yuhong Yang. 2427-2431 [doi]
- SERIL: Noise Adaptive Speech Enhancement Using Regularization-Based Incremental LearningChi-Chang Lee, Yu-Chen Lin, Hsuan-Tien Lin, Hsin-Min Wang, Yu Tsao 0001. 2432-2436 [doi]
- Adaptive Neural Speech Enhancement with a Denoising Variational AutoencoderYoshiaki Bando, Kouhei Sekiguchi, Kazuyoshi Yoshii. 2437-2441 [doi]
- Low-Latency Single Channel Speech Dereverberation Using U-Net Convolutional Neural NetworksAhmet E. Bulut, Kazuhito Koishida. 2442-2446 [doi]
- Single-Channel Speech Enhancement by Subspace Affinity MinimizationDung N. Tran, Kazuhito Koishida. 2447-2451 [doi]
- Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech EnhancementHaoyu Li, Junichi Yamagishi. 2452-2456 [doi]
- NAAGN: Noise-Aware Attention-Gated Network for Speech EnhancementFeng Deng, Tao Jiang, Xiaorui Wang, Chen Zhang, Yan Li. 2457-2461 [doi]
- Online Monaural Speech Enhancement Using Delayed Subband LSTMXiaofei Li, Radu Horaud. 2462-2466 [doi]
- INTERSPEECH 2020 Deep Noise Suppression Challenge: A Fully Convolutional Recurrent Network (FCRN) for Joint Dereverberation and DenoisingMaximilian Strake, Bruno Defraene, Kristoff Fluyt, Wouter Tirry, Tim Fingscheidt. 2467-2471 [doi]
- DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech EnhancementYanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie. 2472-2476 [doi]
- Dual-Signal Transformation LSTM Network for Real-Time Noise SuppressionNils L. Westhausen, Bernd T. Meyer. 2477-2481 [doi]
- A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband SpeechJean-Marc Valin, Umut Isik, Neerad Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy. 2482-2486 [doi]
- PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased LossUmut Isik, Ritwik Giri, Neerad Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy. 2487-2491 [doi]
- The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge ResultsChandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan 0003, Johannes Gehrke. 2492-2496 [doi]
- The Implication of Sound Level on Spatial Selective Auditory Attention for Cochlear Implant Users: Behavioral and Electrophysiological MeasurementSara Akbarzadeh, Sungmin Lee, Chin Tuan Tan. 2497-2501 [doi]
- Enhancing the Interaural Time Difference of Bilateral Cochlear Implants with the Temporal Limits EncoderYangyang Wan, Huali Zhou, Qinglin Meng, Nengheng Zheng. 2502-2506 [doi]
- Speech Clarity Improvement by Vocal Self-Training Using a Hearing Impairment Simulator and its Correlation with an Auditory Modulation IndexToshio Irino, Soichi Higashiyama, Hanako Yoshigi. 2507-2511 [doi]
- Investigation of Phase Distortion on Perceived Speech Quality for Hearing-Impaired ListenersZhuohuang Zhang, Donald S. Williamson, Yi Shen 0008. 2512-2516 [doi]
- EEG-Based Short-Time Auditory Attention Detection Using Multi-Task Deep LearningZhuo Zhang, Gaoyan Zhang, Jianwu Dang, Shuang Wu, Di Zhou, Longbiao Wang. 2517-2521 [doi]
- Towards Interpreting Deep Learning Models to Understand Loss of Speech Intelligibility in Speech Disorders - Step 1: CNN Model-Based Phone ClassificationSondes Abderrazek, Corinne Fredouille, Alain Ghio, Muriel Lalain, Christine Meunier, Virginie Woisard. 2522-2526 [doi]
- Improving Cognitive Impairment Classification by Generative Neural Network-Based Feature AugmentationBahman Mirheidari, Daniel Blackburn, Ronan O'Malley, Annalena Venneri, Traci Walker, Markus Reuber, Heidi Christensen. 2527-2531 [doi]
- UncommonVoice: A Crowdsourced Dataset of Dysphonic SpeechMeredith Moore, Piyush Papreja, Michael Saxon, Visar Berisha, Sethuraman Panchanathan. 2532-2536 [doi]
- Towards Automatic Assessment of Voice Disorders: A Clinical ApproachPurva Barche, Krishna Gurugubelli, Anil Kumar Vuppala. 2537-2541 [doi]
- BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple LanguagesAbhishek Shivkumar, Jack Weston, Raphael Lenain, Emil Fristed. 2542-2546 [doi]
- Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-Footprint Keyword SpottingMenglong Xu, Xiao-lei Zhang. 2547-2551 [doi]
- Predicting Detection Filters for Small Footprint Open-Vocabulary Keyword SpottingThéodore Bluche, Thibault Gisselbrecht. 2552-2556 [doi]
- Deep Convolutional Spiking Neural Networks for Keyword SpottingEmre Yilmaz, Özgür Bora Gevrek, Jibin Wu, Yuxiang Chen, Xuanbo Meng, Haizhou Li 0001. 2557-2561 [doi]
- Domain Aware Training for Far-Field Small-Footprint Keyword SpottingHaiwei Wu, Yan Jia, Yuanfei Nie, Ming Li. 2562-2566 [doi]
- Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword SpottingKun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia 0001, Helen Meng, Binheng Song. 2567-2571 [doi]
- Deep Template Matching for Small-Footprint and Configurable Keyword SpottingPeng Zhang, Xueliang Zhang. 2572-2576 [doi]
- Multi-Scale Convolution for Robust Keyword SpottingChen Yang, Xue Wen, Liming Song. 2577-2581 [doi]
- An Investigation of Few-Shot Learning in Spoken Term ClassificationYangbin Chen, Tom Ko, Lifeng Shang, Xiao Chen, Xin Jiang, Qing Li. 2582-2586 [doi]
- End-to-End Keyword Search Based on Attention and Energy Scorer for Low Resource LanguagesZeyu Zhao, Wei-Qiang Zhang. 2587-2591 [doi]
- Stacked 1D Convolutional Networks for End-to-End Small Footprint Voice Trigger DetectionTakuya Higuchi, Mohammad Ghasemzadeh 0003, Kisun You, Chandra Dhir. 2592-2596 [doi]
- Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic EnvironmentsJens Heitkaemper, Joerg Schmalenstroeer, Reinhold Haeb-Umbach. 2597-2601 [doi]
- Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2Xueshuai Zhang, Wenchao Wang, Pengyuan Zhang. 2602-2606 [doi]
- The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02Qingjian Lin, Tingle Li, Ming Li. 2607-2611 [doi]
- "This is Houston. Say again, please". The Behavox System for the Apollo-11 Fearless Steps Challenge (Phase II)Arseniy Gorin, Daniil Kulko, Steven Grima, Alex Glasman. 2612-2616 [doi]
- FEARLESS STEPS Challenge (FS-2): Supervised Learning with Massive Naturalistic Apollo DataAditya Joglekar, John H. L. Hansen, Meena Chandra Shekhar, Abhijeet Sangwan. 2617-2621 [doi]
- Separating Varying Numbers of Sources with Auxiliary Autoencoding LossYi Luo, Nima Mesgarani. 2622-2626 [doi]
- On Synthesis for Supervised Monaural Speech Separation in Time DomainJingjing Chen, Qirong Mao, Dong Liu. 2627-2631 [doi]
- Learning Better Speech Representations by Worsening InterferenceJun Wang. 2632-2636 [doi]
- Asteroid: The PyTorch-Based Audio Source Separation Toolkit for ResearchersManuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent 0001. 2637-2641 [doi]
- Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech SeparationJingjing Chen, Qirong Mao, Dong Liu. 2642-2646 [doi]
- Conv-TasSAN: Separative Adversarial Network Based on Conv-TasNetChengyun Deng, Yi Zhang, Shiqian Ma, Yongtao Sha, Hui Song, Xiangang Li. 2647-2651 [doi]
- Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream SeparationKeisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach. 2652-2656 [doi]
- Unsupervised Audio Source Separation Using Generative PriorsVivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan, Rushil Anirudh, Andreas Spanias. 2657-2661 [doi]
- Adversarial Latent Representation Learning for Speech EnhancementYuanhang Qiu, Ruili Wang. 2662-2666 [doi]
- An NMF-HMM Speech Enhancement Method Based on Kullback-Leibler DivergenceYang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen. 2667-2671 [doi]
- Multi-Scale TCN: Exploring Better Temporal DNN Model for Causal Speech EnhancementLu Zhang, Mingjiang Wang. 2672-2676 [doi]
- VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech RecognitionQuan Wang, Ignacio Lopez-Moreno, Mert Saglam, Kevin W. Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein. 2677-2681 [doi]
- Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity LossZiqiang Shi, Rujie Liu, Jiqing Han. 2682-2686 [doi]
- Sub-Band Knowledge Distillation Framework for Speech EnhancementXiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li. 2687-2691 [doi]
- A Deep Learning-Based Kalman Filter for Speech EnhancementSujan Kumar Roy, Aaron Nicolson,