- JOIST: A Joint Speech and Text Streaming Model for ASR. Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang 0033, Zhouyuan Huo, Zhehuai Chen, Bo Li 0028, Weiran Wang, Trevor Strohman. 52-59
- A Context-Aware Knowledge Transferring Strategy for CTC-Based ASR. Ke-Han Lu, Kuan-Yu Chen. 60-67
- E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe 0001. 84-91
- Conformer-Based On-Device Streaming Speech Recognition with KD Compression and Two-Pass Architecture. Jinhwan Park, Sichen Jin, JunMo Park, Sungsoo Kim, Dhairya Sandhyana, Changheon Lee, Myoungji Han, Jungin Lee, Seokyeong Jung, Changwoo Han, Chanwoo Kim 0001. 92-99
- Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition. Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg. 130-135
- Guided Contrastive Self-Supervised Pre-Training for Automatic Speech Recognition. Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas. 174-181
- Modular Hybrid Autoregressive Transducer. Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang 0033, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno 0001. 197-204
- How Does Pre-Trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications. Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Seyyed Saeed Sarfjoo, Petr Motlícek, Matthias Kleinert, Hartmut Helmke, Oliver Ohneiser, Qingran Zhan. 205-212
- Monotonic Segmental Attention for Automatic Speech Recognition. Albert Zeyer, Robin Schmitt, Wei Zhou 0043, Ralf Schlüter, Hermann Ney. 229-236
- Dual Learning for Large Vocabulary On-Device ASR. Cal Peyser, W. Ronny Huang, Tara N. Sainath, Rohit Prabhavalkar, Michael Picheny, KyungHyun Cho. 245-251
- End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe 0001, Nobutaka Ono. 260-265
- HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch. Tina Raissi, Wei Zhou 0043, Simon Berger, Ralf Schlüter, Hermann Ney. 287-294
- Macro-Block Dropout for Improved Regularization in Training End-to-End Speech Recognition Models. Chanwoo Kim 0001, Sathish Indurti, Jinhwan Park, Wonyong Sung. 331-338
- A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan 0003, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe 0001. 406-413
- On the Efficiency of Integrating Self-Supervised Learning and Meta-Learning for User-Defined Few-Shot Keyword Spotting. Wei-Tsung Kao, Yuan-Kuei Wu, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-yi Lee. 414-421
- Improved Normalizing Flow-Based Speech Enhancement Using an All-Pole Gammatone Filterbank for Conditional Input Representation. Martin Strauss 0003, Matteo Torcoli, Bernd Edler. 444-450
- EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. Soumi Maiti, Yushi Ueda, Shinji Watanabe 0001, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Yong Xu 0004. 480-487
- End-to-End Multi-Speaker ASR with Independent Vector Analysis. Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe 0001, Yanmin Qian. 496-501
- Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation. Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao 0006. 509-516
- VSAMeter: Evaluation of a New Open-Source Tool to Measure Vowel Space Area and Related Metrics. Tianyu Cao 0003, Laureano Moro-Velázquez, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak. 517-524
- A Multi-Modal Array of Interpretable Features to Evaluate Language and Speech Patterns in Different Neurological Disorders. Anna Favaro, Chelsie Motley, Tianyu Cao 0003, Miguel Iglesias, Ankur Butala, Esther S. Oh, Robert D. Stevens, Jesús Villalba 0001, Najim Dehak, Laureano Moro-Velázquez. 532-539
- Efficient Dynamic Filter for Robust and Low Computational Feature Extraction. Donghyeon Kim, Jeong-gi Kwak, Hanseok Ko. 540-547
- The Clever Hans Effect in Voice Spoofing Detection. Bhusan Chettri. 577-584
- Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure. Xin Wang 0037, Junichi Yamagishi. 585-592
- Joint Speaker Diarisation and Tracking in Switching State-Space Model. Jeremy Heng Meng Wong, Yifan Gong 0001. 605-612
- Diarisation Using Location Tracking with Agglomerative Clustering. Jeremy Heng Meng Wong, Igor Abramovski, Xiong Xiao, Yifan Gong 0001. 613-619
- Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. Shota Horiguchi, Yuki Takashima, Shinji Watanabe 0001, Paola García. 620-625
- BERTraffic: BERT-Based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications. Juan Zuluaga-Gomez, Seyyed Saeed Sarfjoo, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlícek, Karel Ondrej, Oliver Ohneiser, Hartmut Helmke. 633-640
- Fine Grained Spoken Document Summarization Through Text Segmentation. Samantha Kotey, Rozenn Dahyot, Naomi Harte. 647-654
- Towards Visually Prompted Keyword Localisation for Zero-Resource Spoken Languages. Leanne Nortje, Herman Kamper. 700-707
- Transformer-Based Lip-Reading with Regularized Dropout and Relaxed Attention. Zhengyang Li, Timo Lohrenz, Matthias Dunkelberg, Tim Fingscheidt. 723-730
- YFACC: A Yorùbá Speech-Image Dataset for Cross-Lingual Keyword Localisation Through Visual Grounding. Kayode Olaleye, Dan Oneata, Herman Kamper. 731-738
- Improving Luxembourgish Speech Recognition with Cross-Lingual Speech Representations. Le Minh Nguyen 0002, Shekhar Nayak, Matt Coler. 792-797
- Multilingual Speech Emotion Recognition with Multi-Gating Mechanism and Neural Architecture Search. Zihan Wang 0006, Qi Meng, HaiFeng Lan, Xinrui Zhang, KeHao Guo, Akshat Gupta. 806-813
- Exploring a Unified ASR for Multiple South Indian Languages Leveraging Multilingual Acoustic and Language Models. C. S. Anoop, A. G. Ramakrishnan. 830-837
- Distribution-Based Emotion Recognition in Conversation. Wen Wu, Chao Zhang 0031, Philip C. Woodland. 860-867
- WaveFit: An Iterative and Non-Autoregressive Neural Vocoder Based on Fixed-Point Iteration. Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani. 884-891
- GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models. Matthew Baas, Herman Kamper. 906-911
- Learning Accent Representation with Multi-Level VAE Towards Controllable Speech Synthesis. Jan Melechovský, Ambuj Mehrish, Dorien Herremans, Berrak Sisman. 928-935
- Regotron: Regularizing the Tacotron2 Architecture via Monotonic Alignment Loss. Efthymios Georgiou, Kosmas Kritsis, Georgios Paraskevopoulos, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos. 977-983
- STOP: A Dataset for Spoken Task Oriented Semantic Parsing. Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed. 991-998
- Automatic Prediction of Intelligibility of Words and Phonemes Produced Orally by Japanese Learners of English. Chuanbo Zhu 0001, Takuya Kunihara, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi. 1029-1036
- SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning. Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao 0006. 1037-1044
- An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition. Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee 0001. 1074-1080
- SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. Tzu-hsun Feng, Shuyan Annie Dong, Ching-feng Yeh, Shu-Wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe 0001, Abdelrahman Mohamed, Shang-wen Li 0001, Hung-yi Lee. 1096-1103
- Improving Generalizability of Distilled Self-Supervised Speech Processing Models Under Distorted Settings. Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang 0033, Hung-yi Lee. 1112-1119
- On Compressing Sequences for Self-Supervised Speech Models. Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe 0001, Paola García, Hung-yi Lee, Hao Tang 0002. 1128-1135
- Extracting Speaker and Emotion Information from Self-Supervised Speech Models via Channel-Wise Correlations. Themos Stafylakis, Ladislav Mosner, Sofoklis Kakouros, Oldrich Plchot, Lukás Burget, Jan Cernocký. 1136-1143