The following publications are possibly variants of this publication:
- Streaming Multi-Talker ASR with Token-Level Serialized Output Training. Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li 0001, Takuya Yoshioka. Interspeech 2022: 3774-3778.
- Fusing ASR Outputs in Joint Training for Speech Emotion Recognition. Yuanchao Li, Peter Bell 0001, Catherine Lai. ICASSP 2022: 7362-7366.
- Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. Naoyuki Kanda, Jian Wu 0027, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka. Interspeech 2022: 521-525.
- JOIST: A Joint Speech and Text Streaming Model for ASR. Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang 0033, Zhouyuan Huo, Zhehuai Chen, Bo Li 0028, Weiran Wang, Trevor Strohman. SLT 2022: 52-59.
- Serialized Output Training for End-to-End Overlapped Speech Recognition. Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka. Interspeech 2020: 2797-2801.
- Joint and Adversarial Training with ASR for Expressive Speech Synthesis. Kaili Zhang, Cheng Gong, Wenhuan Lu, Longbiao Wang, Jianguo Wei, Dawei Liu. ICASSP 2022: 6322-6326.