- Incremental Lattice Determinization for WFST Decoders. Zhehuai Chen, Mahsa Yarmohammadi, Hainan Xu, Hang Lv 0001, Lei Xie 0001, Daniel Povey, Sanjeev Khudanpur. 1-7
- A Comparison of Transformer and LSTM Encoder Decoder Models for ASR. Albert Zeyer, Parnia Bahar, Kazuki Irie, Ralf Schlüter, Hermann Ney. 8-15
- A Dropout-Based Single Model Committee Approach for Active Learning in ASR. Jiayi Fu, Kuang Ru. 16-22
- Personalization of End-to-End Speech Recognition on Mobile Devices for Named Entities. Khe Chai Sim, Leif Johnson, Giovanni Motta, Lillian Zhou, Françoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang. 23-30
- Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models. Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe. 31-38
- Integrating Source-Channel and Attention-Based Sequence-to-Sequence Models for Speech Recognition. Qiujia Li, Chao Zhang, Philip C. Woodland. 39-46
- An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription. Catalin Zorila, Christoph Böddeker, Rama Doddipatla, Reinhold Haeb-Umbach. 47-53
- State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention with Dilated 1D Convolutions. Kyu J. Han, Ramon Prieto, Tao Ma. 54-61
- Highly Efficient Neural Network Language Model Compression Using Soft Binarization Training. Rao Ma, Qi Liu, Kai Yu 0004. 62-69
- Improved Multi-Stage Training of Online Attention-Based Encoder-Decoder Models. Abhinav Garg, Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim. 70-77
- Lead2Gold: Towards Exploiting the Full Potential of Noisy Transcriptions for Speech Recognition. Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze. 78-85
- Orthogonality Constrained Multi-Head Attention for Keyword Spotting. Mingu Lee, JinKyu Lee, Hye Jin Jang, Byeonggeun Kim, Wonil Chang, Kyuwoong Hwang. 86-92
- Learning Between Different Teacher and Student Models in ASR. Jeremy Heng Meng Wong, Mark J. F. Gales, Yu Wang 0027. 93-99
- A Unified Endpointer Using Multitask and Multidomain Training. Shuo-Yiin Chang, Bo Li 0028, Gabor Simko. 100-106
- Domain Expansion in DNN-Based Acoustic Models for Robust Speech Recognition. Shahram Ghorbani, Soheil Khorram, John H. L. Hansen. 107-113
- Improving RNN Transducer Modeling for End-to-End Speech Recognition. Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong. 114-121
- Simple Gated Convnet for Small Footprint Acoustic Modeling. Lukas Lee, Jinhwan Park, Wonyong Sung. 122-128
- GANs for Children: A Generative Data Augmentation Strategy for Children Speech Recognition. Peiyao Sheng, Zhuolin Yang, Yanmin Qian. 129-135
- Espresso: A Fast End-to-End Neural Speech Recognition Toolkit. Yiming Wang, Sanjeev Khudanpur, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv 0001, Yiwen Shao, Nanyun Peng, Lei Xie 0001, Shinji Watanabe. 136-143
- On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion. Berrak Sisman, Mingyang Zhang 0003, Minghui Dong, Haizhou Li 0001. 144-151
- WaveNet Factorization with Singular Value Decomposition for Voice Conversion. Hongqiang Du, Xiaohai Tian, Lei Xie 0001, Haizhou Li 0001. 152-159
- A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion. Yi Zhou, Xiaohai Tian, Emre Yilmaz, Rohan Kumar Das, Haizhou Li 0001. 160-167
- Knowledge Distillation from BERT in Pre-Training and Fine-Tuning for Polyphone Disambiguation. Hao Sun, Xu Tan, Jun-Wei Gan, Sheng Zhao, Dongxu Han, Hongzhi Liu, Tao Qin, Tie-Yan Liu. 168-175
- Investigation of Shallow WaveNet Vocoder with Laplacian Distribution Output. Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda. 176-183
- Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis. Xiaochun An, Yuxuan Wang, Shan Yang, Zejun Ma, Lei Xie. 184-191
- Controlling Emotion Strength with Relative Attribute for End-to-End Speech Synthesis. Xiaolian Zhu, Shan Yang, Geng Yang, Lei Xie. 192-199
- Bootstrapping Non-Parallel Voice Conversion from Speaker-Adaptive Text-to-Speech. Hieu-Thi Luong, Junichi Yamagishi. 200-207
- Improving Mandarin End-to-End Speech Synthesis by Self-Attention and Learnable Gaussian Bias. Fengyu Yang, Shan Yang, Pengcheng Zhu, Pengju Yan, Lei Xie 0001. 208-213
- Tacotron-Based Acoustic Model Using Phoneme Alignment for Practical Neural Text-to-Speech Systems. Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai. 214-221
- Speaker-Aware Speech-Transformer. Zhiyun Fan, Jie Li, Shiyu Zhou, Bo Xu 0002. 222-229
- Speech Separation Using Speaker Inventory. Peidong Wang, Zhuo Chen, Xiong Xiao, Zhong Meng, Takuya Yoshioka, Tianyan Zhou, Liang Lu, Jinyu Li. 230-236
- MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition. Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe. 237-244
- Joint Distribution Learning in the Framework of Variational Autoencoders for Far-Field Speech Enhancement. Mahesh K. Chelimilla, Shashi Kumar, Shakti P. Rath. 245-251
- Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition. Salar Jafarlou, Soheil Khorram, Vinay Kothapally, John H. L. Hansen. 252-259
- FaSNet: Low-Latency Adaptive Beamforming for Multi-Microphone Audio Processing. Yi Luo 0004, Cong Han, Nima Mesgarani, Enea Ceolini, Shih-Chii Liu. 260-267
- Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition. Zhong Meng, Jinyu Li, Yashesh Gaur, Yifan Gong. 268-275
- Advances in Online Audio-Visual Meeting Transcription. Takuya Yoshioka, Yan Huang 0028, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Igor Abramovski, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang. 276-283
- Joint Optimization of Classification and Clustering for Deep Speaker Embedding. Zhiming Wang, Kaisheng Yao, Shuo Fang, Xiaolong Li. 284-290
- Exploring Effective Data Augmentation with TDNN-LSTM Neural Network Embedding for Speaker Recognition. Chien-Lin Huang. 291-295
- End-to-End Neural Speaker Diarization with Self-Attention. Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe. 296-303
- A Cross-Corpus Study on Speech Emotion Recognition. Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain. 304-311
- Adversarial Attacks on Spoofing Countermeasures of Automatic Speaker Verification. Songxiang Liu, Haibin Wu, Hung-yi Lee, Helen Meng. 312-319
- Spoken Language Identification Using Bidirectional LSTM Based LID Sequential Senones. H. Muralikrishna, Pulkit Sapra, Anuksha Jain, Dileep Aroor Dinesh. 320-326
- Time-Domain Speaker Extraction Network. Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li 0001. 327-334
- Short Utterance Compensation in Speaker Verification via Cosine-Based Teacher-Student Learning of Speaker Embeddings. Jee-weon Jung, Hee-Soo Heo, Hye-jin Shim, Ha-Jin Yu. 335-341
- Novel Enhanced Teager Energy Based Cepstral Coefficients for Replay Spoof Detection. Rajul Acharya, Hemant A. Patil, Harsh Kotta. 342-349
- Syllable-Dependent Discriminative Learning for Small Footprint Text-Dependent Speaker Verification. Junyi Peng, Yuexian Zou, Na Li, Deyi Tuo, Dan Su, Meng Yu, Chunlei Zhang, Dong Yu 0001. 350-357
- Latent Space Representation for Multi-Target Speaker Detection and Identification with a Sparse Dataset Using Triplet Neural Networks. Kin Wai Cheuk, Balamurali B. T., Gemma Roig, Dorien Herremans. 358-364
- Self-Adaptive Soft Voice Activity Detection Using Deep Neural Networks for Robust Speaker Verification. Youngmoon Jung, Yeunju Choi, Hoirin Kim. 365-372
- SphereDiar: An Effective Speaker Diarization System for Meeting Data. Tuomas Kaseva, Aku Rouhe, Mikko Kurimo. 373-380
- Bayesian Adversarial Learning for Speaker Recognition. Jen-Tzung Chien, Chun Lin Kuo. 381-388
- An Investigation of LSTM-CTC based Joint Acoustic Model for Indian Language Identification. Tirusha Mandava, Ravi Kumar Vuddagiri, Hari Krishna Vydana, Anil Kumar Vuppala. 389-396
- A Multi Purpose and Large Scale Speech Corpus in Persian and English for Speaker and Speech Recognition: The DeepMine Database. Hossein Zeinali, Lukás Burget, Jan Honza Cernocký. 397-402
- Native Language Identification from Raw Waveforms Using Deep Convolutional Neural Networks with Attentive Pooling. Rutuja Ubale, Vikram Ramanarayanan, Yao Qian, Keelan Evanini, Chee Wee Leong, Chong Min Lee. 403-410
- Speaker Verification with Application-Aware Beamforming. Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Lukás Burget, Jan Cernocký. 411-418
- Training Language Models for Long-Span Cross-Sentence Evaluation. Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney. 419-426
- Transformer ASR with Contextual Block Processing. Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe. 427-433
- A Density Ratio Approach to Language Model Fusion in End-to-End Automatic Speech Recognition. Erik McDermott, Hasim Sak, Ehsan Variani. 434-441
- Improving Grapheme-to-Phoneme Conversion by Investigating Copying Mechanism in Recurrent Architectures. Abhishek Niranjan, M. Ali Basha Shaik. 442-448
- A Comparative Study on Transformer vs RNN in Speech Applications. Shigeki Karita, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto. 449-456
- From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition. Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer. 457-464
- Attention-Based Speech Recognition Using Gaze Information. Osamu Segawa, Tomoki Hayashi, Kazuya Takeda. 465-470
- Listening While Speaking and Visualizing: Improving ASR Through Multimodal Chain. Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001. 471-478
- Embeddings for DNN Speaker Adaptive Training. Joanna Rownicka, Peter Bell 0001, Steve Renals. 479-486
- Language Model Bootstrapping Using Neural Machine Translation for Conversational Speech Recognition. Surabhi Punjabi, Harish Arsikere, Sri Garimella. 487-493
- Speaker and Language Aware Training for End-to-End ASR. Shubham Bansal, Karan Malhotra, Sriram Ganapathy. 494-501
- Data Augmentation Based on Vowel Stretch for Improving Children's Speech Recognition. Tohru Nagano, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata. 502-508
- Mixed Bandwidth Acoustic Modeling Leveraging Knowledge Distillation. Takashi Fukuda, Samuel Thomas. 509-515
- On Temporal Context Information for Hybrid BLSTM-Based Phoneme Recognition. Timo Lohrenz, Maximilian Strake, Tim Fingscheidt. 516-523
- Exploring Model Units and Training Strategies for End-to-End Speech Recognition. Mingkun Huang, Yizhou Lu, Lan Wang, Yanmin Qian, Kai Yu 0004. 524-531
- Query-by-Example On-Device Keyword Spotting. Byeonggeun Kim, Mingu Lee, JinKyu Lee, Yeonseok Kim, Kyuwoong Hwang. 532-538
- Small-Footprint Keyword Spotting with Graph Convolutional Network. Xi Chen, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei. 539-546
- Simplified LSTMs for Speech Recognition. George Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, Michael Picheny, Samuel Thomas. 547-553
- Generalized Large-Context Language Models Based on Forward-Backward Hierarchical Recurrent Encoder-Decoder Models. Ryo Masumura, Mana Ihori, Tomohiro Tanaka, Itsumi Saito, Kyosuke Nishida, Takanobu Oba. 554-561
- End-to-End Training of a Large Vocabulary End-to-End Speech Recognition System. Chanwoo Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim. 562-569
- Multilingual End-to-End Speech Translation. Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe. 570-577
- Neural Machine Translation with Acoustic Embedding. Takatomo Kano, Sakriani Sakti, Satoshi Nakamura 0001. 578-584
- One-to-Many Multilingual End-to-End Speech Translation. Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi. 585-592
- Speech-to-Speech Translation Between Untranscribed Unknown Languages. Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001. 593-600
- Enhanced BERT-Based Ranking Models for Spoken Document Retrieval. Hsiao-Yun Lin, Tien-Hong Lo, Berlin Chen. 601-606
- Virtual Adversarial Training for DS-CNN Based Small-Footprint Keyword Spotting. Xiong Wang, Sining Sun, Lei Xie 0001. 607-612
- Verifying Deep Keyword Spotting Detection with Acoustic Word Embeddings. Yougen Yuan, Zhiqiang Lv, Shen Huang, Lei Xie 0001. 613-620
- Multilingual Bottleneck Features for Query by Example Spoken Term Detection. Dhananjay Ram, Lesly Miculicich, Hervé Bourlard. 621-628
- Additional Shared Decoder on Siamese Multi-View Encoders for Learning Acoustic Word Embeddings. Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim. 629-636
- Efficient Free Keyword Detection Based on CNN and End-to-End Continuous DP-Matching. Tomohiro Tanaka, Takahiro Shinozaki. 637-644
- Improving Speech Enhancement with Phonetic Embedding Features. Bo Wu, Meng Yu, LianWu Chen, Mingjie Jin, Dan Su, Dong Yu 0001. 645-651
- Detecting Deception in Political Debates Using Acoustic and Textual Features. Daniel Kopev, Ahmed Ali, Ivan Koychev, Preslav Nakov. 652-659
- End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform. Wangyou Zhang, Man Sun, Lan Wang, Yanmin Qian. 660-666
- Time Domain Audio Visual Speech Separation. Jian Wu, Yong Xu, Shi-Xiong Zhang, LianWu Chen, Meng Yu, Lei Xie, Dong Yu 0001. 667-673
- Speech Reveals Future Risk of Developing Dementia: Predictive Dementia Screening from Biographic Interviews. Jochen Weiner, Claudia Frankenberg, Johannes Schröder, Tanja Schultz. 674-681
- Improving Fundamental Frequency Generation in EMG-to-Speech Conversion Using a Quantization Approach. Lorenz Diener, Tejas Umesh, Tanja Schultz. 682-689
- Towards Real-Time Mispronunciation Detection in Kids' Speech. Peter Plantinga, Eric Fosler-Lussier. 690-696
- Incorporating Prior Knowledge into Speaker Diarization and Linking for Identifying Common Speaker. Tsun-Yat Leung, Lahiru Samarakoon, Albert Y. S. Lam. 697-703
- Logistic Similarity Metric Learning via Affinity Matrix for Text-Independent Speaker Verification. Junyi Peng, Rongzhi Gu, Yuexian Zou. 704-709
- Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-GANs. Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak. 710-717
- CNN with Phonetic Attention for Text-Independent Speaker Verification. Tianyan Zhou, Yong Zhao, Jinyu Li, Yifan Gong, Jian Wu. 718-725
- Probing the Information Encoded in X-Vectors. Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur. 726-733
- In-the-Wild End-to-End Detection of Speech Affecting Diseases. M. Joana Correia, Isabel Trancoso, Bhiksha Raj. 734-741
- Optimizing Neural Network Embeddings Using a Pair-Wise Loss for Text-Independent Speaker Verification. Hira Dhamyal, Tianyan Zhou, Bhiksha Raj, Rita Singh. 742-748
- Towards Controlling False Alarm - Miss Trade-Off in Perceptual Speaker Comparison via Non-Neutral Listening Task Framing. Rosa González Hautamäki, Tomi H. Kinnunen. 749-756
- DOVER: A Method for Combining Diarization Outputs. Andreas Stolcke, Takuya Yoshioka. 757-763
- Using Very Deep Convolutional Neural Networks to Automatically Detect Plagiarized Spoken Responses. Xinhao Wang, Keelan Evanini, Yao Qian, Klaus Zechner. 764-771
- Spoken Multiple-Choice Question Answering Using Multimodal Convolutional Neural Networks. Shang-Bao Luo, Hung-Shin Lee, Kuan-Yu Chen, Hsin-Min Wang. 772-778
- Transfer Learning for Context-Aware Spoken Language Understanding. Qian Chen, Zhu Zhuo, Wen Wang, Qiuyun Xu. 779-786
- Emoception: An Inception Inspired Efficient Speech Emotion Recognition Network. Chirag Singh, Abhay Kumar, Ajay Nagar, Suraj Tripathi, Promod Yenigalla. 787-791
- A Comparative Study on End-to-End Speech to Text Translation. Parnia Bahar, Tobias Bieschke, Hermann Ney. 792-799
- Joint Learning of Word and Label Embeddings for Sequence Labelling in Spoken Language Understanding. Jiewen Wu, Luis Fernando D'Haro, Nancy F. Chen, Pavitra Krishnaswamy, Rafael E. Banchs. 800-806
- Markov Recurrent Neural Network Language Model. Jen-Tzung Chien, Che-Yu Kuo. 807-813
- Topic-Aware Pointer-Generator Networks for Summarizing Spoken Conversations. Zhengyuan Liu, Angela Ng, Sheldon Lee Shao Guang, Ai Ti Aw, Nancy F. Chen. 814-821
- SLU for Voice Command in Smart Home: Comparison of Pipeline and End-to-End Approaches. Thierry Desot, François Portet, Michel Vacher. 822-829
- Scalable Neural Dialogue State Tracking. Vevake Balaraman, Bernardo Magnini. 830-837
- Hierarchical Transformers for Long Document Classification. Raghavendra Pappagari, Piotr Zelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak. 838-844
- Adapting Pretrained Transformer to Lattices for Spoken Language Understanding. Chao-Wei Huang, Yun-Nung Chen. 845-852
- Spatio-Temporal Context Modelling for Speech Emotion Classification. Md Asif Jalal, Roger K. Moore, Thomas Hain. 853-859
- Paraphrase Generation Based on VAE and Pointer-Generator Networks. Lohith Ravuru, Hyungtak Choi, Siddarth K. M., Hojung Lee, Inchul Hwang. 860-866
- Semi-Supervised Training and Data Augmentation for Adaptation of Automatic Broadcast News Captioning Systems. Yinghui Huang, Samuel Thomas, Masayuki Suzuki, Zoltán Tüske, Larry Sansone, Michael Picheny. 867-874
- Online Batch Normalization Adaptation for Automatic Speech Recognition. Franco Mana, Felix Weninger, Roberto Gemello, Puming Zhan. 875-880
- Speaker Adaptive Training Using Model Agnostic Meta-Learning. Ondrej Klejch, Joachim Fainberg, Peter Bell 0001, Steve Renals. 881-888
- A Comparison of End-to-End Models for Long-Form Speech Recognition. Chung-Cheng Chiu, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara N. Sainath, Yonghui Wu, Wei Han, Yu Zhang 0033, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang. 889-896
- Acoustic Model Adaptation from Raw Waveforms with SincNet. Joachim Fainberg, Ondrej Klejch, Erfan Loweimi, Peter Bell 0001, Steve Renals. 897-904
- Recurrent Neural Network Transducer for Audio-Visual Speech Recognition. Takaki Makino, Hank Liao, Yannis M. Assael, Brendan Shillingford, Basilio Garcia, Otavio Braga, Olivier Siohan. 905-912
- Explicit Alignment of Text and Speech Encodings for Attention-Based End-to-End Speech Recognition. Jennifer Drexler, James R. Glass. 913-919
- Recognizing Long-Form Speech Using Streaming End-to-End Models. Arun Narayanan, Rohit Prabhavalkar, Chung-Cheng Chiu, David Rybach, Tara N. Sainath, Trevor Strohman. 920-927
- Leveraging Language ID in Multilingual End-to-End Speech Recognition. Austin Waters, Neeraj Gaur, Parisa Haghani, Pedro J. Moreno, Zhongdi Qu. 928-935
- Streaming End-to-End Speech Recognition with Joint CTC-Attention Based Models. Niko Moritz, Takaaki Hori, Jonathan Le Roux. 936-943
- Monotonic Recurrent Neural Network Transducer and Decoding Strategies. Anshuman Tripathi, Han Lu, Hasim Sak, Hagen Soltau. 944-948
- Character-Aware Attention-Based End-to-End Speech Recognition. Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong. 949-955
- Attention Based On-Device Streaming Speech Recognition with Large Speech Corpus. Kwangyoun Kim, Seokyeong Jung, Jungin Lee, Myoungji Han, Chanwoo Kim, Kyungmin Lee, Dhananjaya Gowda, JunMo Park, Sungsoo Kim, Sichen Jin, Young-Yoon Lee, Jinsu Yeo, Daehyun Kim. 956-963
- Zero-Shot Code-Switching ASR and TTS with Multilingual Machine Speech Chain. Sahoko Nakayama, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001. 964-971
- End-to-End Code-Switching ASR for Low-Resourced Language Pairs. Xianghu Yue, Grandee Lee, Emre Yilmaz, Fang Deng, Haizhou Li 0001. 972-979
- Unsupervised Adaptation of Acoustic Models for ASR Using Utterance-Level Embeddings from Squeeze and Excitation Networks. Hardik B. Sailor, Salil Deena, Md Asif Jalal, Rasa Lileikyte, Thomas Hain. 980-987
- Power-Law Nonlinearity with Maximally Uniform Distribution Criterion for Improved Neural Network Training in Automatic Speech Recognition. Chanwoo Kim, Mehul Kumar, Kwangyoun Kim, Dhananjaya Gowda. 988-995
- Speech Recognition with Augmented Synthesized Speech. Andrew Rosenberg, Yu Zhang 0033, Bhuvana Ramabhadran, Ye Jia, Pedro J. Moreno, Yonghui Wu, Zelin Wu. 996-1002
- Development of Voice Spoofing Detection Systems for 2019 Edition of Automatic Speaker Verification and Countermeasures Challenge. João Monteiro, Md. Jahangir Alam. 1003-1010
- Spoof Detection Using Time-Delay Shallow Neural Network and Feature Switching. Mari Ganesh Kumar, Suvidha Rupesh Kumar, M. S. Saranya, B. Bharathi, Hema A. Murthy. 1011-1017
- Long Range Acoustic and Deep Features Perspective on ASVspoof 2019. Rohan Kumar Das, Jichen Yang, Haizhou Li 0001. 1018-1025
- The MGB-5 Challenge: Recognition and Dialect Identification of Dialectal Arabic Speech. Ahmed M. Ali, Suwon Shon, Younes Samih, Hamdy Mubarak, Ahmed Abdelali, James R. Glass, Steve Renals, Khalid Choukri. 1026-1033
- Second Language Transfer Learning in Humans and Machines Using Image Supervision. Kiran Praveen, Anshul Gupta, Akshara Soman, Sriram Ganapathy. 1040-1047
- Zero-Shot Pronunciation Lexicons for Cross-Language Acoustic Model Transfer. Matthew Wiesner, Oliver Adams, David Yarowsky, Jan Trmal, Sanjeev Khudanpur. 1048-1054
- Robust Belief State Space Representation for Statistical Dialogue Managers Using Deep Autoencoders. Fotios Lygerakis, Vassilios Diakoloulas, Michail Lagoudakis, Margarita Kotti. 1055-1061
- Improving Speech-Based End-of-Turn Detection Via Cross-Modal Representation Learning with Punctuated Text Data. Ryo Masumura, Mana Ihori, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Takanobu Oba, Ryuichiro Higashinaka. 1062-1069
- Dialogue Environments are Different from Games: Investigating Variants of Deep Q-Networks for Dialogue Policy. Yu-An Wang, Yun-Nung Chen. 1070-1076
- Efficient Semi-Supervised Learning for Natural Language Understanding by Optimizing Diversity. Eunah Cho, He Xie, John P. Lalor, Varun Kumar, William M. Campbell. 1077-1084