- Noise-robust exemplar matching for rescoring query-by-example search. Emre Yilmaz, Julien van Hout, Horacio Franco. 1-7
- Learning speaker representation for neural network based multichannel speaker extraction. Katerina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani. 8-15
- Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation. Wei-Ning Hsu, Yu Zhang, James R. Glass. 16-23
- Binaural processing for robust recognition of degraded speech. Anjali Menon, Chanwoo Kim, Umpei Kurokawa, Richard M. Stern. 24-31
- Meeting recognition with asynchronous distributed microphone array. Shoko Araki, Nobutaka Ono, Keisuke Kinoshita, Marc Delcroix. 32-39
- Adversarial training for data-driven speech enhancement without parallel corpus. Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani. 40-47
- Tackling unseen acoustic conditions in query-by-example search using time and frequency convolution for multilingual deep bottleneck features. Julien van Hout, Vikramjit Mitra, Horacio Franco, Chris Bartels, Dimitra Vergyri. 48-54
- Improving separation of overlapped speech for meeting conversations using uncalibrated microphone array. Keisuke Nakamura, Randy Gomez. 55-62
- Reducing the computational complexity for whole word models. Hagen Soltau, Hank Liao, Hasim Sak. 63-68
- Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence. Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu. 69-76
- Semi-supervised training strategies for deep neural networks. Matthew Gibson, Gary Cook, Puming Zhan. 77-83
- Multi-task ensembles with teacher-student training. Jeremy H. M. Wong, Mark J. F. Gales. 84-90
- Language diarization for semi-supervised bilingual acoustic model training. Emre Yilmaz, Mitchell McLaren, Henk van den Heuvel, David A. van Leeuwen. 91-96
- Future word contexts in neural network language models. X. Chen, X. Liu, A. Ragni, Y. Wang, M. J. F. Gales. 97-103
- Future vector enhanced LSTM language model for LVCSR. Qi Liu, Yanmin Qian, Kai Yu. 104-110
- Acoustic-to-word model without OOV. Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong. 111-117
- Turbo fusion of magnitude and phase information for DNN-based phoneme recognition. Timo Lohrenz, Tim Fingscheidt. 118-125
- Computational cost reduction of long short-term memory based on simultaneous compression of input and hidden state. Takashi Masuko. 126-133
- Cross-domain speech recognition using nonparallel corpora with cycle-consistent adversarial networks. Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara. 134-140
- WERD: Using social text spelling variants for evaluating dialectal speech recognition. Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals. 141-148
- Character-based units for unlimited vocabulary continuous speech recognition. Peter Smit, Siva Reddy Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo. 149-156
- Gated convolutional networks based hybrid acoustic models for low resource speech recognition. Jian Kang, Wei-Qiang Zhang, Jia Liu. 157-164
- Lattice rescoring strategies for long short term memory language models in speech recognition. Shankar Kumar, Michael Nirschl, Daniel Niels Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix X. Yu. 165-172
- Syllable-based acoustic modeling with CTC-SMBR-LSTM. Zhongdi Qu, Parisa Haghani, Eugene Weinstein, Pedro J. Moreno. 173-177
- Sequence training of DNN acoustic models with natural gradient. Adnan Haider, Philip C. Woodland. 178-184
- Consistent DNN uncertainty training and decoding for robust ASR. Karan Nathwani, Emmanuel Vincent, Irina Illina. 185-192
- Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer. Kanishka Rao, Hasim Sak, Rohit Prabhavalkar. 193-199
- Unsupervised adaptation of student DNNs learned from teacher RNNs for improved ASR performance. Lahiru Samarakoon, Brian Mak. 200-205
- Exploring neural transducers for end-to-end speech recognition. Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, Anuroop Sriram, Zhenyao Zhu. 206-213
- Unsupervised adaptation with domain separation networks for robust speech recognition. Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong. 214-221
- Incremental training and constructing the very deep convolutional residual network acoustic models. Sheng Li, Xugang Lu, Peng Shen, Ryoichi Takashima, Tatsuya Kawahara, Hisashi Kawai. 222-227
- On lattice generation for large vocabulary speech recognition. David Rybach, Michael Riley, Johan Schalkwyk. 228-235
- Simplifying very deep convolutional neural network architectures for robust speech recognition. Joanna Rownicka, Steve Renals, Peter Bell. 236-243
- Language modeling with highway LSTM. Gakuto Kurata, Bhuvana Ramabhadran, George Saon, Abhinav Sethy. 244-251
- Direct modeling of raw audio with DNNs for wake word detection. Ken'ichi Kumatani, Sankaran Panchapagesan, Minhua Wu, Minjae Kim, Nikko Strom, Gautam Tiwari, Arindam Mandal. 252-257
- Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow. Khe Chai Sim, Arun Narayanan, Tom Bagby, Tara N. Sainath, Michiel Bacchiani. 258-264
- Language independent end-to-end architecture for joint language identification and speech recognition. Shinji Watanabe, Takaaki Hori, John R. Hershey. 265-271
- Keyword spotting for Google Assistant using contextual speech recognition. Assaf Hurwitz Michaely, Xuedong Zhang, Gabor Simko, Carolina Parada, Petar S. Aleksic. 272-278
- Investigation of transfer learning for ASR using LF-MMI trained neural networks. Pegah Ghahremani, Vimal Manohar, Hossein Hadian, Daniel Povey, Sanjeev Khudanpur. 279-286
- Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition. Takaaki Hori, Shinji Watanabe, John R. Hershey. 287-293
- Language modeling with neural trans-dimensional random fields. Bin Wang, Zhijian Ou. 294-300
- Listening while speaking: Speech chain by deep learning. Andros Tjandra, Sakriani Sakti, Satoshi Nakamura. 301-308
- Attention-based Wav2Text with feature transfer learning. Andros Tjandra, Sakriani Sakti, Satoshi Nakamura. 309-315
- Speech recognition challenge in the wild: Arabic MGB-3. Ahmed Ali, Stephan Vogel, Steve Renals. 316-322
- The zero resource speech challenge 2017. Ewan Dunbar, Xuan-Nga Cao, Juan Benjumea, Julien Karadayi, Mathieu Bernard, Laurent Besacier, Xavier Anguera, Emmanuel Dupoux. 323-330
- The Blizzard machine learning challenge 2017. Kei Sawada, Keiichi Tokuda, Simon King, Alan W. Black. 331-337
- Aalto system for the 2017 Arabic multi-genre broadcast challenge. Peter Smit, Siva Reddy Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo. 338-345
- JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning. Vimal Manohar, Daniel Povey, Sanjeev Khudanpur. 346-352
- Automatic speech recognition of Arabic multi-genre broadcast media. Maryam Najafian, Wei-Ning Hsu, Ahmed Ali, James R. Glass. 353-359
- UTD-CRSS submission for MGB-3 Arabic dialect identification: Front-end and back-end advancements on broadcast speech. Ahmet Emin Bulut, Qian Zhang, Chunlei Zhang, Fahimeh Bahmaninezhad, John H. L. Hansen. 360-367
- MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data. Karel Veselý, Murali Karthick Baskar, Mireia Diez, Karel Benes. 368-373
- MIT-QCRI Arabic dialect identification system for the 2017 multi-genre broadcast challenge. Suwon Shon, Ahmed Ali, James R. Glass. 374-380
- Seeing and hearing too: Audio representation for video captioning. Shun-Po Chuang, Chia-Hung Wan, Pang-Chi Huang, Chi-Yu Yang, Hung-yi Lee. 381-388
- Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition. Bowen Shi, Karen Livescu. 389-396
- A hierarchical attention based model for off-topic spontaneous spoken response detection. Andrey Malinin, Kate Knill, Mark J. F. Gales. 397-403
- A context-aware speech recognition and understanding system for air traffic control domain. Youssef Oualil, Dietrich Klakow, György Szaszák, Ajay Srinivasamurthy, Hartmut Helmke, Petr Motlícek. 404-408
- Spoken language biomarkers for detecting cognitive impairment. Tuka Alhanai, Rhoda Au, James R. Glass. 409-416
- DBLSTM based multilingual articulatory feature extraction for language documentation. Markus Müller, Sebastian Stüker, Alex Waibel. 417-423
- Learning modality-invariant representations for speech and images. Kenneth Leidal, David Harwath, James R. Glass. 424-429
- Early and late integration of audio features for automatic video description. Chiori Hori, Takaaki Hori, Tim K. Marks, John R. Hershey. 430-436
- Cracking the cocktail party problem by multi-beam deep attractor network. Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong. 437-444
- Ground truth estimation of spoken English fluency score using decorrelation penalized low-rank matrix factorization. Hoon Chung, Yun-kyung Lee, Jeon Gue Park. 445-449
- Exploring the use of acoustic embeddings in neural machine translation. Salil Deena, Raymond W. M. Ng, Pranava Swaroop Madhyastha, Lucia Specia, Thomas Hain. 450-457
- Unwritten languages demand attention too! Word discovery with encoder-decoder models. Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio, Laurent Besacier. 458-465
- Neural relevance-aware query modeling for spoken document retrieval. Tien-Hong Lo, Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen. 466-473
- Streaming small-footprint keyword spotting using sequence-to-sequence models. Yanzhang He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, Anton Bakhtin, Ian McGraw. 474-481
- Iterative policy learning in end-to-end trainable task-oriented neural dialog models. Bing Liu, Ian Lane. 482-489
- Denotation extraction for interactive learning in dialogue systems. Miroslav Vodolán, Filip Jurcícek. 490-496
- Mitigating the impact of speech recognition errors on chatbot using sequence-to-sequence model. Pin-Jung Chen, I-Hung Hsu, Yi Yao Huang, Hung-yi Lee. 497-503
- Deep quaternion neural networks for spoken language understanding. Titouan Parcollet, Mohamed Morchid, Georges Linarès. 504-511
- Topic segmentation in ASR transcripts using bidirectional RNNs for change detection. Imran Sheikh, Dominique Fohr, Irina Illina. 512-518
- Grounded language understanding for manipulation instructions using GAN-based classification. Komei Sugiura, Hisashi Kawai. 519-524
- Hierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features. Emiru Tsunoo, Ondrej Klejch, Peter Bell, Steve Renals. 525-532
- Personalized word representations carrying personalized semantics learned from social network posts. Zih-Wei Lin, Tzu-Wei Sung, Hung-yi Lee, Lin-Shan Lee. 533-540
- Speaker-sensitive dual memory networks for multi-turn slot tagging. Young-Bum Kim, Sungjin Lee, Ruhi Sarikaya. 541-546
- ONENET: Joint domain, intent, slot prediction for spoken language understanding. Young-Bum Kim, Sungjin Lee, Karl Stratos. 547-553
- Dynamic time-aware attention to speaker roles and contexts for spoken language understanding. Po-Chun Chen, Ta-Chung Chi, Shang-Yu Su, Yun-Nung Chen. 554-560
- Scalable multi-domain dialogue state tracking. Abhinav Rastogi, Dilek Hakkani-Tür, Larry P. Heck. 561-568
- Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system. Yao Qian, Rutuja Ubale, Vikram Ramanarayanan, Patrick L. Lange, David Suendermann-Oeft, Keelan Evanini, Eugene Tsuprun. 569-576
- Leveraging side information for speaker identification with the Enron conversational telephone speech collection. Ning Gao, Gregory Sell, Douglas W. Oard, Mark Dredze. 577-583
- End-to-end text-independent speaker verification with flexibility in utterance duration. Chunlei Zhang, Kazuhito Koishida. 584-590
- Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription. Lea Schonherr, Steffen Zeiler, Dorothea Kolossa. 591-598
- Adversarial manifold learning for speaker recognition. Jen-Tzung Chien, Kang-Ting Peng. 599-605
- Improving native language (L1) identification with better VAD and TDNN trained separately on native and non-native English corpora. Yao Qian, Keelan Evanini, Patrick L. Lange, Robert A. Pugh, Rutuja Ubale, Frank K. Soong. 606-613
- Multi-view (Joint) probability linear discrimination analysis for J-vector based text dependent speaker verification. Ziqiang Shi, Liu Liu, Mengjiao Wang, Rujie Liu. 614-620
- Leveraging native language speech for accent identification using deep Siamese networks. Aditya Siddhant, Preethi Jyothi, Sriram Ganapathy. 621-628
- Comparison of multiple features and modeling methods for text-dependent speaker verification. Yi Liu, Liang He, Yao Tian, Zhuzi Chen, Jia Liu, Michael T. Johnson. 629-636
- Investigating native and non-native English classification and transfer effects using Legendre polynomial coefficient clustering. Rachel Rakov, Andrew Rosenberg. 637-643
- The CMU entry to Blizzard machine learning challenge. Pallavi Baljekar, Sai Krishna Rallabandi, Alan W. Black. 644-649
- The USTC system for Blizzard machine learning challenge 2017-ES2. Ya-Jun Hu, Li-juan Liu, Chuang Ding, Zhen-Hua Ling, Li-Rong Dai. 650-656
- The iFLYTEK system for Blizzard machine learning challenge 2017-ES1. Li-juan Liu, Chuang Ding, Ya-Jun Hu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou, Si Wei. 657-664
- Minimally supervised written-to-spoken text normalization. Axel H. Ng, Kyle Gorman, Richard Sproat. 665-670
- Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems. Eunwoo Song, Frank K. Soong, Hong-Goo Kang. 671-676
- Sparse representation of phonetic features for voice conversion with and without parallel data. Berrak Çişman, Haizhou Li, Kay Chen Tan. 677-684
- Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework. Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li. 685-691
- Error detection of grapheme-to-phoneme conversion in text-to-speech synthesis using speech signal and lexical context. Kevin Vythelingum, Yannick Estève, Olivier Rosec. 692-697
- Subband WaveNet with overlapped single-sideband filterbanks. Takuma Okamoto, Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai. 698-704
- Integrated speaker-adaptive speech synthesis. Moquan Wan, Gilles Degottex, Mark J. F. Gales. 705-711
- An investigation of multi-speaker training for WaveNet vocoder. Tomoki Hayashi, Akira Tamamori, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda. 712-718
- An embedded segmental K-means model for unsupervised segmentation and clustering of speech. Herman Kamper, Karen Livescu, Sharon Goldwater. 719-726
- Multilingual bottle-neck feature learning from untranscribed speech. Hongjie Chen, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li. 727-733
- Extracting bottleneck features and word-like pairs from untranscribed speech for feature representation. Yougen Yuan, Cheung Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li. 734-739
- Feature optimized DPGMM clustering for unsupervised subword modeling: A contribution to ZeroSpeech 2017. Michael Heck, Sakriani Sakti, Satoshi Nakamura. 740-746
- Composite embedding systems for ZeroSpeech2017 Track1. Hayato Shibata, Taku Kato, Takahiro Shinozaki, Shinji Watanabe. 747-753
- Deep learning methods for unsupervised acoustic modeling - Leap submission to ZeroSpeech challenge 2017. T. K. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy. 754-761
- Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions. T. K. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy, Susheela Devi. 762-768