- Description of the Munich-Passau Snore Sound Corpus (MPSSC). Christoph Janott, Anton Batliner.
- Description of the Upper Respiratory Tract Infection Corpus (URTIC). Jarek Krajewski, Sebastian Schnieder, Anton Batliner.
- The INTERSPEECH 2017 Computational Paralinguistics Challenge: A Summary of Results. Stefan Steidl.
- Description of the Homebank Child/Adult Addressee Corpus (HB-CHAAC). Elika Bergelson, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont.
- Discussion. Björn W. Schuller, Anton Batliner.
- ISCA Medal for Scientific Achievement. Haizhou Li. 1
- The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection. Tomi Kinnunen, Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Junichi Yamagishi, Kong-Aik Lee. 2-6
- Experimental Analysis of Features for Replay Attack Detection - Results on the ASVspoof 2017 Challenge. Roberto Font, Juan M. Espín, María José Cano. 7-11
- Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection. Hemant A. Patil, Madhu R. Kamble, Tanvina B. Patel, Meet H. Soni. 12-16
- Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion. Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, Ming Li. 17-21
- Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features. Sarfaraz Jelil, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha. 22-26
- Audio Replay Attack Detection Using High-Frequency Features. Marcin Witkowski, Stanislaw Kacprzak, Piotr Zelasko, Konrad Kowalczyk, Jakub Galka. 27-31
- Feature Selection Based on CQCCs for Automatic Speaker Verification Spoofing. Xianliang Wang, Yanhong Xiao, Xuan Zhu. 32-36
- Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech. Emre Yilmaz, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel, David A. van Leeuwen. 37-41
- Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection. Emre Yilmaz, Henk van den Heuvel, David A. van Leeuwen. 42-46
- Jee haan, I'd like both, por favor: Elicitation of a Code-Switched Corpus of Hindi-English and Spanish-English Human-Machine Dialog. Vikram Ramanarayanan, David Suendermann-Oeft. 47-51
- On Building Mixed Lingual Speech Synthesis Systems. Sai Krishna Rallabandi, Alan W. Black. 52-56
- Speech Synthesis for Mixed-Language Navigation Instructions. Khyathi Chandu Raghavi, Sai Krishna Rallabandi, Sunayana Sitaram, Alan W. Black. 57-61
- Addressing Code-Switching in French/Algerian Arabic Speech. Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel. 62-66
- Metrics for Modeling Code-Switching Across Corpora. Gualberto A. Guzmán, Joseph Ricard, Jacqueline Serigos, Barbara E. Bullock, Almeida Jacqueline Toribio. 67-71
- Synthesising isiZulu-English Code-Switch Bigrams Using Word Embeddings. Ewald van der Westhuizen, Thomas Niesler. 72-76
- Crowdsourcing Universal Part-of-Speech Tags for Code-Switching. Victor Soto, Julia Hirschberg. 77-81
- Audio Replay Attack Detection with Deep Learning Frameworks. Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexander Kozlov, Oleg Kudashev, Vadim Shchemelinin. 82-86
- Ensemble Learning for Countermeasure of Audio Replay Spoofing Attack in ASVspoof2017. Zhe Ji, Zhi-Yi Li, Peng Li, Maobo An, Shengxiang Gao, Dan Wu, Faru Zhao. 87-91
- A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification. Lantian Li, Yixiang Chen, Dong Wang, Thomas Fang Zheng. 92-96
- Replay Attack Detection Using DNN for Channel Discrimination. Parav Nagarsheth, Elie Khoury, Kailash Patil, Matt Garland. 97-101
- ResNet and Model Fusion for Automatic Spoofing Detection. Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu. 102-106
- SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017. K. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala. 107-111
- Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features. William Hartmann, Roger Hsiao, Tim Ng, Jeff Z. Ma, Francis Keith, Man-Hung Siu. 112-116
- Student-Teacher Training with Diverse Decision Tree Ensembles. Jeremy H. M. Wong, Mark J. F. Gales. 117-121
- Embedding-Based Speaker Adaptive Training of Deep Neural Networks. Xiaodong Cui, Vaibhava Goel, George Saon. 122-126
- Improving Deliverable Speech-to-Text Systems with Multilingual Knowledge Transfer. Jeff Z. Ma, Francis Keith, Tim Ng, Man-Hung Siu, Owen Kimball. 127-131
- English Conversational Telephone Speech Recognition by Humans and Machines. George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall. 132-136
- Comparing Human and Machine Errors in Conversational Speech Transcription. Andreas Stolcke, Jasha Droppo. 137-141
- Multimodal Markers of Persuasive Speech: Designing a Virtual Debate Coach. Volha Petukhova, Manoj Raju, Harry Bunt. 142-146
- Acoustic-Prosodic and Physiological Response to Stressful Interactions in Children with Autism Spectrum Disorder. Daniel Bone, Julia Mertens, Emily Zane, Sungbok Lee, Shrikanth S. Narayanan, Ruth B. Grossman. 147-151
- A Stepwise Analysis of Aggregated Crowdsourced Labels Describing Multimodal Emotional Behaviors. Alec Burmania, Carlos Busso. 152-156
- An Information Theoretic Analysis of the Temporal Synchrony Between Head Gestures and Prosodic Patterns in Spontaneous Speech. Gaurav Fotedar, Prasanta Kumar Ghosh. 157-161
- Multimodal Prediction of Affective Dimensions via Fusing Multiple Regression Techniques. D.-Y. Huang, Wan Ding, Mingyu Xu, Huaiping Ming, Minghui Dong, Xinguo Yu, Haizhou Li. 162-165
- Co-Production of Speech and Pointing Gestures in Clear and Perturbed Interactive Tasks: Multimodal Designation Strategies. Marion Dohen, Benjamin Roustan. 166-170
- Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing. Peter Guzewich, Stephen A. Zahorian. 171-175
- Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance. Philipp Bulling, Klaus Linhard, Arthur Wolf, Gerhard Schmidt. 176-180
- A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems. Jan Franzen, Tim Fingscheidt. 181-185
- Speech Enhancement Based on Harmonic Estimation Combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients. Dongmei Wang, John H. L. Hansen. 186-190
- Improving Speech Intelligibility in Binaural Hearing Aids by Estimating a Time-Frequency Mask with a Weighted Least Squares Classifier. David Ayllón, Roberto Gil-Pita, Manuel Rosa-Zurera. 191-195
- Simulations of High-Frequency Vocoder on Mandarin Speech Recognition for Acoustic Hearing Preserved Cochlear Implant. Tsung-Chen Wu, Tai-Shih Chi, Chia-Fone Lee. 196-200
- Phonetic Correlates of Pharyngeal and Pharyngealized Consonants in Saudi, Lebanese, and Jordanian Arabic: An rt-MRI Study. Zainab Hermes, Marissa S. Barlaz, Ryan Shosted, Zhi-Pei Liang, Bradley P. Sutton. 201-205
- Glottal Opening and Strategies of Production of Fricatives. Benjamin Elie, Yves Laprie. 206-209
- Acoustics and Articulation of Medial versus Final Coronal Stop Gemination Contrasts in Moroccan Arabic. Mohamed Yassine Frej, Christopher Carignan, Catherine T. Best. 210-214
- How are Four-Level Length Distinctions Produced? Evidence from Moroccan Arabic. Giuseppina Turco, Karim Shoul, Rachid Ridouane. 215-218
- Vowels in the Barunga Variety of North Australian Kriol. Caroline Jones, Katherine Demuth, Weicong Li, Andre Almeida. 219-223
- Nature of Contrast and Coarticulation: Evidence from Mizo Tones and Assamese Vowel Harmony. Indranil Dutta, Irfan S., Pamir Gogoi, Priyankoo Sarmah. 224-228
- The Influence of Synthetic Voice on the Evaluation of a Virtual Character. João Paulo Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell. 229-233
- Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network. Amelia J. Gully, Takenori Yoshimura, Damian T. Murphy, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda. 234-238
- An HMM/DNN Comparison for Synchronized Text-to-Speech and Tongue Motion Synthesis. Sébastien Le Maguer, Ingmar Steiner, Alexander Hewer. 239-243
- VCV Synthesis Using Task Dynamics to Animate a Factor-Based Articulatory Model. Rachel Alexander, Tanner Sorensen, Asterios Toutios, Shrikanth S. Narayanan. 244-248
- Beyond the Listening Test: An Interactive Approach to TTS Evaluation. Joseph Mendelson, Matthew P. Aylett. 249-253
- Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis. Beiming Cao, Myung Jong Kim, Jan P. H. van Santen, Ted Mau, Jun Wang. 254-258
- Approaches for Neural-Network Language Model Adaptation. Min Ma, Michael Nirschl, Fadi Biadsy, Shankar Kumar. 259-263
- A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models. Youssef Oualil, Dietrich Klakow. 264-268
- Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition. X. Chen, Anton Ragni, X. Liu, Mark J. F. Gales. 269-273
- Fast Neural Network Language Model Lookups at N-Gram Speeds. Yinghui Huang, Abhinav Sethy, Bhuvana Ramabhadran. 274-278
- Empirical Exploration of Novel Architectures and Objectives for Language Models. Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon. 279-283
- Residual Memory Networks in Language Modeling: Improving the Reputation of Feed-Forward Networks. Karel Benes, Murali Karthick Baskar, Lukás Burget. 284-288
- Dominant Distortion Classification for Pre-Processing of Vowels in Remote Biomedical Voice Analysis. Amir Hossein Poorjam, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen. 289-293
- Automatic Paraphasia Detection from Aphasic Speech: A Preliminary Study. Duc Le, Keli Licata, Emily Mower Provost. 294-298
- Evaluation of the Neurological State of People with Parkinson's Disease Using i-Vectors. Nicanor Garcia, Juan Rafael Orozco-Arroyave, Luis Fernando D'Haro, Najim Dehak, Elmar Nöth. 299-303
- Objective Severity Assessment from Disordered Voice Using Estimated Glottal Airflow. Yu-Ren Chien, Michal Borský, Jón Guðnason. 304-308
- Earlier Identification of Children with Autism Spectrum Disorder: An Automatic Vocalisation-Based Approach. Florian B. Pokorny, Björn W. Schuller, Peter B. Marschik, Raymond Brueckner, Pär Nyström, Nicholas Cummins, Sven Bölte, Christa Einspieler, Terje Falck-Ytter. 309-313
- Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson's Disease. Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth. 314-318
- Phone Classification Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs. Linxue Bai, Peter Jancovic, Martin J. Russell, Philip Weber, Stephen M. Houghton. 319-323
- An Investigation of Crowd Speech for Room Occupancy Estimation. Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Ngoc Le. 324-328
- Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals. Karthika Vijayan, Jitendra Kumar Dhiman, Chandra Sekhar Seelamantula. 329-333
- Musical Speech: A New Methodology for Transcribing Speech Prosody. Alexsandro R. Meireles, Antônio R. M. Simões, Antonio Celso Ribeiro, Beatriz Raposo de Medeiros. 334-338
- Estimation of Place of Articulation of Fricatives from Spectral Characteristics for Speech Training. K. S. Nataraj, Prem C. Pandey, Hirak Dasgupta. 339-343
- Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source. Tom Bäckström. 344-348
- End-to-End Acoustic Feedback in Language Learning for Correcting Devoiced French Final-Fricatives. Sucheta Ghosh, Camille Fauth, Yves Laprie, Aghilas Sini. 349-353
- Dialect Perception by Older Children. Ewa Jacewicz, Robert Allen Fox. 354-358
- Perception of Non-Contrastive Variations in American English by Japanese Learners: Flaps are Less Favored Than Stops. Kiyoko Yoneyama, Mafuyu Kitahara, Keiichi Tajima. 359-363
- L1 Perceptions of L2 Prosody: The Interplay Between Intonation, Rhythm, and Speech Rate and Their Contribution to Accentedness and Comprehensibility. Lieke van Maastricht, Tim Zee, Emiel Krahmer, Marc Swerts. 364-368
- Effects of Pitch Fall and L1 on Vowel Length Identification in L2 Japanese. Izumi Takiguchi. 369-373
- A Preliminary Study of Prosodic Disambiguation by Chinese EFL Learners. Yuanyuan Zhang, Hongwei Ding. 374-378
- Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home. Chanwoo Kim, Ananya Misra, Kean K. Chin, Thad Hughes, Arun Narayanan, Tara N. Sainath, Michiel Bacchiani. 379-383
- Neural Network-Based Spectrum Estimation for Online WPE Dereverberation. Keisuke Kinoshita, Marc Delcroix, Haeyong Kwon, Takuma Mori, Tomohiro Nakatani. 384-388
- Factorial Modeling for Effective Suppression of Directional Noise. Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven J. Rennie. 389-393
- On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones. Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Chin-Hui Lee. 394-398
- Acoustic Modeling for Google Home. Bo Li, Tara N. Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean K. Chin, Khe Chai Sim, Ron J. Weiss, Kevin W. Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Richard Rose, Matt Shannon. 399-403
- On Multi-Domain Training and Adaptation of End-to-End RNN Acoustic Models for Distant Speech Recognition. Seyedmahdad Mirsamadi, John H. L. Hansen. 404-408
- Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System. Masanori Morise, Genta Miyashita, Kenji Ozawa. 409-413
- Robust Source-Filter Separation of Speech Signal in the Phase Domain. Erfan Loweimi, Jon Barker, Oscar Saz Torralba, Thomas Hain. 414-418
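- A Time-Warping Pitch Tracking Algorithm Considering Fast f0 Changes. Simon Stone, Peter Steiner, Peter Birkholz. 419-423
- A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation. Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda. 424-428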
- Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments. Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan. 429-433
- Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis. Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh. 434-438
- Wavelet Speech Enhancement Based on Robust Principal Component Analysis. Chia-Lung Wu, Hsiang-Ping Hsu, Syu-Siang Wang, Jeih-Weih Hung, Ying-Hui Lai, Hsin-Min Wang, Yu Tsao. 439-443
- Vowel Onset Point Detection Using Sonority Information. Bidisha Sharma, S. R. Mahadeva Prasanna. 444-448
- Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies. Unto K. Laine. 449-453
- Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks. Christian Kroos, Mark D. Plumbley. 454-458
- Multilingual i-Vector Based Statistical Modeling for Music Genre Classification. Jia Dai, Wei Xue, Wenju Liu. 459-463
- Indoor/Outdoor Audio Classification Using Foreground Speech Segmentation. Banriskhem K. Khonglah, K. T. Deepak, S. R. Mahadeva Prasanna. 464-468
- Attention Based CLDNNs for Short-Duration Acoustic Scene Classification. Jinxi Guo, Ning Xu, Li-Jia Li, Abeer Alwan. 469-473
- Frame-Wise Dynamic Threshold Based Polyphonic Acoustic Event Detection. Xianjun Xia, Roberto Togneri, Ferdous Ahmed Sohel, David Huang. 474-478
- Enhanced Feature Extraction for Speech Detection in Media Audio. Inseon Jang, Chunghyun Ahn, Jeongil Seo, Younseon Jang. 479-483
- Audio Classification Using Class-Specific Learned Descriptors. Sukanya Sonowal, Tushar Sandhan, In Kyu Choi, Nam Soo Kim. 484-487
- Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery. Janek Ebbers, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, Bhiksha Raj. 488-492
- Virtual Adversarial Training and Data Augmentation for Acoustic Event Detection with Gated Recurrent Neural Networks. Matthias Zöhrer, Franz Pernkopf. 493-497
- Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi. Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner, Morgan Sonderegger. 498-502
- A Robust Voiced/Unvoiced Phoneme Classification from Whispered Speech Using the 'Color' of Whispered Phonemes and Deep Neural Network. G. Nisha Meenakshi, Prasanta Kumar Ghosh. 503-507
- Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition. Ian Williams, Petar S. Aleksic. 508-512
- Comparison of Decoding Strategies for CTC Acoustic Models. Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel. 513-517
- Phone Duration Modeling for LVCSR Using Neural Networks. Hossein Hadian, Daniel Povey, Hossein Sameti, Sanjeev Khudanpur. 518-522
- Towards Better Decoding and Language Model Integration in Sequence to Sequence Models. Jan Chorowski, Navdeep Jaitly. 523-527
- Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling. Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu. 528-532
- Binary Deep Neural Networks for Speech Recognition. Xu Xiang, Yanmin Qian, Kai Yu. 533-537
- Hierarchical Constrained Bayesian Optimization for Feature, Acoustic Model and Decoder Parameter Optimization. Akshay Chandrashekaran, Ian R. Lane. 538-542
- Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition. Shohei Toyama, Daisuke Saito, Nobuaki Minematsu. 543-547
- Joint Learning of Correlated Sequence Labeling Tasks Using Bidirectional Recurrent Neural Networks. Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas C. Raykar, Lili Kotlerman, Guy Lev. 548-552
- Estimation of Gap Between Current Language Models and Human Performance. Xiaoyu Shen, Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow. 553-557
- A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery. Anna Moró, György Szaszák. 558-562
- Factors Affecting the Intelligibility of Low-Pass Filtered Speech. Lei Wang, Fei Chen. 563-566
- Phonetic Restoration of Temporally Reversed Speech. Shi Yu Wang, Fei Chen. 567-570
- Simultaneous Articulatory and Acoustic Distortion in L1 and L2 Listening: Locally Time-Reversed "Fast" Speech. Mako Ishida. 571-575
- Lexically Guided Perceptual Learning in Mandarin Chinese. L. Ann Burchfield, San-hei Kenny Luk, Mark Antoniou, Anne Cutler. 576-580
- The Effect of Spectral Profile on the Intelligibility of Emotional Speech in Noise. Chris Davis, Chee Seng Chong, Jeesun Kim. 581-585
- Whether Long-Term Tracking of Speech Rate Affects Perception Depends on Who is Talking. Merel Maslowski, Antje S. Meyer, Hans Rutger Bosker. 586-590
- Emotional Thin-Slicing: A Proposal for a Short- and Long-Term Division of Emotional Speech. Daniel Oliveira Peres, Dominic Watt, Waldemar Ferreira Netto. 591-595
- Predicting Epenthetic Vowel Quality from Acoustics. Adriana Guevara-Rukoz, Erika Parlato-Oliveira, Shi Yu, Yuki Hirose, Sharon Peperkamp, Emmanuel Dupoux. 596-600
- The Effect of Spectral Tilt on Size Discrimination of Voiced Speech Sounds. Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson. 601-605
- Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car. Jaime Lorenzo-Trueba, Cassia Valentini-Botinhao, Gustav Eje Henter, Junichi Yamagishi. 606-610
- The Relative Cueing Power of F0 and Duration in German Prominence Perception. Oliver Niebuhr, Jana Winkler. 611-615
- Perception and Acoustics of Vowel Nasality in Brazilian Portuguese. Luciana Marques, Rebecca Scarborough. 616-620
- Sociophonetic Realizations Guide Subsequent Lexical Access. Jonny Kim, Katie Drager. 621-625
- Critical Articulators Identification from RT-MRI of the Vocal Tract. Samuel Silva, António Teixeira. 626-630
- Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images. Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan. 631-635
- Vocal Tract Airway Tissue Boundary Tracking for rtMRI Using Shape and Appearance Priors. Sasan Asadiabadi, Engin Erzin. 636-640
- An Objective Critical Distance Measure Based on the Relative Level of Spectral Valley. T. V. Ananthapadmanabha, A. G. Ramakrishnan, Shubham Sharma. 641-644
- Database of Volumetric and Real-Time Vocal Tract MRI for Speech Science. Tanner Sorensen, Zisis Iason Skordilis, Asterios Toutios, Yoon-Chul Kim, Yinghua Zhu, Jangwon Kim, Adam C. Lammert, Vikram Ramanarayanan, Louis Goldstein, Dani Byrd, Krishna S. Nayak, Shrikanth S. Narayanan. 645-649
- The Influence on Realization and Perception of Lexical Tones from Affricate's Aspiration. Chong Cao, Yanlu Xie, Qi Zhang, Jinsong Zhang. 650-654
- Audiovisual Recalibration of Vowel Categories. Matthias K. Franken, Frank Eisner, Jan-Mathijs Schoffelen, Daniel J. Acheson, Peter Hagoort, James M. McQueen. 655-658
- The Effect of Gesture on Persuasive Speech. Judith Peters, Marieke Hoetjes. 659-663
- Auditory-Visual Integration of Talker Gender in Cantonese Tone Perception. Wei Lai. 664-668
- Event-Related Potentials Associated with Somatosensory Effect in Audio-Visual Speech Perception. Takayuki Ito, Hiroki Ohashi, Eva Montas, Vincent L. Gracco. 669-673
- When a Dog is a Cat and How it Changes Your Pupil Size: Pupil Dilation in Response to Information Mismatch. Lena F. Renner, Marcin Wlodarczak. 674-678
- Cross-Modal Analysis Between Phonation Differences and Texture Images Based on Sentiment Correlations. Win Thuzar Kyaw, Yoshinori Sagisaka. 679-683
- Wireless Neck-Surface Accelerometer and Microphone on Flex Circuit with Application to Noise-Robust Monitoring of Lombard Speech. Daryush D. Mehta, Patrick C. Chwalek, Thomas F. Quatieri, Laura J. Brattain. 684-688
- Video-Based Tracking of Jaw Movements During Speech: Preliminary Results and Future Directions. Andrea Bandini, Aravind Namasivayam, Yana Yunusova. 689-693
- Accurate Synchronization of Speech and EGG Signal Using Phase Information. S. B. Sunil Kumar, K. Sreenivasa Rao, Tanumay Mandal. 694-698
- The Acquisition of Focal Lengthening in Stockholm Swedish. Anna Sara H. Romøren, Aoju Chen. 699-703
- Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition. Shiyu Zhou, Yuanyuan Zhao, Shuang Xu, Bo Xu. 704-708
- CTC Training of Multi-Phone Acoustic Models for Speech Recognition. Olivier Siohan. 709-713
- An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation. Sibo Tong, Philip N. Garner, Hervé Bourlard. 714-718
- 2016 BUT Babel System: Multilingual BLSTM Acoustic Model with i-Vector Based Adaptation. Martin Karafiát, Murali Karthick Baskar, Pavel Matejka, Karel Veselý, Frantisek Grézl, Lukás Burget, Jan Cernocký. 719-723
- Optimizing DNN Adaptation for Recognition of Enhanced Speech. Marco Matassoni, Alessio Brutti, Daniele Falavigna. 724-728
- Deep Least Squares Regression for Speaker Adaptation. Younggwan Kim, Hyungjun Lim, Jahyun Goo, Hoirin Kim. 729-733
- Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition. Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson. 734-738
- Generalized Distillation Framework for Speaker Normalization. Neethu Mariam Joy, Sandeep Reddy Kothinti, S. Umesh, Basil Abraham. 739-743
- Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models. Lahiru Samarakoon, Brian Mak, Khe Chai Sim. 744-748
- Factorised Representations for Neural Network Adaptation to Diverse Acoustic Environments. Joachim Fainberg, Steve Renals, Peter Bell. 749-753
- An RNN Model of Text Normalization. Richard Sproat, Navdeep Jaitly. 754-758
- Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels. Asaf Rendel, Raul Fernandez, Zvi Kons, Andrew Rosenberg, Ron Hoory, Bhuvana Ramabhadran. 759-763
- Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis. Yusuke Ijima, Nobukatsu Hojo, Ryo Masumura, Taichi Asami. 764-768
- Global Syllable Vectors for Building TTS Front-End with Deep Learning. Jinfu Ni, Yoshinori Shiga, Hisashi Kawai. 769-773
- Prosody Control of Utterance Sequence for Information Delivering. Ishin Fukuoka, Kazuhiko Iwata, Tetsunori Kobayashi. 774-778
- Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer. Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai. 779-783
- Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction. Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li, Bin Liu. 784-788
- Discrete Duration Model for Speech Synthesis. Bo Chen, Tianling Bian, Kai Yu. 789-793
- Comparison of Modeling Target in LSTM-RNN Duration Model. Bo Chen, Jiahao Lai, Kai Yu. 794-798
- Learning Word Vector Representations Based on Acoustic Counts. Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi. 799-803
- Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies. Éva Székely, Joseph Mendelson, Joakim Gustafson. 804-808
- Prosograph: A Tool for Prosody Visualisation of Large Speech Corpora. Alp Öktem, Mireia Farrús, Leo Wanner. 809-810
- ChunkitApp: Investigating the Relevant Units of Online Speech Processing. Svetlana Vetchinnikova, Anna Mauranen, Nina Mikusová. 811-812
- Extending the EMU Speech Database Management System: Cloud Hosting, Team Collaboration, Automatic Revision Control. Markus Jochim. 813-814
- HomeBank: A Repository for Long-Form Real-World Audio Recordings of Children. Anne S. Warlaumont, Mark VanDam, Elika Bergelson, Alejandrina Cristià. 815-816
- A System for Real Time Collaborative Transcription Correction. Peter Bell, Joachim Fainberg, Catherine Lai, Mark Sinclair. 817-818
- MoPAReST - Mobile Phone Assisted Remote Speech Therapy Platform. Chitralekha Bhat, Anjali Kant, Bhavik Vachhani, Sarita Rautara, Ashok Kumar Sinha, Sunil Kumar Kopparapu. 819-820
- An Apparatus to Investigate Western Opera Singing Skill Learning Using Performance and Result Biofeedback, and Measuring its Neural Correlates. Aurore Jaumard-Hakoun, Samy Chikhi, Takfarinas Medani, Angelika Nair, Gérard Dreyfus, François-Benoît Vialatte. 821-822
- PercyConfigurator - Perception Experiments as a Service. Christoph Draxler. 823-824
- System for Speech Transcription and Post-Editing in Microsoft Word. Askars Salimbajevs, Indra Ikauniece. 825-826
- Emojive! Collecting Emotion Data from Speech and Facial Expression Using Mobile Game App. Ji-Ho Park, Nayeon Lee, Dario Bertero, Anik Dey, Pascale Fung. 827-828
- Mylly - The Mill: A New Platform for Processing Speech and Text Corpora Easily and Efficiently. Mietta Lennes, Jussi Piitulainen, Martin Matthiesen. 829-830
- Visual Learning 2: Pronunciation App Using Ultrasound, Video, and MRI. Kyori Suzuki, Ian Wilson, Hayato Watanabe. 831-832
- Dialogue as Collaborative Problem Solving. James Allen. 833
- Elicitation Design for Acoustic Depression Classification: An Investigation of Articulation Effort, Linguistic Complexity, and Word Affect. Brian Stasak, Julien Epps, Roland Goecke. 834-838
- Robustness Over Time-Varying Channels in DNN-HMM ASR Based Human-Robot Interaction. José Novoa, Jorge Wuth, Juan Pablo Escudero, Josué Fredes, Rodrigo Mahu, Richard M. Stern, Néstor Becerra Yoma. 839-843
- Analysis of Engagement and User Experience with a Laughter Responsive Social Robot. Bekir Berker Türker, Zana Buçinca, Engin Erzin, Yücel Yemez, T. Metin Sezgin. 844-848
- Automatic Classification of Autistic Child Vocalisations: A Novel Database and Results. Alice Baird, Shahin Amiriparian, Nicholas Cummins, Alyssa M. Alcorn, Anton Batliner, Sergey Pugachevskiy, Michael Freitag, Maurice Gerczuk, Björn W. Schuller. 849-853
- Crowd-Sourced Design of Artificial Attentive Listeners. Catharine Oertel, Patrik Jonell, Dimosthenis Kontogiorgos, Joseph Mendelson, Jonas Beskow, Joakim Gustafson. 854-858
- Studying the Link Between Inter-Speaker Coordination and Speech Imitation Through Human-Machine Interactions. Leonardo Lancia, Thierry Chaminade, Noël Nguyen, Laurent Prévot. 859-863
- Adjusting the Frame: Biphasic Performative Control of Speech Rhythm. Samuel Delalez, Christophe d'Alessandro. 864-868
- Attentional Factors in Listeners' Uptake of Gesture Cues During Speech Processing. Raheleh Saryazdi, Craig G. Chambers. 869-873
- Motion Analysis in Vocalized Surprise Expressions. Carlos Toshinori Ishi, Takashi Minato, Hiroshi Ishiguro. 874-878
- Enhancing Backchannel Prediction Using Word Embeddings. Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel. 879-883
- A Computational Model for Phonetically Responsive Spoken Dialogue Systems. Eran Raveh, Ingmar Steiner, Bernd Möbius. 884-888
- Incremental Dialogue Act Recognition: Token- vs Chunk-Based Classification. Eustace Ebhotemhen, Volha Petukhova, Dietrich Klakow. 889-893
- Clear Speech - Mere Speech? How Segmental and Prosodic Speech Reduction Shape the Impression That Speakers Create on Listeners. Oliver Niebuhr. 894-898
- Relationships Between Speech Timing and Perceived Hostility in a French Corpus of Political Debates. Charlotte Kouklia, Nicolas Audibert. 899-903
- Towards Speaker Characterization: Identifying and Predicting Dimensions of Person Attribution. Laura Fernández Gallardo, Benjamin Weiss. 904-908
- Prosodic Analysis of Attention-Drawing Speech. Carlos Toshinori Ishi, Jun Arai, Norihiro Hagita. 909-913
- Perceptual and Acoustic Correlates of Gender in the Prepubertal Voice. Adrian P. Simpson, Riccarda Funk, Frederik Palmer. 914-918
- To See or not to See: Interlocutor Visibility and Likeability Influence Convergence in Intonation. Katrin Schweitzer, Michael Walsh, Antje Schweitzer. 919-923
- Acoustic Correlates of Parental Role and Gender Identity in the Speech of Expecting Parents. Melanie Weirich, Adrian P. Simpson. 924-928
- A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains. Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Vera Cabarrão, Anna Pompili, Ramón Fernández Astudillo, Joana Campos, Ana Paiva, Isabel Trancoso. 929-933
- Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions. Rachael Tatman, Conner Kasten. 934-938
- A Comparison of Sequence-to-Sequence Models for Speech Recognition. Rohit Prabhavalkar, Kanishka Rao, Tara N. Sainath, Bo Li, Leif Johnson, Navdeep Jaitly. 939-943
- CTC in the Context of Generalized Full-Sum HMM Training. Albert Zeyer, Eugen Beck, Ralf Schlüter, Hermann Ney. 944-948
- Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM. Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan. 949-953
- Multitask Learning with CTC and Segmental CRF for Speech Recognition. Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith. 954-958
- Direct Acoustics-to-Word Models for English Conversational Speech Recognition. Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo. 959-963
- Reducing the Computational Complexity of Two-Dimensional LSTMs. Bo Li, Tara N. Sainath. 964-968
- Functional Principal Component Analysis of Vocal Tract Area Functions. Jorge C. Lucero. 969-973
- Analysis of Acoustic-to-Articulatory Speech Inversion Across Different Accents and Languages. Ganesh Sivaraman, Carol Y. Espy-Wilson, Martijn Wieling. 974-978
- Integrated Mechanical Model of [r]-[l] and [b]-[m]-[w] Producing Consonant Cluster [br]. Takayuki Arai. 979-983
- A Speaker Adaptive DNN Training Approach for Speaker-Independent Acoustic Inversion. Leonardo Badino, Luca Franceschi, Raman Arora, Michele Donini, Massimiliano Pontil. 984-988
- Acoustic-to-Articulatory Mapping Based on Mixture of Probabilistic Canonical Correlation Analysis. Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu. 989-993
- Test-Retest Repeatability of Articulatory Strategies Using Real-Time Magnetic Resonance Imaging. Tanner Sorensen, Asterios Toutios, Johannes Töger, Louis Goldstein, Shrikanth S. Narayanan. 994-998
- Deep Neural Network Embeddings for Text-Independent Speaker Verification. David Snyder, Daniel Garcia-Romero, Daniel Povey, Sanjeev Khudanpur. 999-1003
- Tied Variational Autoencoder Backends for i-Vector Speaker Recognition. Jesús Villalba, Niko Brümmer, Najim Dehak. 1004-1008
- Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck Features. Shivesh Ranjan, John H. L. Hansen. 1009-1013
- Autoencoder Based Domain Adaptation for Speaker Recognition Under Insufficient Channel Information. Suwon Shon, Seongkyu Mun, Wooil Kim, Hanseok Ko. 1014-1018
- Nonparametrically Trained Probabilistic Linear Discriminant Analysis for i-Vector Speaker VerificationAbbas Khosravani, Mohammad Mehdi Homayounpour. 1019-1023 [doi]
- DNN Bottleneck Features for Speaker ClusteringJesús Jorrín, Paola García, Luis Buera. 1024-1028 [doi]
- Creak as a Feature of Lexical Stress in EstonianKätlin Aare, Pärtel Lippus, Juraj Simko. 1029-1033 [doi]
- Cross-Speaker Variation in Voice Source Correlates of Focus and DeaccentuationIrena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl. 1034-1038 [doi]
- Acoustic Characterization of Word-Final Glottal Stops in Mizo and Assam SoraSishir Kalita, Wendy Lalhminghlui, Luke Horo, Priyankoo Sarmah, S. R. Mahadeva Prasanna, Samarendra Dandapat. 1039-1043 [doi]
- Iterative Optimal Preemphasis for Improved Glottal-Flow Estimation by Iterative Adaptive Inverse FilteringParham Mokhtari, Hiroshi Ando. 1044-1048 [doi]
- Automatic Measurement of Pre-AspirationYaniv Sheena, Mísa Hejná, Yossi Adi, Joseph Keshet. 1049-1053 [doi]
- Acoustic and Electroglottographic Study of Breathy and Modal Vowels as Produced by Heritage and Native Gujarati SpeakersKiranpreet Nara. 1054-1058 [doi]
- An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech SynthesisXin Wang, Shinji Takaki, Junichi Yamagishi. 1059-1063 [doi]
- Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure InformationViacheslav Klimkov, Adam Nadolski, Alexis Moinet, Bartosz Putrycz, Roberto Barra-Chicote, Thomas Merritt, Thomas Drugman. 1064-1068 [doi]
- Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech EnhancementKou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura 0001. 1069-1073 [doi]
- DNN-SPACE: DNN-HMM-Based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command EstimationNobukatsu Hojo, Yasuhito Ohsugi, Yusuke Ijima, Hirokazu Kameoka. 1074-1078 [doi]
- Controlling Prominence Realisation in Parametric DNN-Based Speech SynthesisZofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson. 1079-1083 [doi]
- Increasing Recall of Lengthening Detection via Semi-Automatic ClassificationSimon Betz, Jana Voße, Sina Zarrieß, Petra Wagner. 1084-1088 [doi]
- Efficient Emotion Recognition from Speech Using Deep Learning on SpectrogramsAharon Satt, Shai Rozenberg, Ron Hoory. 1089-1093 [doi]
- Interaction and Transition Model for Speech Emotion Recognition in DialogueRuo Zhang, Atsushi Ando, Satoshi Kobashikawa, Yushi Aono. 1094-1097 [doi]
- Progressive Neural Networks for Transfer Learning in Emotion RecognitionJohn Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost. 1098-1102 [doi]
- Jointly Predicting Arousal, Valence and Dominance with Multi-Task LearningSrinivas Parthasarathy, Carlos Busso. 1103-1107 [doi]
- Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural NetworkDuc Le, Zakaria Aldeneh, Emily Mower Provost. 1108-1112 [doi]
- Towards Speech Emotion Recognition "in the Wild" Using Aggregated Corpora and Deep Multi-Task LearningJaebok Kim, Gwenn Englebienne, Khiet P. Truong, Vanessa Evers. 1113-1117 [doi]
- Speaker-Dependent WaveNet VocoderAkira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda. 1118-1122 [doi]
- Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth ExtensionYu Gu, Zhen-Hua Ling. 1123-1127 [doi]
- Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech SynthesisShinji Takaki, Hirokazu Kameoka, Junichi Yamagishi. 1128-1132 [doi]
- A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech SynthesisSrikanth Ronanki, Oliver Watts, Simon King. 1133-1137 [doi]
- Statistical Voice Conversion with WaveNet-Based Waveform GenerationKazuhiro Kobayashi, Tomoki Hayashi, Akira Tamamori, Tomoki Toda. 1138-1142 [doi]
- Google's Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based AutoencodersVincent Wan, Yannis Agiomyrgiannakis, Hanna Silén, Jakub Vít. 1143-1147 [doi]
- A Comparison of Sentence-Level Speech Intelligibility MetricsAlexander Kain, Max Del Giudice, Kris Tjaden. 1148-1152 [doi]
- An Auditory Model of Speaker Size Perception for Voiced Speech SoundsToshio Irino, Eri Takimoto, Toshie Matsui, Roy D. Patterson. 1153-1157 [doi]
- The Recognition of Compounds: A Computational AccountLouis ten Bosch, Lou Boves, Mirjam Ernestus. 1158-1162 [doi]
- Humans do not Maximize the Probability of Correct Decision When Recognizing DANTALE Words in NoiseMohsen Zareian Jahromi, Jan Østergaard, Jesper Jensen 0001. 1163-1167 [doi]
- Single-Ended Prediction of Listening Effort Based on Automatic Speech RecognitionRainer Huber, Constantin Spille, Bernd T. Meyer. 1168-1172 [doi]
- Modeling Categorical Perception with the Receptive Fields of Auditory NeuronsChris Neufeld. 1173-1177 [doi]
- A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech SeparationYannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee. 1178-1182 [doi]
- Deep Clustering-Based Beamforming for Separation with Unknown Number of SourcesTakuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Katerina Zmolíková, Tomohiro Nakatani. 1183-1187 [doi]
- Time-Frequency Masking for Blind Source Separation with Preserved Spatial CuesShadi Pirhosseinloo, Kostas Kokkinakis. 1188-1192 [doi]
- Variational Recurrent Neural Networks for Speech SeparationJen-Tzung Chien, Kuan-Ting Kuo. 1193-1197 [doi]
- Detecting Overlapped Speech on Short Timeframes Using Deep LearningValentin Andrei, Horia Cucu, Corneliu Burileanu. 1198-1202 [doi]
- Ideal Ratio Mask Estimation Using Deep Neural Networks for Monaural Speech Segregation in Noisy Reverberant ConditionsXu Li, Junfeng Li, Yonghong Yan 0002. 1203-1207 [doi]
- The Vocative Chant and Beyond: German Calling Melodies Under Routine and Urgent ContextsSergio I. Quiroz, Marzena Zygis. 1208-1212 [doi]
- Comparing Languages Using Hierarchical Prosodic AnalysisJuraj Simko, Antti Suni, Katri Hiovain, Martti Vainio. 1213-1217 [doi]
- Intonation Facilitates Prediction of Focus Even in the Presence of Lexical TonesMartin Ho Kwan Ip, Anne Cutler. 1218-1222 [doi]
- Mind the Peak: When Museum is Temporarily Understood as Musical in Australian EnglishKatharina Zahner, Heather Kember, Bettina Braun. 1223-1227 [doi]
- Pashto Intonation PatternsLuca Rognoni, Judith Bishop, Miriam Corris. 1228-1232 [doi]
- A New Model of Final Lowering in Spontaneous MonologueKikuo Maekawa. 1233-1237 [doi]
- Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion SpaceXi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai. 1238-1242 [doi]
- Adversarial Auto-Encoders for Speech Based Emotion RecognitionSaurabh Sahu, Rahul Gupta, Ganesh Sivaraman, Wael AbdAlmageed, Carol Y. Espy-Wilson. 1243-1247 [doi]
- An Investigation of Emotion Prediction Uncertainty Using Gaussian Mixture RegressionTing Dang, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah. 1248-1252 [doi]
- Capturing Long-Term Temporal Dependencies with Convolutional Networks for Continuous Emotion RecognitionSoheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin G. McInnis, Emily Mower Provost. 1253-1257 [doi]
- Voice-to-Affect Mapping: Inferences on Language Voice Baseline SettingsAilbhe Ní Chasaide, Irena Yanushevskaya, Christer Gobl. 1258-1262 [doi]
- Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted SpeechMichael Neumann, Ngoc Thang Vu. 1263-1267 [doi]
- Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior ProbabilitiesHiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari. 1268-1272 [doi]
- Learning Latent Representations for Speech Generation and TransformationWei-Ning Hsu, Yu Zhang, James R. Glass. 1273-1277 [doi]
- Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech CorpusTetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu. 1278-1282 [doi]
- Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial NetworksTakuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino. 1283-1287 [doi]
- A Mouth Opening Effect Based on Pole Modification for Expressive Singing Voice TransformationLuc Ardaillon, Axel Roebel. 1288-1292 [doi]
- Siamese Autoencoders for Speech Style Extraction and Switching Applied to Voice Identification and ConversionSeyed Hamidreza Mohammadi, Alexander Kain. 1293-1297 [doi]
- Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence MappingHasim Sak, Matt Shannon, Kanishka Rao, Françoise Beaufays. 1298-1302 [doi]
- Highway-LSTM and Recurrent Highway Networks for Speech RecognitionGolan Pundak, Tara N. Sainath. 1303-1307 [doi]
- Improving Speech Recognition by Revising Gated Recurrent UnitsMirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio. 1308-1312 [doi]
- Stochastic Recurrent Neural Network for Speech RecognitionJen-Tzung Chien, Chen Shen. 1313-1317 [doi]
- Frame and Segment Level Recurrent Neural Networks for Phone ClassificationMartin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf. 1318-1322 [doi]
- Deep Learning-Based Telephony Speech Recognition in the WildKyu J. Han, Seongjun Hahm, Byung-Hak Kim, Jungsuk Kim, Ian R. Lane. 1323-1327 [doi]
- The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, et al. 1328-1332 [doi]
- The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation SystemPedro A. Torres-Carrasquillo, Fred Richardson, Shahan Nercessian, Douglas E. Sturim, William M. Campbell, Youngjune Gwon, Swaroop Vattam, Najim Dehak, Sri Harish Reddy Mallidi, Phani Sankar Nidadavolu, Ruizhi Li, Réda Dehak. 1333-1337 [doi]
- Nuance - Politecnico di Torino's 2016 NIST Speaker Recognition Evaluation SystemDaniele Colibro, Claudio Vair, Emanuele Dalmasso, Kevin Farrell, Gennady Karvitsky, Sandro Cumani, Pietro Laface. 1338-1342 [doi]
- UTD-CRSS Systems for 2016 NIST Speaker Recognition EvaluationChunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H. L. Hansen. 1343-1347 [doi]
- Analysis and Description of ABC Submission to NIST SRE 2016Oldrich Plchot, Pavel Matejka, Anna Silnova, Ondrej Novotný, Mireia Díez Sánchez, Johan Rohdin, Ondrej Glembek, Niko Brümmer, Albert Swart, Jesús Jorrín-Prieto, Paola García, Luis Buera, Patrick Kenny, Md. Jahangir Alam, Gautam Bhattacharya. 1348-1352 [doi]
- The 2016 NIST Speaker Recognition EvaluationSeyed Omid Sadjadi, Timothée Kheyrkhah, Audrey Tong, Craig S. Greenberg, Douglas A. Reynolds, Elliot Singer, Lisa P. Mason, Jaime Hernandez-Cordero. 1353-1357 [doi]
- A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing SynthesisHideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino. 1358-1362 [doi]
- Speaking Style Conversion from Normal to Lombard Speech Using a Glottal Vocoder and Bayesian GMMsAna Ramírez López, Shreyas Seshadri, Lauri Juvela, Okko Räsänen, Paavo Alku. 1363-1367 [doi]
- Reducing Mismatch in Training of DNN-Based Glottal Excitation Models in a Statistical Parametric Text-to-Speech SystemLauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku. 1368-1372 [doi]
- Semi Parametric Concatenative TTS with Instant Voice Modification CapabilitiesAlexander Sorin, Slava Shechtman, Asaf Rendel. 1373-1377 [doi]
- Modeling Laryngeal Muscle Activation Noise for Low-Order Physiological Based Speech SynthesisRodrigo Manríquez, Sean D. Peterson, Pavel Prado, Patricio Orio, Matías Zañartu. 1378-1382 [doi]
- Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech SynthesisFelipe Espic, Cassia Valentini-Botinhao, Simon King. 1383-1387 [doi]
- Similar Prosodic Structure Perceived Differently in German and EnglishHeather Kember, Ann-Kathrin Grohe, Katharina Zahner, Bettina Braun, Andrea Weber, Anne Cutler. 1388-1392 [doi]
- Disambiguate or not? - The Role of Prosody in Unambiguous and Potentially Ambiguous Anaphora Production in Strictly Mandarin Parallel StructuresLuying Hou, Bert Le Bruyn, René Kager. 1393-1397 [doi]
- Acoustic Properties of Canonical and Non-Canonical Stress in French, Turkish, Armenian and Brazilian PortugueseAngeliki Athanasopoulou, Irene Vogel, Hossep Dolatian. 1398-1402 [doi]
- Phonological Complexity, Segment Rate and Speech Tempo PerceptionLeendert Plug, Rachel Smith. 1403-1406 [doi]
- On the Duration of Mandarin TonesJing Yang, Yu Zhang, Aijun Li, Li Xu. 1407-1411 [doi]
- The Formant Dynamics of Long Close Vowels in Three Varieties of SwedishOtto Ewald, Eva Liina Asu, Susanne Schötz. 1412-1416 [doi]
- Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children's SpeechYao Qian, Keelan Evanini, Xinhao Wang, Chong Min Lee, Matthew Mulholland. 1417-1421 [doi]
- Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTWJunwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu. 1422-1426 [doi]
- Off-Topic Spoken Response Detection Using Siamese Convolutional Neural NetworksChong Min Lee, Su-Youn Yoon, Xinhao Wang, Matthew Mulholland, Ikkyu Choi, Keelan Evanini. 1427-1431 [doi]
- Phonological Feature Based Mispronunciation Detection and Diagnosis Using Multi-Task DNNs and Active LearningVipul Arora, Aditi Lahiri, Henning Reetz. 1432-1436 [doi]
- Detection of Mispronunciations and Disfluencies in Children Reading AloudJorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão. 1437-1441 [doi]
- Automatic Assessment of Non-Native Prosody by Measuring Distances on Prosodic Label SequencesDavid Escudero Mancebo, César González Ferreras, Lourdes Aguilar, Eva Estebas-Vilaplana. 1442-1446 [doi]
- Inferring Stance from ProsodyNigel G. Ward, Jason C. Carlson, Olac Fuentes, Diego Castán, Elizabeth Shriberg, Andreas Tsiartas. 1447-1451 [doi]
- Exploring Dynamic Measures of Stance in Spoken InteractionGina-Anne Levow, Richard A. Wright. 1452-1456 [doi]
- Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random FieldsValentin Barrière, Chloé Clavel, Slim Essid. 1457-1461 [doi]
- Transfer Learning Between Concepts for Human Behavior Modeling: An Application to Sincerity and Deception PredictionQinyi Luo, Rahul Gupta, Shrikanth S. Narayanan. 1462-1466 [doi]
- The Sound of Deception - What Makes a Speaker Credible?Anne Schröder, Simon Stone, Peter Birkholz. 1467-1471 [doi]
- Hybrid Acoustic-Lexical Deep Learning Approach for Deception DetectionGideon Mendels, Sarah Ita Levitan, Kai-Zhan Lee, Julia Hirschberg. 1472-1476 [doi]
- A Generative Model for Score Normalization in Speaker RecognitionAlbert Swart, Niko Brümmer. 1477-1481 [doi]
- Content Normalization for Text-Dependent Speaker VerificationSubhadeep Dey, Srikanth R. Madikeri, Petr Motlícek, Marc Ferras. 1482-1486 [doi]
- End-to-End Text-Independent Speaker Verification with Triplet Loss on Short UtterancesChunlei Zhang, Kazuhito Koishida. 1487-1491 [doi]
- Adversarial Network Bottleneck Features for Noise Robust Speaker VerificationHong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo. 1492-1496 [doi]
- What Does the Speaker Embedding Encode?Shuai Wang, Yanmin Qian, Kai Yu 0004. 1497-1501 [doi]
- Incorporating Local Acoustic Variability Information into Short Duration Speaker VerificationJianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee. 1502-1506 [doi]
- DNN i-Vector Speaker Verification with Short, Text-Constrained Test UtterancesJinghua Zhong, Wenping Hu, Frank K. Soong, Helen Meng. 1507-1511 [doi]
- Time-Varying Autoregressions for Speaker Verification in Reverberant ConditionsVille Vestman, Dhananjaya N. Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen. 1512-1516 [doi]
- Deep Speaker Embeddings for Short-Duration Speaker VerificationGautam Bhattacharya, Md. Jahangir Alam, Patrick Kenny. 1517-1521 [doi]
- Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification SystemsSoo-Jin Park, Gary Yeung, Jody Kreiman, Patricia A. Keating, Abeer Alwan. 1522-1526 [doi]
- Gain Compensation for Fast i-Vector Extraction Over Short DurationKong-Aik Lee, Haizhou Li. 1527-1531 [doi]
- Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker VerificationHee-Soo Heo, Jee-weon Jung, Il-Ho Yang, Sung Hyun Yoon, Ha-Jin Yu. 1532-1536 [doi]
- Speaker Verification via Estimating Total Variability Space Using Probabilistic Partial Least SquaresChen Chen, Jiqing Han, Yilin Pan. 1537-1541 [doi]
- Deep Speaker Feature Learning for Text-Independent Speaker VerificationLantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang. 1542-1546 [doi]
- Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker VerificationPierre-Michel Bousquet, Mickael Rouvier. 1547-1551 [doi]
- Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker RecognitionAlan McCree, Gregory Sell, Daniel Garcia-Romero. 1552-1556 [doi]
- Improving the Effectiveness of Speaker Verification Domain Adaptation with Inadequate In-Domain DataBengt J. Borgström, Elliot Singer, Douglas A. Reynolds, Seyed Omid Sadjadi. 1557-1561 [doi]
- i-Vector DNN Scoring and Calibration for Noise Robust Speaker VerificationZhili Tan, Man-Wai Mak. 1562-1566 [doi]
- Analysis of Score Normalization in Multilingual Speaker RecognitionPavel Matejka, Ondrej Novotný, Oldrich Plchot, Lukás Burget, Mireia Díez Sánchez, Jan Cernocký. 1567-1571 [doi]
- Alternative Approaches to Neural Network Based Speaker VerificationAnna Silnova, Lukás Burget, Jan Cernocký. 1572-1575 [doi]
- A Distribution Free Formulation of the Total Variability ModelRuchir Travadi, Shrikanth S. Narayanan. 1576-1580 [doi]
- Domain Mismatch Modeling of Out-Domain i-Vectors for PLDA Speaker VerificationMd. Hafizur Rahman, Ivan Himawan, David Dean, Sridha Sridharan. 1581-1585 [doi]
- An Exploration of Dropout with LSTMsGaofeng Cheng, Vijayaditya Peddinti, Daniel Povey, Vimal Manohar, Sanjeev Khudanpur, YongHong Yan. 1586-1590 [doi]
- Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech RecognitionJaeyoung Kim, Mostafa El-Khamy, Jungwon Lee. 1591-1595 [doi]
- Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic ModelingDung T. Tran, Marc Delcroix, Shigeki Karita, Michael Hentschel, Atsunori Ogawa, Tomohiro Nakatani. 1596-1600 [doi]
- Forward-Backward Convolutional LSTM for Acoustic ModelingShigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani. 1601-1605 [doi]
- Convolutional Recurrent Neural Networks for Small-Footprint Keyword SpottingSercan Ömer Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates. 1606-1610 [doi]
- Deep Activation Mixture Model for Speech RecognitionChunyang Wu, Mark J. F. Gales. 1611-1615 [doi]
- Ensembles of Multi-Scale VGG Acoustic ModelsMichael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura 0001. 1616-1620 [doi]
- Training Context-Dependent DNN Acoustic Models Using Probabilistic SamplingTamás Grósz, Gábor Gosztolya, László Tóth. 1621-1625 [doi]
- A Comparative Evaluation of GMM-Free State Tying Methods for ASRTamás Grósz, Gábor Gosztolya, László Tóth. 1626-1630 [doi]
- Backstitch: Counteracting Finite-Sample Bias via Negative StepsYiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur. 1631-1635 [doi]
- Node Pruning Based on Entropy of Weights and Node Activity for Small-Footprint Acoustic Model Based on Deep Neural NetworksRyu Takeda, Kazuhiro Nakadai, Kazunori Komatani. 1636-1640 [doi]
- End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlowEhsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani. 1641-1645 [doi]
- An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix MultiplicationKhe Chai Sim, Arun Narayanan. 1646-1650 [doi]
- Parallel Neural Network Features for Improved Tandem Acoustic ModelingZoltán Tüske, Wilfried Michel, Ralf Schlüter, Hermann Ney. 1651-1655 [doi]
- Acoustic Feature Learning via Deep Variational Canonical Correlation AnalysisQingming Tang, Weiran Wang, Karen Livescu. 1656-1660 [doi]
- Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential NetworksRyo Masumura, Taichi Asami, Hirokazu Masataki, Ryo Ishii, Ryuichiro Higashinaka. 1661-1665 [doi]
- Improving Prediction of Speech Activity Using Multi-Participant Respiratory StateMarcin Wlodarczak, Kornel Laskowski, Mattias Heldner, Kätlin Aare. 1666-1670 [doi]
- Turn-Taking Offsets and Dialogue ContextPeter A. Heeman, Rebecca Lunsford. 1671-1675 [doi]
- Towards Deep End-of-Turn Prediction for Situated Spoken Dialogue SystemsAngelika Maier, Julian Hough, David Schlangen. 1676-1680 [doi]
- End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese SpeechYuichi Ishimoto, Takehiro Teraoka, Mika Enomoto. 1681-1685 [doi]
- Turn-Taking Estimation Model Based on Joint Embedding of Lexical and Prosodic ContentsChaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro. 1686-1690 [doi]
- Social Signal Detection in Spontaneous Dialogue Using Bidirectional LSTM-CTCHirofumi Inaguma, Koji Inoue, Masato Mimura, Tatsuya Kawahara. 1691-1695 [doi]
- Entrainment in Multi-Party Spoken Dialogues at Multiple Linguistic LevelsZahra Rahimi, Anish Kumar, Diane J. Litman, Susannah Paletz, Mingzhi Yu. 1696-1700 [doi]
- Measuring Synchrony in Task-Based DialoguesJustine Reverdy, Carl Vogel. 1701-1705 [doi]
- Sequence to Sequence Modeling for User Simulation in Dialog SystemsPaul A. Crook, Alex Marin. 1706-1710 [doi]
- Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human-Machine Spoken Dialog InteractionsVikram Ramanarayanan, Patrick L. Lange, Keelan Evanini, Hillary R. Molloy, David Suendermann-Oeft. 1711-1715 [doi]
- Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center CallsAtsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono. 1716-1720 [doi]
- Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy LearningStefan Ultes, Pawel Budzianowski, Iñigo Casanueva, Nikola Mrksic, Lina Maria Rojas-Barahona, Pei-hao Su, Tsung-Hsien Wen, Milica Gasic, Steve J. Young. 1721-1725 [doi]
- Analysis of the Relationship Between Prosodic Features of Fillers and its Forms or Occurrence PositionsShizuka Nakamura, Ryosuke Nakanishi, Katsuya Takanashi, Tatsuya Kawahara. 1726-1730 [doi]
- Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic InteractionsSyeda Narjis Fatima, Engin Erzin. 1731-1735 [doi]
- An Automatically Aligned Corpus of Child-Directed SpeechMicha Elsner, Kiwako Ito. 1736-1740 [doi]
- A Comparison of Danish Listeners' Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English SentencesOcke-Schwen Bohn, Trine Askjær-Jørgensen. 1741-1744 [doi]
- On the Role of Temporal Variability in the Acquisition of the German Vowel Length ContrastFelicitas Kleber. 1745-1749 [doi]
- A Data-Driven Approach for Perceptually Validated Acoustic Features for Children's Sibilant Fricative ProductionsPatrick F. Reidy, Mary E. Beckman, Jan Edwards, Benjamin Munson. 1750-1754 [doi]
- Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as ReferenceYujia Xiao, Frank K. Soong. 1755-1759 [doi]
- Mechanisms of Tone Sandhi Rule Application by Non-Native SpeakersSi Chen, Yunjuan He, Chun Wah Yuen, Bei Li, Yike Yang. 1760-1764 [doi]
- Changes in Early L2 Cue-Weighting of Non-Native Speech: Evidence from Learners of Mandarin ChineseSeth Wiener. 1765-1769 [doi]
- Directing Attention During Perceptual Training: A Preliminary Study of Phonetic Learning in Southern Min by Mandarin SpeakersYing Chen, Eric Pederson. 1770-1774 [doi]
- Prosody Analysis of L2 English for Naturalness Evaluation Through Speech ModificationDean Luo, Ruxin Luo, Lixin Wang. 1775-1778 [doi]
- Measuring Encoding Efficiency in Swedish and English Language Learner Speech ProductionGintare Grigonyte, Gerold Schneider. 1779-1783 [doi]
- Lexical Adaptation to a Novel Accent in German: A Comparison Between German, Swedish, and Finnish ListenersAdriana Hanulíková, Jenny Ekström. 1784-1788 [doi]
- Qualitative Differences in L3 Learners' Neurophysiological Response to L1 versus L2 TransferAlejandra Keidel Fernández, Thomas Hörberg. 1789-1793 [doi]
- Articulation Rate in Swedish Child-Directed Speech Increases as a Function of the Age of the Child Even When Surprisal is Controlled forJohan Sjons, Thomas Hörberg, Robert Östling, Johannes Bjerva. 1794-1798 [doi]
- The Relationship Between the Perception and Production of Non-Native TonesKaile Zhang, Gang Peng. 1799-1803 [doi]
- MMN Responses in Adults After Exposure to Bimodal and Unimodal Frequency Distributions of Rotated SpeechEllen Marklund, Elísabet Eir Cortes, Johan Sjons. 1804-1808 [doi]
- Float Like a Butterfly Sting Like a Bee: Changes in Speech Preceded Parkinsonism Diagnosis for Muhammad AliVisar Berisha, Julie Liss, Timothy Huston, Alan Wisler, Yishan Jiao, Jonathan Eig. 1809-1813 [doi]
- Cepstral and Entropy Analyses in Vowels Excerpted from Continuous Speech of Dysphonic and Control SpeakersAntonella Castellana, Andreas Selamtzis, Giampiero Salvi, Alessio Carullo, Arianna Astolfi. 1814-1818 [doi]
- Classification of Bulbar ALS from Kinematic Features of the Jaw and Lips: Towards Computer-Mediated AssessmentAndrea Bandini, Jordan R. Green, Lorne Zinman, Yana Yunusova. 1819-1823 [doi]
- Zero Frequency Filter Based Analysis of Voice DisordersNagaraj Adiga, Vikram C. M., Keerthi Pullela, S. R. Mahadeva Prasanna. 1824-1828 [doi]
- Hypernasality Severity Analysis in Cleft Lip and Palate Speech Using Vowel Space AreaNikitha K., Sishir Kalita, C. M. Vikram, M. Pushpavathi, S. R. Mahadeva Prasanna. 1829-1833 [doi]
- Automatic Prediction of Speech Evaluation Metrics for Dysarthric SpeechImed Laaridh, Waad Ben Kheder, Corinne Fredouille, Christine Meunier. 1834-1838 [doi]
- Apkinson - A Mobile Monitoring Solution for Parkinson's DiseasePhilipp Klumpp, Thomas Janu, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth. 1839-1843 [doi]
- Dysprosody Differentiate Between Parkinson's Disease, Progressive Supranuclear Palsy, and Multiple System AtrophyJan Hlavnicka, Tereza Tykalová, Roman Cmejla, Jirí Klempír, Evzen Ruzicka, Jan Rusz. 1844-1848 [doi]
- Interpretable Objective Assessment of Dysarthric Speech Based on Deep Neural NetworksMing Tu, Visar Berisha, Julie Liss. 1849-1853 [doi]
- Deep Autoencoder Based Speech Features for Improved Dysarthric Speech RecognitionBhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu. 1854-1858 [doi]
- Prediction of Speech Delay from Acoustic MeasurementsJason Lilley, Madhavi Vedula Ratnagiri, H. Timothy Bunnell. 1859-1863 [doi]
- The Frequency Range of "The Ling Six Sounds" in Standard ChineseAijun Li, Hua Zhang, Wen Sun. 1864-1868 [doi]
- Production of Sustained Vowels and Categorical Perception of Tones in Mandarin Among Cochlear-Implanted ChildrenWentao Gu, Jiao Yin, James J. Mahshie. 1869-1873 [doi]
- Audio Content Based Geotagging in MultimediaAnurag Kumar 0003, Benjamin Elizalde, Bhiksha Raj. 1874-1878 [doi]
- Time Delay Histogram Based Speech Source Separation Using a Planar ArrayZhaoqiong Huang, Zhanzhong Cao, Dongwen Ying, Jielin Pan, Yonghong Yan 0002. 1879-1883 [doi]
- Excitation Source Features for Improving the Detection of Vowel Onset and Offset Points in a Speech SequenceGayadhar Pradhan, Avinash Kumar, Syed Shahnawazuddin. 1884-1888 [doi]
- A Contrast Function and Algorithm for Blind Separation of Audio SignalsWei Gao, Roberto Togneri, Victor Sreeram. 1889-1893 [doi]
- Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech SourceChenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li. 1894-1898 [doi]
- Speaker Direction-of-Arrival Estimation Based on Frequency-Independent BeampatternFeng Guo, Yuhang Cao, Zheng Liu, Jiaen Liang, Baoqing Li, Xiaobing Yuan. 1899-1903 [doi]
- A Mask Estimation Method Integrating Data Field Model for Speech EnhancementXianyun Wang, Changchun Bao, Feng Bao 0003. 1904-1908 [doi]
- Improved End-of-Query Detection for Streaming Speech RecognitionMatt Shannon, Gabor Simko, Shuo-Yiin Chang, Carolina Parada. 1909-1913 [doi]
- Using Approximated Auditory Roughness as a Pre-Filtering Feature for Human Screaming and Affective Speech AEDDi He, Zuofu Cheng, Mark Hasegawa-Johnson, Deming Chen. 1914-1918 [doi]
- Improving Source Separation via Multi-Speaker RepresentationsJeroen Zegers, Hugo Van Hamme. 1919-1923 [doi]
- Multiple Sound Source Counting and Localization Based on Spatial Principal EigenvectorBing Yang, Hong Liu, Cheng Pang. 1924-1928 [doi]
- Subband Selection for Binaural Speech Source LocalizationGirija Ramesan Karthik, Prasanta Kumar Ghosh. 1929-1933 [doi]
- Unmixing Convolutive Mixtures by Exploiting Amplitude Co-Modulation: Methods and Evaluation on Mandarin Speech RecordingsBo-Rui Chen, Huang-Yi Lee, Yi-Wen Liu. 1934-1937 [doi]
- Bimodal Recurrent Neural Network for Audiovisual Voice Activity DetectionFei Tao, Carlos Busso. 1938-1942 [doi]
- Domain-Specific Utterance End-Point Detection for Speech RecognitionRoland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Björn Hoffmeister. 1943-1947 [doi]
- Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant EnvironmentsVinay Kothapally, John H. L. Hansen. 1948-1952 [doi]
- A Post-Filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech EnhancementYi-Chiao Wu, Hsin-Te Hwang, Syu-Siang Wang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang. 1953-1957 [doi]
- Multi-Target Ensemble Learning for Monaural Speech SeparationHui Zhang, Xueliang Zhang, Guanglai Gao. 1958-1962 [doi]
- Improved Example-Based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example SearchAtsunori Ogawa, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani. 1963-1967 [doi]
- Subjective Intelligibility of Deep Neural Network-Based Speech EnhancementFemke B. Gelderblom, Tron V. Tronstad, Erlend Magnus Viggen. 1968-1972 [doi]
- Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech IntelligibilityMaria Koutsogiannaki, Holly Francois, Kihyun Choo, Eunmi Oh. 1973-1977 [doi]
- On the Influence of Modifying Magnitude and Phase Spectrum to Enhance Noisy Speech SignalsHans-Günter Hirsch, Michael Gref. 1978-1982 [doi]
- MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech EnhancementRobert Rehr, Timo Gerkmann. 1983-1987 [doi]
- Binary Mask Estimation Strategies for Constrained Imputation-Based Speech EnhancementRicard Marxer, Jon Barker. 1988-1992 [doi]
- A Fully Convolutional Neural Network for Speech EnhancementSe Rim Park, Jinwon Lee. 1993-1997 [doi]
- Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral RegularizationLi Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino. 1998-2002 [doi]
- A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech SeparationDanny Websdale, Ben Milner. 2003-2007 [doi]
- Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker VerificationDaniel Michelsanti, Zheng-Hua Tan. 2008-2012 [doi]
- Speech Enhancement Using Bayesian WavenetKaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florêncio, Mark Hasegawa-Johnson. 2013-2017 [doi]
- Binaural Reverberant Speech Separation Based on Deep Neural NetworksXueliang Zhang, DeLiang Wang. 2018-2022 [doi]
- On the Quality and Intelligibility of Noisy Speech Processed for Near-End Listening EnhancementTudor-Catalin Zorila, Yannis Stylianou. 2023-2027 [doi]
- Applications of the BBN Sage Speech Processing PlatformRalf Meermeier, Sean Colbath. 2028-2029 [doi]
- Bob Speaks KaldiMilos Cernak, Alain Komaty, Amir Mohammadi, André Anjos, Sébastien Marcel. 2030-2031 [doi]
- Real Time Pitch Shifting with Formant Structure Preservation Using the Phase VocoderMichal Lenarczyk. 2032-2033 [doi]
- A Signal Processing Approach for Speaker Separation Using SFF AnalysisNivedita Chennupati, B. H. V. S. Narayana Murthy, B. Yegnanarayana. 2034-2035 [doi]
- Speech Recognition and Understanding on Hardware-Accelerated DSPGeorg Stemmer, Munir Georges, Joachim Hofer, Piotr Rozen, Josef G. Bauer, Jakub Nowicki, Tobias Bocklet, Hannah R. Colett, Ohad Falik, Michael Deisher, Sylvia J. Downing. 2036-2037 [doi]
- MetaLab: A Repository for Meta-Analyses on Language Development, and MoreSho Tsuji, Christina Bergmann, Molly Lewis, Mika Braginsky, Page Piccinini, Michael C. Frank, Alejandrina Cristià. 2038-2039 [doi]
- Evolving Recurrent Neural Networks That Process and Classify Raw Audio in a Streaming FashionAdrien Daniel. 2040-2041 [doi]
- Combining Gaussian Mixture Models and Segmental Feature Models for Speaker RecognitionMilana Milosevic, Ulrike Glavitsch. 2042-2043 [doi]
- "Did you laugh enough today?" - Deep Neural Networks for Mobile and Wearable Laughter TrackersGerhard Hagerer, Nicholas Cummins, Florian Eyben, Björn W. Schuller. 2044-2045 [doi]
- Low-Frequency Ultrasonic Communication for Speech Broadcasting in Public TransportationKwang Myung Jeon, Nam-Kyun Kim, Chan Woong Kwak, Jung-Min Moon, Hong Kook Kim. 2046-2047 [doi]
- Real-Time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA JetsonSean U. N. Wood, Jean Rouat. 2048-2049 [doi]
- Reading Validation for Pronunciation Evaluation in the Digitala ProjectAku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo. 2050-2051 [doi]
- Conversing with Social Agents That Smile and LaughCatherine Pelachaud. 2052 [doi]
- Team ELISA System for DARPA LORELEI Speech Evaluation 2016Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Ondrej Glembek, Murali Karthick Baskar, Martin Karafiát, Lukás Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan May, Kevin Knight, Shrikanth S. Narayanan. 2053-2057 [doi]
- First Results in Developing a Medieval Latin Language Charter Dictation System for the East-Central Europe RegionPéter Mihajlik, Lili Szabó, Balázs Tarján, András Balog, Krisztina Rábai. 2058-2062 [doi]
- The Motivation and Development of MPAi, a Māori Pronunciation AidCatherine I. Watson, Peter Keegan, Margaret Maclagan, Ray Harlow, J. King. 2063-2067 [doi]
- On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic ModelingSiyuan Feng, Tan Lee. 2068-2072 [doi]
- Deep Auto-Encoder Based Multi-Task Learning Using Probabilistic TranscriptionsAmit Das, Mark Hasegawa-Johnson, Karel Veselý. 2073-2077 [doi]
- Areal and Phylogenetic Features for Multilingual Speech SynthesisAlexander Gutkin, Richard Sproat. 2078-2082 [doi]
- SLPAnnotator: Tools for Implementing Sign Language Phonetic AnnotationKathleen Currie Hall, Scott Mackie, Michael Fry, Oksana Tkachman. 2083-2087 [doi]
- The LENA System Applied to Swedish: Reliability of the Adult Word Count EstimateIris-Corinna Schwarz, Noor Botros, Alekzandra Lord, Amelie Marcusson, Henrik Tidelius, Ellen Marklund. 2088-2092 [doi]
- What do Babies Hear? Analyses of Child- and Adult-Directed SpeechMarisa Casillas, Andrei Amatuni, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Elika Bergelson. 2093-2097 [doi]
- A New Workflow for Semi-Automatized Annotations: Tests with Long-Form Naturalistic Recordings of Childrens Language EnvironmentsMarisa Casillas, Elika Bergelson, Anne S. Warlaumont, Alejandrina Cristià, Melanie Soderstrom, Mark VanDam, Han Sloetjes. 2098-2102 [doi]
- Top-Down versus Bottom-Up Theories of Phonological Acquisition: A Big Data ApproachChristina Bergmann, Sho Tsuji, Alejandrina Cristià. 2103-2107 [doi]
- Which Acoustic and Phonological Factors Shape Infants' Vowel Discrimination? Exploiting Natural Variation in InPhonDBSho Tsuji, Alejandrina Cristià. 2108-2112 [doi]
- The ABAIR Initiative: Bringing Spoken Irish into the Digital SpaceAilbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl. 2113-2117 [doi]
- Very Low Resource Radio Browsing for Agile Developmental and Humanitarian MonitoringArmin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John Quinn, Thomas Niesler. 2118-2122 [doi]
- Extracting Situation Frames from Non-English Speech: Evaluation Framework and Pilot ResultsNikolaos Malandrakis, Ondrej Glembek, Shrikanth S. Narayanan. 2123-2127 [doi]
- Eliciting Meaningful Units from SpeechDaniil Kocharov, Tatiana Kachkovskaia, Pavel A. Skrelin. 2128-2132 [doi]
- Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech ApplicationsSaurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty. 2133-2137 [doi]
- Machine Assisted Analysis of Vowel Length Contrasts in WolofElodie Gauthier, Laurent Besacier, Sylvie Voisin. 2138-2142 [doi]
- Leveraging Text Data for Word Segmentation for Underresourced LanguagesThomas Glarner, Benedikt T. Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach. 2143-2147 [doi]
- Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual InitializationXiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu. 2148-2152 [doi]
- Joint Estimation of Articulatory Features and Acoustic Models for Low-Resource LanguagesBasil Abraham, S. Umesh, Neethu Mariam Joy. 2153-2157 [doi]
- Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource LanguagesBasil Abraham, Tejaswi Seeram, S. Umesh. 2158-2162 [doi]
- Building an ASR Corpus Using Althingi's Parliamentary SpeechesInga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jón Guðnason. 2163-2167 [doi]
- Implementation of a Radiology Speech Recognition System for Estonian Using Open Source SoftwareTanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister. 2168-2172 [doi]
- Building ASR Corpora Using EyraJón Guðnason, Matthías Pétursson, Róbert Kjaran, Simon Klüpfel, Anna Björk Nikulásdóttir. 2173-2177 [doi]
- Rapid Development of TTS Corpora for Four South African LanguagesDaniel R. van Niekerk, Charl Johannes van Heerden, Marelie H. Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha. 2178-2182 [doi]
- Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced LanguagesAlexander Gutkin. 2183-2187 [doi]
- Nativization of Foreign Names in TTS for Automatic Reading of World News in SwahiliJoseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King. 2188-2192 [doi]
- Multi-Task Learning for Mispronunciation Detection on Singapore Children's Mandarin SpeechRong Tong, Nancy F. Chen, Bin Ma. 2193-2197 [doi]
- Relating Unsupervised Word Segmentation to Reported Vocabulary AcquisitionElin Larsen, Alejandrina Cristià, Emmanuel Dupoux. 2198-2202 [doi]
- Modelling the Informativeness of Non-Verbal Cues in Parent-Child InteractionMats Wirén, Kristina N. Björkenstam, Robert Östling. 2203-2207 [doi]
- Computational Simulations of Temporal Vocalization Behavior in Adult-Child InteractionEllen Marklund, David Pagmar, Tove Gerholm, Lisa Gustavsson. 2208-2212 [doi]
- Approximating Phonotactic Input in Children's Linguistic Environments from Orthographic TranscriptsSofia Strömbergsson, Jens Edlund, Jana Götze, Kristina Nilsson Björkenstam. 2213-2217 [doi]
- Learning Weakly Supervised Multimodal Phoneme EmbeddingsRahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux. 2218-2222 [doi]
- Personalized Quantification of Voice Attractiveness in Multidimensional Merit SpaceYasunari Obuchi. 2223-2227 [doi]
- The Role of Temporal Amplitude Modulations in the Political Arena: Hillary Clinton vs. Donald TrumpHans Rutger Bosker. 2228-2232 [doi]
- Perceptual Ratings of Voice Likability Collected Through In-Lab Listening Tests vs. Mobile-Based CrowdsourcingLaura Fernández Gallardo, Rafael Zequeira Jiménez, Sebastian Möller. 2233-2237 [doi]
- Attractiveness of French Voices for German Listeners - Results from Native and Non-Native Read SpeechJürgen Trouvain, Frank Zimmerer. 2238-2242 [doi]
- Social Attractiveness in DialogsAntje Schweitzer, Natalie Lewandowski, Daniel Duran 0001. 2243-2247 [doi]
- A Gender Bias in the Acoustic-Melodic Features of Charismatic Speech?Eszter Novák-Tót, Oliver Niebuhr, Aoju Chen. 2248-2252 [doi]
- Pitch Convergence as an Effect of Perceived Attractiveness and LikabilityJan Michalsky, Heike Schoormann. 2253-2256 [doi]
- Does Posh English Sound Attractive?Li Jiao, Chengxia Wang, Cristiane Hsu, Peter Birkholz, Yi Xu. 2257-2261 [doi]
- Large-Scale Speaker Ranking from Crowdsourced Pairwise Listener RatingsTimo Baumann. 2262-2266 [doi]
- Aerodynamic Features of French FricativesRosario Signorello, Sergio Hassid, Didier Demolin. 2267-2271 [doi]
- Inter-Speaker Variability: Speaker Normalisation and Quantitative Estimation of Articulatory Invariants in Speech Production for FrenchAntoine Serrurier, Pierre Badin, Louis-Jean Boë, Laurent Lamalle, Christiane Neuschaefer-Rube. 2272-2276 [doi]
- Comparison of Basic Beatboxing Articulations Between Expert and Novice Artists Using Real-Time Magnetic Resonance ImagingNimisha Patil, Timothy Greer, Reed Blaylock, Shrikanth S. Narayanan. 2277-2281 [doi]
- Speaker-Specific Biomechanical Model-Based Investigation of a Simple Speech Task Based on Tagged-MRIKeyi Tang, Negar M. Harandi, Jonghye Woo, Georges El Fakhri, Maureen Stone, Sidney Fels. 2282-2286 [doi]
- Sounds of the Human Vocal TractReed Blaylock, Nimisha Patil, Timothy Greer, Shrikanth S. Narayanan. 2287-2291 [doi]
- A Simulation Study on the Effect of Glottal Boundary Conditions on Vocal Tract FormantsYasufumi Uezu, Tokihiko Kaburagi. 2292-2296 [doi]
- A Robust and Alternative Approach to Zero Frequency Filtering Method for Epoch ExtractionP. Gangamohan, B. Yegnanarayana. 2297-2300 [doi]
- Improving YANGsaf F0 Estimator with Adaptive Kalman FilterKanru Hua. 2301-2305 [doi]
- A Spectro-Temporal Demodulation Technique for Pitch EstimationJitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula. 2306-2310 [doi]
- Robust Method for Estimating F0 of Complex Tone Based on Pitch Perception of Amplitude Modulated SignalKenichiro Miwa, Masashi Unoki. 2311-2315 [doi]
- Low-Complexity Pitch Estimation Based on Phase Differences Between Low-Resolution SpectraSimon Graf, Tobias Herbig, Markus Buck, Gerhard Schmidt. 2316-2320 [doi]
- Harvest: A High-Performance Fundamental Frequency Estimator from Speech SignalsMasanori Morise. 2321-2325 [doi]
- Prosodic Event Recognition Using Convolutional Neural Networks with Context InformationSabrina Stehwien, Ngoc Thang Vu. 2326-2330 [doi]
- Prosodic Facilitation and Interference While Judging on the Veracity of Synthesized StatementsRamiro H. Gálvez, Stefan Benus, Agustín Gravano, Marián Trnka. 2331-2335 [doi]
- An Investigation of Pitch Matching Across Adjacent Turns in a Corpus of Spontaneous GermanMargaret Zellers, Antje Schweitzer. 2336-2340 [doi]
- The Relationship Between F0 Synchrony and Speech Convergence in Dyadic InteractionSankar Mukherjee, Alessandro D'Ausilio, Noël Nguyen, Luciano Fadiga, Leonardo Badino. 2341-2345 [doi]
- The Role of Linguistic and Prosodic Cues on the Prediction of Self-Reported Satisfaction in Contact Centre Phone CallsJordi Luque, Carlos Segura, Ariadna Sánchez, Martí Umbert, Luis Angel Galindo. 2346-2350 [doi]
- Cross-Linguistic Study of the Production of Turn-Taking Cues in American English and Argentine SpanishPablo Brusco, Juan Manuel Pérez, Agustín Gravano. 2351-2355 [doi]
- Emotional Features for Speech Overlaps ClassificationOlga Egorow, Andreas Wendemuth. 2356-2360 [doi]
- Computing Multimodal Dyadic Behaviors During Spontaneous Diagnosis Interviews Toward Automatic Categorization of Autism Spectrum DisorderChin-Po Chen, Xian-Hong Tseng, Susan Shur-Fen Gau, Chi-Chun Lee. 2361-2365 [doi]
- Deriving Dyad-Level Interaction Representation Using Interlocutors Structural and Expressive Multimodal Behavior FeaturesYun-Shao Lin, Chi-Chun Lee. 2366-2370 [doi]
- Spotting Social Signals in Conversational Speech over IP: A Deep Learning PerspectiveRaymond Brueckner, Maximilian Schmitt, Maja Pantic, Björn W. Schuller. 2371-2375 [doi]
- Optimized Time Series Filters for Detecting Laughter and Filler EventsGábor Gosztolya. 2376-2380 [doi]
- Visual, Laughter, Applause and Spoken Expression Features for Predicting Engagement Within TED TalksFasih Haider, Fahim A. Salim, Saturnino Luz, Carl Vogel, Owen Conlan, Nick Campbell 0001. 2381-2385 [doi]
- Large-Scale Domain Adaptation via Teacher-Student LearningJinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong. 2386-2390 [doi]
- Improving Children's Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram InversionW. Ahmad, Syed Shahnawazuddin, Hemant Kumar Kathania, Gayadhar Pradhan, A. B. Samaddar. 2391-2395 [doi]
- RNN-LDA Clustering for Feature Based DNN AdaptationXurong Xie, Xunying Liu, Tan Lee, Lan Wang. 2396-2400 [doi]
- Robust Online i-Vectors for Unsupervised Adaptation of DNN Acoustic Models: A Study in the Context of Digital Voice AssistantsHarish Arsikere, Sri Garimella. 2401-2405 [doi]
- Semi-Supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic ControlAjay Srinivasamurthy, Petr Motlícek, Ivan Himawan, György Szaszák, Youssef Oualil, Hartmut Helmke. 2406-2410 [doi]
- Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech RecognitionTaesup Kim, Inchul Song, Yoshua Bengio. 2411-2415 [doi]
- An Entrained Rhythm's Frequency, Not Phase, Influences Temporal Sampling of SpeechHans Rutger Bosker, Anne Kösem. 2416-2420 [doi]
- Context Regularity Indexed by Auditory N1 and P2 Event-Related PotentialsXiao Wang, Yanhui Zhang, Gang Peng. 2421-2425 [doi]
- Discovering Language in Marmoset VocalizationSakshi Verma, K. L. Prateek, Karthik Pandia, Nauman Dawalatabad, Rogier Landman, Jitendra Sharma, Mriganka Sur, Hema A. Murthy. 2426-2430 [doi]
- Subject-Independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response During Speech PerceptionHiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura 0001. 2431-2435 [doi]
- The Phonological Status of the French Initial Accent and its Role in Semantic Processing: An Event-Related Potentials StudyNoémie te Rietmolen, Radouane El Yagoubi, Alain Ghio, Corine Astésano. 2436-2440 [doi]
- A Neuro-Experimental Evidence for the Motor Theory of Speech PerceptionBin Zhao, Jianwu Dang, Gaoyan Zhang. 2441-2445 [doi]
- Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASRPurvi Agrawal, Sriram Ganapathy. 2446-2450 [doi]
- Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech RecognitionMasato Mimura, Yoshiaki Bando, Kazuki Shimada, Shinsuke Sakai, Kazuyoshi Yoshii, Tatsuya Kawahara. 2451-2455 [doi]
- Recognizing Multi-Talker Speech with Permutation Invariant TrainingDong Yu, Xuankai Chang, Yanmin Qian. 2456-2460 [doi]
- Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral InformationYuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken'ichi Furuya, Shinji Watanabe, Jonathan Le Roux. 2461-2465 [doi]
- Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASRErfan Loweimi, Jon Barker, Thomas Hain. 2466-2470 [doi]
- Robust Speech Recognition via Anchor Word RepresentationsBrian King, I-Fan Chen, Yonatan Vaizman, Yuzong Liu, Roland Maas, Sree Hari Krishnan Parthasarathi, Björn Hoffmeister. 2471-2475 [doi]
- Towards Zero-Shot Frame Semantic Parsing for Domain ScalingAnkur Bapna, Gökhan Tür, Dilek Hakkani-Tür, Larry P. Heck. 2476-2480 [doi]
- ClockWork-RNN Based Architectures for Slot FillingDespoina Georgiadou, Vassilios Diakoloukas, Vassilios Tsiaras, Vassilios Digalakis. 2481-2485 [doi]
- Investigating the Effect of ASR Tuning on Named Entity RecognitionMohamed Ameur Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset. 2486-2490 [doi]
- Label-Dependency Coding in Simple Recurrent Networks for Spoken Language UnderstandingMarco Dinarelli, Vedran Vukotic, Christian Raymond. 2491-2495 [doi]
- Minimum Semantic Error Cost Training of Deep Long Short-Term Memory Networks for Topic Spotting on Conversational SpeechZhong Meng, Biing-Hwang Juang. 2496-2500 [doi]
- Topic Identification for Speech Without ASRChunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur. 2501-2505 [doi]
- An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented DialogBing Liu, Ian Lane. 2506-2510 [doi]
- Deep Reinforcement Learning of Dialogue Policies with Less Weight UpdatesHeriberto Cuayáhuitl, Seunghak Yu. 2511-2515 [doi]
- Towards End-to-End Spoken Dialogue Systems with Turn EmbeddingsAli Orkan Bayer, Evgeny A. Stepanov, Giuseppe Riccardi. 2516-2520 [doi]
- Speech and Text Analysis for Multimodal Addressee Detection in Human-Human-Computer InteractionOleg Akhtiamov, Maxim Sidorov, Alexey A. Karpov, Wolfgang Minker. 2521-2525 [doi]
- Rushing to Judgement: How do Laypeople Rate Caller Engagement in Thin-Slice Videos of Human-Machine Dialog?Vikram Ramanarayanan, Chee Wee Leong, David Suendermann-Oeft. 2526-2530 [doi]
- Hyperarticulation of Corrections in Multilingual Dialogue SystemsIvan Kraljevski, Diane Hirschfeld. 2531-2535 [doi]
- Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme ConversionBenjamin Milde, Christoph Schmidt, Joachim Köhler. 2536-2540 [doi]
- Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection FrameworkXiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur. 2541-2545 [doi]
- Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and TextTakahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig. 2546-2550 [doi]
- Improved Subword Modeling for WFST-Based Speech RecognitionPeter Smit, Sami Virpioja, Mikko Kurimo. 2551-2555 [doi]
- Pronunciation Learning with RNN-TransducersAntoine Bruguier, Danushen Gnanapragasam, Leif Johnson, Kanishka Rao, Françoise Beaufays. 2556-2560 [doi]
- Learning Similarity Functions for Pronunciation VariationsEinat Naaman, Yossi Adi, Joseph Keshet. 2561-2565 [doi]
- Spoken Language Identification Using LSTM-Based Angular ProximityGregory Gelly, Jean-Luc Gauvain. 2566-2570 [doi]
- End-to-End Language Identification Using High-Order Utterance Representation with Bilinear PoolingMa Jin, Yan Song, Ian Vince McLoughlin, Wu Guo, Li-Rong Dai. 2571-2575 [doi]
- Dialect Recognition Based on Unsupervised Bottleneck FeaturesQian Zhang, John H. L. Hansen. 2576-2580 [doi]
- Investigating Scalability in Hierarchical Language Identification SystemSaad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li. 2581-2585 [doi]
- Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English SpeechYao Qian, Keelan Evanini, Xinhao Wang, David Suendermann-Oeft, Robert A. Pugh, Patrick L. Lange, Hillary R. Molloy, Frank K. Soong. 2586-2590 [doi]
- QMDIS: QCRI-MIT Advanced Dialect Identification SystemSameer Khurana, Maryam Najafian, Ahmed M. Ali, Tuka Al Hanai, Yonatan Belinkov, James R. Glass. 2591-2595 [doi]
- Detection of Replay Attacks Using Single Frequency Filtering Cepstral CoefficientsK. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala. 2596-2600 [doi]
- Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech DetectionHardik B. Sailor, Madhu R. Kamble, Hemant A. Patil. 2601-2605 [doi]
- Independent Modelling of High and Low Energy Speech Frames for Spoofing DetectionGajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah. 2606-2610 [doi]
- Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed DataAchintya Kr. Sarkar, Md. Sahidullah, Zheng-Hua Tan, Tomi Kinnunen. 2611-2615 [doi]
- VoxCeleb: A Large-Scale Speaker Identification DatasetArsha Nagrani, Joon Son Chung, Andrew Zisserman. 2616-2620 [doi]
- Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition TechnologyKaren Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright. 2621-2624 [doi]
- Sequence-to-Sequence Models Can Directly Translate Foreign SpeechRon J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen. 2625-2629 [doi]
- Structured-Based Curriculum Learning for End-to-End English-Japanese Speech TranslationTakatomo Kano, Sakriani Sakti, Satoshi Nakamura 0001. 2630-2634 [doi]
- Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition ErrorsNicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico. 2635-2639 [doi]
- Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and EmphasisQuoc Truong Do, Sakriani Sakti, Satoshi Nakamura 0001. 2640-2644 [doi]
- NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language TranslationEunah Cho, Jan Niehues, Alex Waibel. 2645-2649 [doi]
- Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering EmbeddingsLukas Drude, Reinhold Haeb-Umbach. 2650-2654 [doi]
- Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech MixturesKaterina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani. 2655-2659 [doi]
- Eigenvector-Based Speech Mask Estimation Using Logistic RegressionLukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf. 2660-2664 [doi]
- Real-Time Speech Enhancement with GCC-NMFSean U. N. Wood, Jean Rouat. 2665-2669 [doi]
- Coherence-Based Dual-Channel Noise Reduction Algorithm in a Complex Noisy EnvironmentYouna Ji, Jun Byun, Young-Cheol Park. 2670-2674 [doi]
- Glottal Model Based Speech Beamforming for ad-hoc Microphone ArraysYang Zhang, Dinei Florêncio, Mark Hasegawa-Johnson. 2675-2679 [doi]
- Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior FeaturesYuanyuan Liu, Tan Lee, P. C. Ching, Thomas K. T. Law, Kathy Y. S. Lee. 2680-2684 [doi]
- Multi-Stage DNN Training for Automatic Recognition of Dysarthric SpeechEmre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik. 2685-2689 [doi]
- Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult SpeechDaniel Smith, Alex Sneddon, Lauren Ward, Andreas Duenser, Jill Freyne, David Silvera Tawil, Angela Morgan. 2690-2694 [doi]
- On Improving Acoustic Models for TORGO Dysarthric Speech DatabaseNeethu Mariam Joy, S. Umesh, Basil Abraham. 2695-2699 [doi]
- Glottal Source Features for Automatic Speech-Based Depression AssessmentOlympia Simantiraki, Paulos Charonyktakis, Anastasia Pampouchidou, Manolis Tsiknakis, Martin Cooke. 2700-2704 [doi]
- Speech Processing Approach for Diagnosing Dementia in an Early StageRoozbeh Sadeghian, J. David Schaffer, Stephen A. Zahorian. 2705-2709 [doi]
- Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic SignalsFadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro. 2710-2714 [doi]
- Semi-Supervised Adaptation of RNNLMs by Fine-Tuning with Domain-Specific Auxiliary FeaturesSalil Deena, Raymond W. M. Ng, Pranava Swaroop Madhyastha, Lucia Specia, Thomas Hain. 2715-2719 [doi]
- Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech RecognitionMittul Singh, Youssef Oualil, Dietrich Klakow. 2720-2724 [doi]
- Sparse Non-Negative Matrix Language Modeling: Maximum Entropy Flexibility on the CheapCiprian Chelba, Diamantino Caseiro, Fadi Biadsy. 2725-2729 [doi]
- Multi-Scale Context Adaptation for Improving Child Automatic Speech Recognition in Child-Adult Spoken InteractionsManoj Kumar, Daniel Bone, Kelly McWilliams, Shanna Williams, Thomas D. Lyon, Shrikanth S. Narayanan. 2730-2734 [doi]
- Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech RecognitionWeiwu Zhu. 2735-2738 [doi]
- Developing On-Line Speaker Diarization SystemDimitrios Dimitriadis, Petr Fousek. 2739-2743 [doi]
- Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech ProcessingShreyas Seshadri, Ulpu Remes, Okko Räsänen. 2744-2748 [doi]
- Automatic Evaluation of Children Reading Aloud on Sentences and PseudowordsJorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão. 2749-2753 [doi]
- Off-Topic Spoken Response Detection with Word EmbeddingsSu-Youn Yoon, Chong Min Lee, Ikkyu Choi, Xinhao Wang, Matthew Mulholland, Keelan Evanini. 2754-2758 [doi]
- Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep ModelsWei Li, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee. 2759-2763 [doi]
- Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture SlidesShoko Tsujimura, Kazumasa Yamamoto, Seiichi Nakagawa. 2764-2768 [doi]
- Multiview Representation Learning via Deep CCA for Silent Speech RecognitionMyung Jong Kim, Beiming Cao, Ted Mau, Jun Wang 0037. 2769-2773 [doi]
- Use of Graphemic Lexicons for Spoken Language AssessmentKate M. Knill, Mark J. F. Gales, K. Kyriakopoulos, Anton Ragni, Yu Wang. 2774-2778 [doi]
- Distilling Knowledge from an Ensemble of Models for Punctuation PredictionJiangyan Yi, Jianhua Tao, Zhengqi Wen, Ya Li. 2779-2783 [doi]
- A Mostly Data-Driven Approach to Inverse Text NormalizationErnest Pusateri, Bharat Ram Ambati, Elizabeth Brooks, Ondrej Plátek, Donald McAllaster, Venki Nagesha. 2784-2788 [doi]
- Mismatched Crowdsourcing from Multiple Annotator Languages for Recognizing Zero-Resourced Languages: A Nullspace Clustering ApproachWenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen, Boon Pang Lim. 2789-2793 [doi]
- Experiments in Character-Level Neural Network Models for PunctuationWilliam Gale, Sarangarajan Parthasarathy. 2794-2798 [doi]
- Multi-Channel Apollo Mission Speech Transcripts CalibrationLakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen. 2799-2803 [doi]
- Calibration Approaches for Language Detection