Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017

Francisco Lacerda, editor, Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017. ISCA, 2017. [doi]

Conference: interspeech2017

Description of the Munich-Passau Snore Sound Corpus (MPSSC). Christoph Janott, Anton Batliner. [doi]

Description of the Upper Respiratory Tract Infection Corpus (URTIC). Jarek Krajewski, Sebastian Schnieder, Anton Batliner. [doi]

The INTERSPEECH 2017 Computational Paralinguistics Challenge: A Summary of Results. Stefan Steidl. [doi]

Description of the Homebank Child/Adult Addressee Corpus (HB-CHAAC). Elika Bergelson, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont. [doi]

Discussion. Björn W. Schuller, Anton Batliner. [doi]

ISCA Medal for Scientific Achievement. Haizhou Li. 1 [doi]

The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection. Tomi Kinnunen, Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Junichi Yamagishi, Kong-Aik Lee. 2-6 [doi]

Experimental Analysis of Features for Replay Attack Detection - Results on the ASVspoof 2017 Challenge. Roberto Font, Juan M. Espín, María José Cano. 7-11 [doi]

Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection. Hemant A. Patil, Madhu R. Kamble, Tanvina B. Patel, Meet H. Soni. 12-16 [doi]

Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion. Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, Ming Li. 17-21 [doi]

Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features. Sarfaraz Jelil, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha 0003. 22-26 [doi]

Audio Replay Attack Detection Using High-Frequency Features. Marcin Witkowski, Stanislaw Kacprzak, Piotr Zelasko, Konrad Kowalczyk, Jakub Galka. 27-31 [doi]

Feature Selection Based on CQCCs for Automatic Speaker Verification Spoofing. Xianliang Wang, Yanhong Xiao, Xuan Zhu. 32-36 [doi]

Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech. Emre Yilmaz, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel, David A. van Leeuwen. 37-41 [doi]

Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection. Emre Yilmaz, Henk van den Heuvel, David A. van Leeuwen. 42-46 [doi]

Jee haan, I'd like both, por favor: Elicitation of a Code-Switched Corpus of Hindi-English and Spanish-English Human-Machine Dialog. Vikram Ramanarayanan, David Suendermann-Oeft. 47-51 [doi]

On Building Mixed Lingual Speech Synthesis Systems. Sai Krishna Rallabandi, Alan W. Black. 52-56 [doi]

Speech Synthesis for Mixed-Language Navigation Instructions. Khyathi Chandu Raghavi, Sai Krishna Rallabandi, Sunayana Sitaram, Alan W. Black. 57-61 [doi]

Addressing Code-Switching in French/Algerian Arabic Speech. Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel. 62-66 [doi]

Metrics for Modeling Code-Switching Across Corpora. Gualberto A. Guzmán, Joseph Ricard, Jacqueline Serigos, Barbara E. Bullock, Almeida Jacqueline Toribio. 67-71 [doi]

Synthesising isiZulu-English Code-Switch Bigrams Using Word Embeddings. Ewald van der Westhuizen, Thomas Niesler. 72-76 [doi]

Crowdsourcing Universal Part-of-Speech Tags for Code-Switching. Victor Soto, Julia Hirschberg. 77-81 [doi]

Audio Replay Attack Detection with Deep Learning Frameworks. Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexander Kozlov, Oleg Kudashev, Vadim Shchemelinin. 82-86 [doi]

Ensemble Learning for Countermeasure of Audio Replay Spoofing Attack in ASVspoof2017. Zhe Ji, Zhi-Yi Li, Peng Li, Maobo An, Shengxiang Gao, Dan Wu, Faru Zhao. 87-91 [doi]

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification. Lantian Li, Yixiang Chen, Dong Wang, Thomas Fang Zheng. 92-96 [doi]

Replay Attack Detection Using DNN for Channel Discrimination. Parav Nagarsheth, Elie Khoury, Kailash Patil, Matt Garland. 97-101 [doi]

ResNet and Model Fusion for Automatic Spoofing Detection. Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu. 102-106 [doi]

SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017. K. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala. 107-111 [doi]

Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features. William Hartmann, Roger Hsiao, Tim Ng, Jeff Z. Ma, Francis Keith, Man-Hung Siu. 112-116 [doi]

Student-Teacher Training with Diverse Decision Tree Ensembles. Jeremy H. M. Wong, Mark J. F. Gales. 117-121 [doi]

Embedding-Based Speaker Adaptive Training of Deep Neural Networks. Xiaodong Cui, Vaibhava Goel, George Saon. 122-126 [doi]

Improving Deliverable Speech-to-Text Systems with Multilingual Knowledge Transfer. Jeff Z. Ma, Francis Keith, Tim Ng, Man-Hung Siu, Owen Kimball. 127-131 [doi]

English Conversational Telephone Speech Recognition by Humans and Machines. George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall. 132-136 [doi]

Comparing Human and Machine Errors in Conversational Speech Transcription. Andreas Stolcke, Jasha Droppo. 137-141 [doi]

Multimodal Markers of Persuasive Speech: Designing a Virtual Debate Coach. Volha Petukhova, Manoj Raju, Harry Bunt. 142-146 [doi]

Acoustic-Prosodic and Physiological Response to Stressful Interactions in Children with Autism Spectrum Disorder. Daniel Bone, Julia Mertens, Emily Zane, Sungbok Lee, Shrikanth S. Narayanan, Ruth B. Grossman. 147-151 [doi]

A Stepwise Analysis of Aggregated Crowdsourced Labels Describing Multimodal Emotional Behaviors. Alec Burmania, Carlos Busso. 152-156 [doi]

An Information Theoretic Analysis of the Temporal Synchrony Between Head Gestures and Prosodic Patterns in Spontaneous Speech. Gaurav Fotedar, Prasanta Kumar Ghosh. 157-161 [doi]

Multimodal Prediction of Affective Dimensions via Fusing Multiple Regression Techniques. D.-Y. Huang, Wan Ding, Mingyu Xu, Huaiping Ming, Minghui Dong, Xinguo Yu, Haizhou Li. 162-165 [doi]

Co-Production of Speech and Pointing Gestures in Clear and Perturbed Interactive Tasks: Multimodal Designation Strategies. Marion Dohen, Benjamin Roustan. 166-170 [doi]

Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing. Peter Guzewich, Stephen A. Zahorian. 171-175 [doi]

Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance. Philipp Bulling, Klaus Linhard, Arthur Wolf, Gerhard Schmidt. 176-180 [doi]

A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems. Jan Franzen, Tim Fingscheidt. 181-185 [doi]

Speech Enhancement Based on Harmonic Estimation Combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients. Dongmei Wang, John H. L. Hansen. 186-190 [doi]

Improving Speech Intelligibility in Binaural Hearing Aids by Estimating a Time-Frequency Mask with a Weighted Least Squares Classifier. David Ayllón, Roberto Gil-Pita, Manuel Rosa-Zurera. 191-195 [doi]

Simulations of High-Frequency Vocoder on Mandarin Speech Recognition for Acoustic Hearing Preserved Cochlear Implant. Tsung-Chen Wu, Tai-Shih Chi, Chia-Fone Lee. 196-200 [doi]

Phonetic Correlates of Pharyngeal and Pharyngealized Consonants in Saudi, Lebanese, and Jordanian Arabic: An rt-MRI Study. Zainab Hermes, Marissa S. Barlaz, Ryan Shosted, Zhi-Pei Liang, Bradley P. Sutton. 201-205 [doi]

Glottal Opening and Strategies of Production of Fricatives. Benjamin Elie, Yves Laprie. 206-209 [doi]

Acoustics and Articulation of Medial versus Final Coronal Stop Gemination Contrasts in Moroccan Arabic. Mohamed Yassine Frej, Christopher Carignan, Catherine T. Best. 210-214 [doi]

How are Four-Level Length Distinctions Produced? Evidence from Moroccan Arabic. Giuseppina Turco, Karim Shoul, Rachid Ridouane. 215-218 [doi]

Vowels in the Barunga Variety of North Australian Kriol. Caroline Jones, Katherine Demuth, Weicong Li, Andre Almeida. 219-223 [doi]

Nature of Contrast and Coarticulation: Evidence from Mizo Tones and Assamese Vowel Harmony. Indranil Dutta, Irfan S., Pamir Gogoi, Priyankoo Sarmah. 224-228 [doi]

The Influence of Synthetic Voice on the Evaluation of a Virtual Character. João Paulo Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell. 229-233 [doi]

Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network. Amelia J. Gully, Takenori Yoshimura, Damian T. Murphy, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda. 234-238 [doi]

An HMM/DNN Comparison for Synchronized Text-to-Speech and Tongue Motion Synthesis. Sébastien Le Maguer, Ingmar Steiner, Alexander Hewer. 239-243 [doi]

VCV Synthesis Using Task Dynamics to Animate a Factor-Based Articulatory Model. Rachel Alexander, Tanner Sorensen, Asterios Toutios, Shrikanth S. Narayanan. 244-248 [doi]

Beyond the Listening Test: An Interactive Approach to TTS Evaluation. Joseph Mendelson, Matthew P. Aylett. 249-253 [doi]

Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis. Beiming Cao, Myung Jong Kim, Jan P. H. van Santen, Ted Mau, Jun Wang 0037. 254-258 [doi]

Approaches for Neural-Network Language Model Adaptation. Min Ma, Michael Nirschl, Fadi Biadsy, Shankar Kumar. 259-263 [doi]

A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models. Youssef Oualil, Dietrich Klakow. 264-268 [doi]

Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition. X. Chen, Anton Ragni, X. Liu, Mark J. F. Gales. 269-273 [doi]

Fast Neural Network Language Model Lookups at N-Gram Speeds. Yinghui Huang, Abhinav Sethy, Bhuvana Ramabhadran. 274-278 [doi]

Empirical Exploration of Novel Architectures and Objectives for Language Models. Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon. 279-283 [doi]

Residual Memory Networks in Language Modeling: Improving the Reputation of Feed-Forward Networks. Karel Benes, Murali Karthick Baskar, Lukás Burget. 284-288 [doi]

Dominant Distortion Classification for Pre-Processing of Vowels in Remote Biomedical Voice Analysis. Amir Hossein Poorjam, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen. 289-293 [doi]

Automatic Paraphasia Detection from Aphasic Speech: A Preliminary Study. Duc Le, Keli Licata, Emily Mower Provost. 294-298 [doi]

Evaluation of the Neurological State of People with Parkinson's Disease Using i-Vectors. Nicanor Garcia, Juan Rafael Orozco-Arroyave, Luis Fernando D'Haro, Najim Dehak, Elmar Nöth. 299-303 [doi]

Objective Severity Assessment from Disordered Voice Using Estimated Glottal Airflow. Yu-Ren Chien, Michal Borský, Jón Guðnason. 304-308 [doi]

Earlier Identification of Children with Autism Spectrum Disorder: An Automatic Vocalisation-Based Approach. Florian B. Pokorny, Björn W. Schuller, Peter B. Marschik, Raymond Brueckner, Pär Nyström, Nicholas Cummins, Sven Bölte, Christa Einspieler, Terje Falck-Ytter. 309-313 [doi]

Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson's Disease. Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth. 314-318 [doi]

Phone Classification Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs. Linxue Bai, Peter Jancovic, Martin J. Russell, Philip Weber, Stephen M. Houghton. 319-323 [doi]

An Investigation of Crowd Speech for Room Occupancy Estimation. Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Ngoc Le. 324-328 [doi]

Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals. Karthika Vijayan, Jitendra Kumar Dhiman, Chandra Sekhar Seelamantula. 329-333 [doi]

Musical Speech: A New Methodology for Transcribing Speech Prosody. Alexsandro R. Meireles, Antônio R. M. Simões, Antonio Celso Ribeiro, Beatriz Raposo de Medeiros. 334-338 [doi]

Estimation of Place of Articulation of Fricatives from Spectral Characteristics for Speech Training. K. S. Nataraj, Prem C. Pandey, Hirak Dasgupta. 339-343 [doi]

Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source. Tom Bäckström. 344-348 [doi]

End-to-End Acoustic Feedback in Language Learning for Correcting Devoiced French Final-Fricatives. Sucheta Ghosh, Camille Fauth, Yves Laprie, Aghilas Sini. 349-353 [doi]

Dialect Perception by Older Children. Ewa Jacewicz, Robert Allen Fox. 354-358 [doi]

Perception of Non-Contrastive Variations in American English by Japanese Learners: Flaps are Less Favored Than Stops. Kiyoko Yoneyama, Mafuyu Kitahara, Keiichi Tajima. 359-363 [doi]

L1 Perceptions of L2 Prosody: The Interplay Between Intonation, Rhythm, and Speech Rate and Their Contribution to Accentedness and Comprehensibility. Lieke van Maastricht, Tim Zee, Emiel Krahmer, Marc Swerts. 364-368 [doi]

Effects of Pitch Fall and L1 on Vowel Length Identification in L2 Japanese. Izumi Takiguchi. 369-373 [doi]

A Preliminary Study of Prosodic Disambiguation by Chinese EFL Learners. Yuanyuan Zhang, Hongwei Ding. 374-378 [doi]

Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home. Chanwoo Kim, Ananya Misra, Kean K. Chin, Thad Hughes, Arun Narayanan, Tara N. Sainath, Michiel Bacchiani. 379-383 [doi]

Neural Network-Based Spectrum Estimation for Online WPE Dereverberation. Keisuke Kinoshita, Marc Delcroix, Haeyong Kwon, Takuma Mori, Tomohiro Nakatani. 384-388 [doi]

Factorial Modeling for Effective Suppression of Directional Noise. Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven J. Rennie. 389-393 [doi]

On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones. Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Chin-Hui Lee. 394-398 [doi]

Acoustic Modeling for Google Home. Bo Li, Tara N. Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean K. Chin, Khe Chai Sim, Ron J. Weiss, Kevin W. Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Richard Rose, Matt Shannon. 399-403 [doi]

On Multi-Domain Training and Adaptation of End-to-End RNN Acoustic Models for Distant Speech Recognition. Seyedmahdad Mirsamadi, John H. L. Hansen. 404-408 [doi]

Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System. Masanori Morise, Genta Miyashita, Kenji Ozawa. 409-413 [doi]

Robust Source-Filter Separation of Speech Signal in the Phase Domain. Erfan Loweimi, Jon Barker, Oscar Saz Torralba, Thomas Hain. 414-418 [doi]

0 Changes. Simon Stone, Peter Steiner, Peter Birkholz. 419-423 [doi]

o Estimation. Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda. 424-428 [doi]

Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments. Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan. 429-433 [doi]

Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis. Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh. 434-438 [doi]

Wavelet Speech Enhancement Based on Robust Principal Component Analysis. Chia-Lung Wu, Hsiang-Ping Hsu, Syu-Siang Wang, Jeih-Weih Hung, Ying-Hui Lai, Hsin-Min Wang, Yu Tsao. 439-443 [doi]

Vowel Onset Point Detection Using Sonority Information. Bidisha Sharma, S. R. Mahadeva Prasanna. 444-448 [doi]

Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies. Unto K. Laine. 449-453 [doi]

Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks. Christian Kroos, Mark D. Plumbley. 454-458 [doi]

Multilingual i-Vector Based Statistical Modeling for Music Genre Classification. Jia Dai, Wei Xue, Wenju Liu. 459-463 [doi]

Indoor/Outdoor Audio Classification Using Foreground Speech Segmentation. Banriskhem K. Khonglah, K. T. Deepak, S. R. Mahadeva Prasanna. 464-468 [doi]

Attention Based CLDNNs for Short-Duration Acoustic Scene Classification. Jinxi Guo, Ning Xu, Li-Jia Li, Abeer Alwan. 469-473 [doi]

Frame-Wise Dynamic Threshold Based Polyphonic Acoustic Event Detection. Xianjun Xia, Roberto Togneri, Ferdous Ahmed Sohel, David Huang. 474-478 [doi]

Enhanced Feature Extraction for Speech Detection in Media Audio. Inseon Jang, Chunghyun Ahn, Jeongil Seo, Younseon Jang. 479-483 [doi]

Audio Classification Using Class-Specific Learned Descriptors. Sukanya Sonowal, Tushar Sandhan, In Kyu Choi, Nam Soo Kim. 484-487 [doi]

Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery. Janek Ebbers, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, Bhiksha Raj. 488-492 [doi]

Virtual Adversarial Training and Data Augmentation for Acoustic Event Detection with Gated Recurrent Neural Networks. Matthias Zöhrer, Franz Pernkopf. 493-497 [doi]

Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi. Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner 0019, Morgan Sonderegger. 498-502 [doi]

A Robust Voiced/Unvoiced Phoneme Classification from Whispered Speech Using the 'Color' of Whispered Phonemes and Deep Neural Network. G. Nisha Meenakshi, Prasanta Kumar Ghosh. 503-507 [doi]

Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition. Ian Williams, Petar S. Aleksic. 508-512 [doi]

Comparison of Decoding Strategies for CTC Acoustic Models. Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel. 513-517 [doi]

Phone Duration Modeling for LVCSR Using Neural Networks. Hossein Hadian, Daniel Povey, Hossein Sameti, Sanjeev Khudanpur. 518-522 [doi]

Towards Better Decoding and Language Model Integration in Sequence to Sequence Models. Jan Chorowski, Navdeep Jaitly. 523-527 [doi]

Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling. Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu. 528-532 [doi]

Binary Deep Neural Networks for Speech Recognition. Xu Xiang, Yanmin Qian, Kai Yu 0004. 533-537 [doi]

Hierarchical Constrained Bayesian Optimization for Feature, Acoustic Model and Decoder Parameter Optimization. Akshay Chandrashekaran, Ian R. Lane. 538-542 [doi]

Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition. Shohei Toyama, Daisuke Saito, Nobuaki Minematsu. 543-547 [doi]

Joint Learning of Correlated Sequence Labeling Tasks Using Bidirectional Recurrent Neural Networks. Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas C. Raykar, Lili Kotlerman, Guy Lev. 548-552 [doi]

Estimation of Gap Between Current Language Models and Human Performance. Xiaoyu Shen, Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow. 553-557 [doi]

A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery. Anna Moró, György Szaszák. 558-562 [doi]

Factors Affecting the Intelligibility of Low-Pass Filtered Speech. Lei Wang, Fei Chen. 563-566 [doi]

Phonetic Restoration of Temporally Reversed Speech. Shi Yu Wang, Fei Chen. 567-570 [doi]

Simultaneous Articulatory and Acoustic Distortion in L1 and L2 Listening: Locally Time-Reversed "Fast" Speech. Mako Ishida. 571-575 [doi]

Lexically Guided Perceptual Learning in Mandarin Chinese. L. Ann Burchfield, San-hei Kenny Luk, Mark Antoniou, Anne Cutler. 576-580 [doi]

The Effect of Spectral Profile on the Intelligibility of Emotional Speech in Noise. Chris Davis, Chee Seng Chong, Jeesun Kim. 581-585 [doi]

Whether Long-Term Tracking of Speech Rate Affects Perception Depends on Who is Talking. Merel Maslowski, Antje S. Meyer, Hans Rutger Bosker. 586-590 [doi]

Emotional Thin-Slicing: A Proposal for a Short- and Long-Term Division of Emotional Speech. Daniel Oliveira Peres, Dominic Watt, Waldemar Ferreira Netto. 591-595 [doi]

Predicting Epenthetic Vowel Quality from Acoustics. Adriana Guevara-Rukoz, Erika Parlato-Oliveira, Shi Yu, Yuki Hirose, Sharon Peperkamp, Emmanuel Dupoux. 596-600 [doi]

The Effect of Spectral Tilt on Size Discrimination of Voiced Speech Sounds. Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson. 601-605 [doi]

Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car. Jaime Lorenzo-Trueba, Cassia Valentini-Botinhao, Gustav Eje Henter, Junichi Yamagishi. 606-610 [doi]

The Relative Cueing Power of F0 and Duration in German Prominence Perception. Oliver Niebuhr, Jana Winkler. 611-615 [doi]

Perception and Acoustics of Vowel Nasality in Brazilian Portuguese. Luciana Marques, Rebecca Scarborough. 616-620 [doi]

Sociophonetic Realizations Guide Subsequent Lexical Access. Jonny Kim, Katie Drager. 621-625 [doi]

Critical Articulators Identification from RT-MRI of the Vocal Tract. Samuel Silva, António Teixeira. 626-630 [doi]

Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images. Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan. 631-635 [doi]

Vocal Tract Airway Tissue Boundary Tracking for rtMRI Using Shape and Appearance Priors. Sasan Asadiabadi, Engin Erzin. 636-640 [doi]

An Objective Critical Distance Measure Based on the Relative Level of Spectral Valley. T. V. Ananthapadmanabha, A. G. Ramakrishnan, Shubham Sharma. 641-644 [doi]

Database of Volumetric and Real-Time Vocal Tract MRI for Speech Science. Tanner Sorensen, Zisis Iason Skordilis, Asterios Toutios, Yoon-Chul Kim, Yinghua Zhu, Jangwon Kim, Adam C. Lammert, Vikram Ramanarayanan, Louis Goldstein, Dani Byrd, Krishna S. Nayak, Shrikanth S. Narayanan. 645-649 [doi]

The Influence on Realization and Perception of Lexical Tones from Affricate's Aspiration. Chong Cao, Yanlu Xie, Qi Zhang, Jinsong Zhang. 650-654 [doi]

Audiovisual Recalibration of Vowel Categories. Matthias K. Franken, Frank Eisner, Jan-Mathijs Schoffelen, Daniel J. Acheson, Peter Hagoort, James M. McQueen. 655-658 [doi]

The Effect of Gesture on Persuasive Speech. Judith Peters, Marieke Hoetjes. 659-663 [doi]

Auditory-Visual Integration of Talker Gender in Cantonese Tone Perception. Wei Lai. 664-668 [doi]

Event-Related Potentials Associated with Somatosensory Effect in Audio-Visual Speech Perception. Takayuki Ito, Hiroki Ohashi, Eva Montas, Vincent L. Gracco. 669-673 [doi]

When a Dog is a Cat and How it Changes Your Pupil Size: Pupil Dilation in Response to Information Mismatch. Lena F. Renner, Marcin Wlodarczak. 674-678 [doi]

Cross-Modal Analysis Between Phonation Differences and Texture Images Based on Sentiment Correlations. Win Thuzar Kyaw, Yoshinori Sagisaka. 679-683 [doi]

Wireless Neck-Surface Accelerometer and Microphone on Flex Circuit with Application to Noise-Robust Monitoring of Lombard Speech. Daryush D. Mehta, Patrick C. Chwalek, Thomas F. Quatieri, Laura J. Brattain. 684-688 [doi]

Video-Based Tracking of Jaw Movements During Speech: Preliminary Results and Future Directions. Andrea Bandini, Aravind Namasivayam, Yana Yunusova. 689-693 [doi]

Accurate Synchronization of Speech and EGG Signal Using Phase Information. S. B. Sunil Kumar, K. Sreenivasa Rao, Tanumay Mandal. 694-698 [doi]

The Acquisition of Focal Lengthening in Stockholm Swedish. Anna Sara H. Romøren, Aoju Chen. 699-703 [doi]

Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition. Shiyu Zhou, Yuanyuan Zhao, Shuang Xu, Bo Xu. 704-708 [doi]

CTC Training of Multi-Phone Acoustic Models for Speech Recognition. Olivier Siohan. 709-713 [doi]

An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation. Sibo Tong, Philip N. Garner, Hervé Bourlard. 714-718 [doi]

2016 BUT Babel System: Multilingual BLSTM Acoustic Model with i-Vector Based Adaptation. Martin Karafiát, Murali Karthick Baskar, Pavel Matejka, Karel Veselý, Frantisek Grézl, Lukás Burget, Jan Cernocký. 719-723 [doi]

Optimizing DNN Adaptation for Recognition of Enhanced Speech. Marco Matassoni, Alessio Brutti, Daniele Falavigna. 724-728 [doi]

Deep Least Squares Regression for Speaker Adaptation. Younggwan Kim, Hyungjun Lim, Jahyun Goo, Hoirin Kim. 729-733 [doi]

Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition. Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson. 734-738 [doi]

Generalized Distillation Framework for Speaker Normalization. Neethu Mariam Joy, Sandeep Reddy Kothinti, S. Umesh, Basil Abraham. 739-743 [doi]

Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models. Lahiru Samarakoon, Brian Mak, Khe Chai Sim. 744-748 [doi]

Factorised Representations for Neural Network Adaptation to Diverse Acoustic Environments. Joachim Fainberg, Steve Renals, Peter Bell 0001. 749-753 [doi]

An RNN Model of Text Normalization. Richard Sproat, Navdeep Jaitly. 754-758 [doi]

Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels. Asaf Rendel, Raul Fernandez, Zvi Kons, Andrew Rosenberg, Ron Hoory, Bhuvana Ramabhadran. 759-763 [doi]

Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis. Yusuke Ijima, Nobukatsu Hojo, Ryo Masumura, Taichi Asami. 764-768 [doi]

Global Syllable Vectors for Building TTS Front-End with Deep Learning. Jinfu Ni, Yoshinori Shiga, Hisashi Kawai. 769-773 [doi]

Prosody Control of Utterance Sequence for Information Delivering. Ishin Fukuoka, Kazuhiko Iwata, Tetsunori Kobayashi. 774-778 [doi]

Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer. Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai. 779-783 [doi]

Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction. Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li, Bin Liu. 784-788 [doi]

Discrete Duration Model for Speech Synthesis. Bo Chen, Tianling Bian, Kai Yu 0004. 789-793 [doi]

Comparison of Modeling Target in LSTM-RNN Duration Model. Bo Chen, Jiahao Lai, Kai Yu. 794-798 [doi]

Learning Word Vector Representations Based on Acoustic Counts. Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi. 799-803 [doi]

Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies. Éva Székely, Joseph Mendelson, Joakim Gustafson. 804-808 [doi]

Prosograph: A Tool for Prosody Visualisation of Large Speech Corpora. Alp Öktem, Mireia Farrús, Leo Wanner. 809-810 [doi]

ChunkitApp: Investigating the Relevant Units of Online Speech Processing. Svetlana Vetchinnikova, Anna Mauranen, Nina Mikusová. 811-812 [doi]

Extending the EMU Speech Database Management System: Cloud Hosting, Team Collaboration, Automatic Revision Control. Markus Jochim. 813-814 [doi]

HomeBank: A Repository for Long-Form Real-World Audio Recordings of Children. Anne S. Warlaumont, Mark VanDam, Elika Bergelson, Alejandrina Cristià. 815-816 [doi]

A System for Real Time Collaborative Transcription Correction. Peter Bell 0001, Joachim Fainberg, Catherine Lai, Mark Sinclair. 817-818 [doi]

MoPAReST - Mobile Phone Assisted Remote Speech Therapy Platform. Chitralekha Bhat, Anjali Kant, Bhavik Vachhani, Sarita Rautara, Ashok Kumar Sinha, Sunil Kumar Kopparapu. 819-820 [doi]

An Apparatus to Investigate Western Opera Singing Skill Learning Using Performance and Result Biofeedback, and Measuring its Neural Correlates. Aurore Jaumard-Hakoun, Samy Chikhi, Takfarinas Medani, Angelika Nair, Gérard Dreyfus, François-Benoît Vialatte. 821-822 [doi]

PercyConfigurator - Perception Experiments as a Service. Christoph Draxler. 823-824 [doi]

System for Speech Transcription and Post-Editing in Microsoft Word. Askars Salimbajevs, Indra Ikauniece. 825-826 [doi]

Emojive! Collecting Emotion Data from Speech and Facial Expression Using Mobile Game App. Ji-Ho Park, Nayeon Lee, Dario Bertero, Anik Dey, Pascale Fung. 827-828 [doi]

Mylly - The Mill: A New Platform for Processing Speech and Text Corpora Easily and Efficiently. Mietta Lennes, Jussi Piitulainen, Martin Matthiesen. 829-830 [doi]

Visual Learning 2: Pronunciation App Using Ultrasound, Video, and MRI. Kyori Suzuki, Ian Wilson, Hayato Watanabe. 831-832 [doi]

Dialogue as Collaborative Problem Solving. James Allen. 833 [doi]

Elicitation Design for Acoustic Depression Classification: An Investigation of Articulation Effort, Linguistic Complexity, and Word Affect. Brian Stasak, Julien Epps, Roland Goecke. 834-838 [doi]

Robustness Over Time-Varying Channels in DNN-HMM ASR Based Human-Robot Interaction. José Novoa, Jorge Wuth, Juan Pablo Escudero, Josué Fredes, Rodrigo Mahu, Richard M. Stern, Néstor Becerra Yoma. 839-843 [doi]

Analysis of Engagement and User Experience with a Laughter Responsive Social Robot. Bekir Berker Türker, Zana Buçinca, Engin Erzin, Yücel Yemez, T. Metin Sezgin. 844-848 [doi]

Automatic Classification of Autistic Child Vocalisations: A Novel Database and Results. Alice Baird, Shahin Amiriparian, Nicholas Cummins, Alyssa M. Alcorn, Anton Batliner, Sergey Pugachevskiy, Michael Freitag, Maurice Gerczuk, Björn W. Schuller. 849-853 [doi]

Crowd-Sourced Design of Artificial Attentive Listeners. Catharine Oertel, Patrik Jonell, Dimosthenis Kontogiorgos, Joseph Mendelson, Jonas Beskow, Joakim Gustafson. 854-858 [doi]

Studying the Link Between Inter-Speaker Coordination and Speech Imitation Through Human-Machine Interactions. Leonardo Lancia, Thierry Chaminade, Noël Nguyen, Laurent Prévot 0001. 859-863 [doi]

Adjusting the Frame: Biphasic Performative Control of Speech Rhythm. Samuel Delalez, Christophe d'Alessandro. 864-868 [doi]

Attentional Factors in Listeners' Uptake of Gesture Cues During Speech Processing. Raheleh Saryazdi, Craig G. Chambers. 869-873 [doi]

Motion Analysis in Vocalized Surprise Expressions. Carlos Toshinori Ishi, Takashi Minato, Hiroshi Ishiguro. 874-878 [doi]

Enhancing Backchannel Prediction Using Word Embeddings. Robin Ruede, Markus Müller 0001, Sebastian Stüker, Alex Waibel. 879-883 [doi]

A Computational Model for Phonetically Responsive Spoken Dialogue Systems. Eran Raveh, Ingmar Steiner, Bernd Möbius. 884-888 [doi]

Incremental Dialogue Act Recognition: Token- vs Chunk-Based Classification. Eustace Ebhotemhen, Volha Petukhova, Dietrich Klakow. 889-893 [doi]

Clear Speech - Mere Speech? How Segmental and Prosodic Speech Reduction Shape the Impression That Speakers Create on Listeners. Oliver Niebuhr. 894-898 [doi]

Relationships Between Speech Timing and Perceived Hostility in a French Corpus of Political Debates. Charlotte Kouklia, Nicolas Audibert. 899-903 [doi]

Towards Speaker Characterization: Identifying and Predicting Dimensions of Person Attribution. Laura Fernández Gallardo, Benjamin Weiss 0001. 904-908 [doi]

Prosodic Analysis of Attention-Drawing Speech. Carlos Toshinori Ishi, Jun Arai, Norihiro Hagita. 909-913 [doi]

Perceptual and Acoustic Correlates of Gender in the Prepubertal Voice. Adrian P. Simpson, Riccarda Funk, Frederik Palmer. 914-918 [doi]

To See or not to See: Interlocutor Visibility and Likeability Influence Convergence in Intonation. Katrin Schweitzer, Michael Walsh 0001, Antje Schweitzer. 919-923 [doi]

Acoustic Correlates of Parental Role and Gender Identity in the Speech of Expecting Parents. Melanie Weirich, Adrian P. Simpson. 924-928 [doi]

A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains. Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Vera Cabarrão, Anna Pompili, Ramón Fernández Astudillo, Joana Campos 0001, Ana Paiva, Isabel Trancoso. 929-933 [doi]

Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions. Rachael Tatman, Conner Kasten. 934-938 [doi]

A Comparison of Sequence-to-Sequence Models for Speech Recognition. Rohit Prabhavalkar, Kanishka Rao, Tara N. Sainath, Bo Li, Leif Johnson, Navdeep Jaitly. 939-943 [doi]

CTC in the Context of Generalized Full-Sum HMM Training. Albert Zeyer, Eugen Beck, Ralf Schlüter, Hermann Ney. 944-948 [doi]

Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM. Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan. 949-953 [doi]

Multitask Learning with CTC and Segmental CRF for Speech Recognition. Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith. 954-958 [doi]

Direct Acoustics-to-Word Models for English Conversational Speech Recognition. Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo. 959-963 [doi]

Reducing the Computational Complexity of Two-Dimensional LSTMs. Bo Li, Tara N. Sainath. 964-968 [doi]

Functional Principal Component Analysis of Vocal Tract Area FunctionsJorge C. Lucero. 969-973 [doi]

Analysis of Acoustic-to-Articulatory Speech Inversion Across Different Accents and LanguagesGanesh Sivaraman, Carol Y. Espy-Wilson, Martijn Wieling. 974-978 [doi]

Integrated Mechanical Model of [r]-[l] and [b]-[m]-[w] Producing Consonant Cluster [br]Takayuki Arai. 979-983 [doi]

A Speaker Adaptive DNN Training Approach for Speaker-Independent Acoustic InversionLeonardo Badino, Luca Franceschi, Raman Arora, Michele Donini, Massimiliano Pontil. 984-988 [doi]

Acoustic-to-Articulatory Mapping Based on Mixture of Probabilistic Canonical Correlation AnalysisHidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu. 989-993 [doi]

Test-Retest Repeatability of Articulatory Strategies Using Real-Time Magnetic Resonance ImagingTanner Sorensen, Asterios Toutios, Johannes Töger, Louis Goldstein, Shrikanth S. Narayanan. 994-998 [doi]

Deep Neural Network Embeddings for Text-Independent Speaker VerificationDavid Snyder, Daniel Garcia-Romero, Daniel Povey, Sanjeev Khudanpur. 999-1003 [doi]

Tied Variational Autoencoder Backends for i-Vector Speaker RecognitionJesús Villalba, Niko Brümmer, Najim Dehak. 1004-1008 [doi]

Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck FeaturesShivesh Ranjan, John H. L. Hansen. 1009-1013 [doi]

Autoencoder Based Domain Adaptation for Speaker Recognition Under Insufficient Channel InformationSuwon Shon, Seongkyu Mun, Wooil Kim, Hanseok Ko. 1014-1018 [doi]

Nonparametrically Trained Probabilistic Linear Discriminant Analysis for i-Vector Speaker VerificationAbbas Khosravani, Mohammad Mehdi Homayounpour. 1019-1023 [doi]

DNN Bottleneck Features for Speaker ClusteringJesús Jorrín, Paola García, Luis Buera. 1024-1028 [doi]

Creak as a Feature of Lexical Stress in EstonianKätlin Aare, Pärtel Lippus, Juraj Simko. 1029-1033 [doi]

Cross-Speaker Variation in Voice Source Correlates of Focus and DeaccentuationIrena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl. 1034-1038 [doi]

Acoustic Characterization of Word-Final Glottal Stops in Mizo and Assam SoraSishir Kalita, Wendy Lalhminghlui, Luke Horo, Priyankoo Sarmah, S. R. Mahadeva Prasanna, Samarendra Dandapat. 1039-1043 [doi]

Iterative Optimal Preemphasis for Improved Glottal-Flow Estimation by Iterative Adaptive Inverse FilteringParham Mokhtari, Hiroshi Ando. 1044-1048 [doi]

Automatic Measurement of Pre-AspirationYaniv Sheena, Mísa Hejná, Yossi Adi, Joseph Keshet. 1049-1053 [doi]

Acoustic and Electroglottographic Study of Breathy and Modal Vowels as Produced by Heritage and Native Gujarati SpeakersKiranpreet Nara. 1054-1058 [doi]

An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech SynthesisXin Wang, Shinji Takaki, Junichi Yamagishi. 1059-1063 [doi]

Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure InformationViacheslav Klimkov, Adam Nadolski, Alexis Moinet, Bartosz Putrycz, Roberto Barra-Chicote, Thomas Merritt, Thomas Drugman. 1064-1068 [doi]

Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech EnhancementKou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura 0001. 1069-1073 [doi]

DNN-SPACE: DNN-HMM-Based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command EstimationNobukatsu Hojo, Yasuhito Ohsugi, Yusuke Ijima, Hirokazu Kameoka. 1074-1078 [doi]

Controlling Prominence Realisation in Parametric DNN-Based Speech SynthesisZofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson. 1079-1083 [doi]

Increasing Recall of Lengthening Detection via Semi-Automatic ClassificationSimon Betz, Jana Voße, Sina Zarrieß, Petra Wagner. 1084-1088 [doi]

Efficient Emotion Recognition from Speech Using Deep Learning on SpectrogramsAharon Satt, Shai Rozenberg, Ron Hoory. 1089-1093 [doi]

Interaction and Transition Model for Speech Emotion Recognition in DialogueRuo Zhang, Atsushi Ando, Satoshi Kobashikawa, Yushi Aono. 1094-1097 [doi]

Progressive Neural Networks for Transfer Learning in Emotion RecognitionJohn Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost. 1098-1102 [doi]

Jointly Predicting Arousal, Valence and Dominance with Multi-Task LearningSrinivas Parthasarathy, Carlos Busso. 1103-1107 [doi]

Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural NetworkDuc Le, Zakaria Aldeneh, Emily Mower Provost. 1108-1112 [doi]

Towards Speech Emotion Recognition "in the Wild" Using Aggregated Corpora and Deep Multi-Task LearningJaebok Kim, Gwenn Englebienne, Khiet P. Truong, Vanessa Evers. 1113-1117 [doi]

Speaker-Dependent WaveNet VocoderAkira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda. 1118-1122 [doi]

Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth ExtensionYu Gu, Zhen-Hua Ling. 1123-1127 [doi]

Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech SynthesisShinji Takaki, Hirokazu Kameoka, Junichi Yamagishi. 1128-1132 [doi]

A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech SynthesisSrikanth Ronanki, Oliver Watts, Simon King. 1133-1137 [doi]

Statistical Voice Conversion with WaveNet-Based Waveform GenerationKazuhiro Kobayashi, Tomoki Hayashi, Akira Tamamori, Tomoki Toda. 1138-1142 [doi]

Google's Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based AutoencodersVincent Wan, Yannis Agiomyrgiannakis, Hanna Silén, Jakub Vít. 1143-1147 [doi]

A Comparison of Sentence-Level Speech Intelligibility MetricsAlexander Kain, Max Del Giudice, Kris Tjaden. 1148-1152 [doi]

An Auditory Model of Speaker Size Perception for Voiced Speech SoundsToshio Irino, Eri Takimoto, Toshie Matsui, Roy D. Patterson. 1153-1157 [doi]

The Recognition of Compounds: A Computational AccountLouis ten Bosch, Lou Boves, Mirjam Ernestus. 1158-1162 [doi]

Humans do not Maximize the Probability of Correct Decision When Recognizing DANTALE Words in NoiseMohsen Zareian Jahromi, Jan Østergaard, Jesper Jensen 0001. 1163-1167 [doi]

Single-Ended Prediction of Listening Effort Based on Automatic Speech RecognitionRainer Huber, Constantin Spille, Bernd T. Meyer. 1168-1172 [doi]

Modeling Categorical Perception with the Receptive Fields of Auditory NeuronsChris Neufeld. 1173-1177 [doi]

A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech SeparationYannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee. 1178-1182 [doi]

Deep Clustering-Based Beamforming for Separation with Unknown Number of SourcesTakuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Katerina Zmolíková, Tomohiro Nakatani. 1183-1187 [doi]

Time-Frequency Masking for Blind Source Separation with Preserved Spatial CuesShadi Pirhosseinloo, Kostas Kokkinakis. 1188-1192 [doi]

Variational Recurrent Neural Networks for Speech SeparationJen-Tzung Chien, Kuan-Ting Kuo. 1193-1197 [doi]

Detecting Overlapped Speech on Short Timeframes Using Deep LearningValentin Andrei, Horia Cucu, Corneliu Burileanu. 1198-1202 [doi]

Ideal Ratio Mask Estimation Using Deep Neural Networks for Monaural Speech Segregation in Noisy Reverberant ConditionsXu Li, Junfeng Li, Yonghong Yan 0002. 1203-1207 [doi]

The Vocative Chant and Beyond: German Calling Melodies Under Routine and Urgent ContextsSergio I. Quiroz, Marzena Zygis. 1208-1212 [doi]

Comparing Languages Using Hierarchical Prosodic AnalysisJuraj Simko, Antti Suni, Katri Hiovain, Martti Vainio. 1213-1217 [doi]

Intonation Facilitates Prediction of Focus Even in the Presence of Lexical TonesMartin Ho Kwan Ip, Anne Cutler. 1218-1222 [doi]

Mind the Peak: When Museum is Temporarily Understood as Musical in Australian EnglishKatharina Zahner, Heather Kember, Bettina Braun. 1223-1227 [doi]

Pashto Intonation PatternsLuca Rognoni, Judith Bishop, Miriam Corris. 1228-1232 [doi]

A New Model of Final Lowering in Spontaneous MonologueKikuo Maekawa. 1233-1237 [doi]

Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion SpaceXi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai. 1238-1242 [doi]

Adversarial Auto-Encoders for Speech Based Emotion RecognitionSaurabh Sahu, Rahul Gupta, Ganesh Sivaraman, Wael AbdAlmageed, Carol Espy Wilson. 1243-1247 [doi]

An Investigation of Emotion Prediction Uncertainty Using Gaussian Mixture RegressionTing Dang, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah. 1248-1252 [doi]

Capturing Long-Term Temporal Dependencies with Convolutional Networks for Continuous Emotion RecognitionSoheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin G. McInnis, Emily Mower Provost. 1253-1257 [doi]

Voice-to-Affect Mapping: Inferences on Language Voice Baseline SettingsAilbhe Ní Chasaide, Irena Yanushevskaya, Christer Gobl. 1258-1262 [doi]

Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted SpeechMichael Neumann, Ngoc Thang Vu. 1263-1267 [doi]

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior ProbabilitiesHiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari. 1268-1272 [doi]

Learning Latent Representations for Speech Generation and TransformationWei-Ning Hsu, Yu Zhang, James R. Glass. 1273-1277 [doi]

Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech CorpusTetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu. 1278-1282 [doi]

Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial NetworksTakuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino. 1283-1287 [doi]

A Mouth Opening Effect Based on Pole Modification for Expressive Singing Voice TransformationLuc Ardaillon, Axel Roebel. 1288-1292 [doi]

Siamese Autoencoders for Speech Style Extraction and Switching Applied to Voice Identification and ConversionSeyed Hamidreza Mohammadi, Alexander Kain. 1293-1297 [doi]

Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence MappingHasim Sak, Matt Shannon, Kanishka Rao, Françoise Beaufays. 1298-1302 [doi]

Highway-LSTM and Recurrent Highway Networks for Speech RecognitionGolan Pundak, Tara N. Sainath. 1303-1307 [doi]

Improving Speech Recognition by Revising Gated Recurrent UnitsMirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio. 1308-1312 [doi]

Stochastic Recurrent Neural Network for Speech RecognitionJen-Tzung Chien, Chen Shen. 1313-1317 [doi]

Frame and Segment Level Recurrent Neural Networks for Phone ClassificationMartin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf. 1318-1322 [doi]

Deep Learning-Based Telephony Speech Recognition in the WildKyu J. Han, Seongjun Hahm, Byung-Hak Kim, Jungsuk Kim, Ian R. Lane. 1323-1327 [doi]

The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, et al. 1328-1332 [doi]

The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation SystemPedro A. Torres-Carrasquillo, Fred Richardson, Shahan Nercessian, Douglas E. Sturim, William M. Campbell, Youngjune Gwon, Swaroop Vattam, Najim Dehak, Sri Harish Reddy Mallidi, Phani Sankar Nidadavolu, Ruizhi Li, Réda Dehak. 1333-1337 [doi]

Nuance - Politecnico di Torino's 2016 NIST Speaker Recognition Evaluation SystemDaniele Colibro, Claudio Vair, Emanuele Dalmasso, Kevin Farrell, Gennady Karvitsky, Sandro Cumani, Pietro Laface. 1338-1342 [doi]

UTD-CRSS Systems for 2016 NIST Speaker Recognition EvaluationChunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H. L. Hansen. 1343-1347 [doi]

Analysis and Description of ABC Submission to NIST SRE 2016Oldrich Plchot, Pavel Matejka, Anna Silnova, Ondrej Novotný, Mireia Díez Sánchez, Johan Rohdin, Ondrej Glembek, Niko Brümmer, Albert Swart, Jesús Jorrín-Prieto, Paola García, Luis Buera, Patrick Kenny, Md. Jahangir Alam, Gautam Bhattacharya. 1348-1352 [doi]

The 2016 NIST Speaker Recognition EvaluationSeyed Omid Sadjadi, Timothée Kheyrkhah, Audrey Tong, Craig S. Greenberg, Douglas A. Reynolds, Elliot Singer, Lisa P. Mason, Jaime Hernandez-Cordero. 1353-1357 [doi]

A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing SynthesisHideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino. 1358-1362 [doi]

Speaking Style Conversion from Normal to Lombard Speech Using a Glottal Vocoder and Bayesian GMMsAna Ramírez López, Shreyas Seshadri, Lauri Juvela, Okko Räsänen, Paavo Alku. 1363-1367 [doi]

Reducing Mismatch in Training of DNN-Based Glottal Excitation Models in a Statistical Parametric Text-to-Speech SystemLauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku. 1368-1372 [doi]

Semi Parametric Concatenative TTS with Instant Voice Modification CapabilitiesAlexander Sorin, Slava Shechtman, Asaf Rendel. 1373-1377 [doi]

Modeling Laryngeal Muscle Activation Noise for Low-Order Physiological Based Speech SynthesisRodrigo Manríquez, Sean D. Peterson, Pavel Prado, Patricio Orio, Matías Zañartu. 1378-1382 [doi]

Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech SynthesisFelipe Espic, Cassia Valentini-Botinhao, Simon King. 1383-1387 [doi]

Similar Prosodic Structure Perceived Differently in German and EnglishHeather Kember, Ann-Kathrin Grohe, Katharina Zahner, Bettina Braun, Andrea Weber, Anne Cutler. 1388-1392 [doi]

Disambiguate or not? - The Role of Prosody in Unambiguous and Potentially Ambiguous Anaphora Production in Strictly Mandarin Parallel StructuresLuying Hou, Bert Le Bruyn, René Kager. 1393-1397 [doi]

Acoustic Properties of Canonical and Non-Canonical Stress in French, Turkish, Armenian and Brazilian PortugueseAngeliki Athanasopoulou, Irene Vogel, Hossep Dolatian. 1398-1402 [doi]

Phonological Complexity, Segment Rate and Speech Tempo PerceptionLeendert Plug, Rachel Smith. 1403-1406 [doi]

On the Duration of Mandarin TonesJing Yang, Yu Zhang, Aijun Li, Li Xu. 1407-1411 [doi]

The Formant Dynamics of Long Close Vowels in Three Varieties of SwedishOtto Ewald, Eva Liina Asu, Susanne Schötz. 1412-1416 [doi]

Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children's SpeechYao Qian, Keelan Evanini, Xinhao Wang, Chong Min Lee, Matthew Mulholland. 1417-1421 [doi]

Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTWJunwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu. 1422-1426 [doi]

Off-Topic Spoken Response Detection Using Siamese Convolutional Neural NetworksChong Min Lee, Su-Youn Yoon, Xinhao Wang, Matthew Mulholland, Ikkyu Choi, Keelan Evanini. 1427-1431 [doi]

Phonological Feature Based Mispronunciation Detection and Diagnosis Using Multi-Task DNNs and Active LearningVipul Arora, Aditi Lahiri, Henning Reetz. 1432-1436 [doi]

Detection of Mispronunciations and Disfluencies in Children Reading AloudJorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão. 1437-1441 [doi]

Automatic Assessment of Non-Native Prosody by Measuring Distances on Prosodic Label SequencesDavid Escudero Mancebo, César González Ferreras, Lourdes Aguilar, Eva Estebas-Vilaplana. 1442-1446 [doi]

Inferring Stance from ProsodyNigel G. Ward, Jason C. Carlson, Olac Fuentes, Diego Castán, Elizabeth Shriberg, Andreas Tsiartas. 1447-1451 [doi]

Exploring Dynamic Measures of Stance in Spoken InteractionGina-Anne Levow, Richard A. Wright. 1452-1456 [doi]

Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random FieldsValentin Barrière, Chloé Clavel, Slim Essid. 1457-1461 [doi]

Transfer Learning Between Concepts for Human Behavior Modeling: An Application to Sincerity and Deception PredictionQinyi Luo, Rahul Gupta, Shrikanth S. Narayanan. 1462-1466 [doi]

The Sound of Deception - What Makes a Speaker Credible?Anne Schröder, Simon Stone, Peter Birkholz. 1467-1471 [doi]

Hybrid Acoustic-Lexical Deep Learning Approach for Deception DetectionGideon Mendels, Sarah Ita Levitan, Kai-Zhan Lee, Julia Hirschberg. 1472-1476 [doi]

A Generative Model for Score Normalization in Speaker RecognitionAlbert Swart, Niko Brümmer. 1477-1481 [doi]

Content Normalization for Text-Dependent Speaker VerificationSubhadeep Dey, Srikanth R. Madikeri, Petr Motlícek, Marc Ferras. 1482-1486 [doi]

End-to-End Text-Independent Speaker Verification with Triplet Loss on Short UtterancesChunlei Zhang, Kazuhito Koishida. 1487-1491 [doi]

Adversarial Network Bottleneck Features for Noise Robust Speaker VerificationHong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo. 1492-1496 [doi]

What Does the Speaker Embedding Encode?Shuai Wang, Yanmin Qian, Kai Yu 0004. 1497-1501 [doi]

Incorporating Local Acoustic Variability Information into Short Duration Speaker VerificationJianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee. 1502-1506 [doi]

DNN i-Vector Speaker Verification with Short, Text-Constrained Test UtterancesJinghua Zhong, Wenping Hu, Frank K. Soong, Helen Meng. 1507-1511 [doi]

Time-Varying Autoregressions for Speaker Verification in Reverberant ConditionsVille Vestman, Dhananjaya N. Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen. 1512-1516 [doi]

Deep Speaker Embeddings for Short-Duration Speaker VerificationGautam Bhattacharya, Md. Jahangir Alam, Patrick Kenny. 1517-1521 [doi]

Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification SystemsSoo-Jin Park, Gary Yeung, Jody Kreiman, Patricia A. Keating, Abeer Alwan. 1522-1526 [doi]

Gain Compensation for Fast i-Vector Extraction Over Short DurationKong-Aik Lee, Haizhou Li. 1527-1531 [doi]

Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker VerificationHee-Soo Heo, Jee-weon Jung, Il-Ho Yang, Sung Hyun Yoon, Ha-Jin Yu. 1532-1536 [doi]

Speaker Verification via Estimating Total Variability Space Using Probabilistic Partial Least SquaresChen Chen, Jiqing Han, Yilin Pan. 1537-1541 [doi]

Deep Speaker Feature Learning for Text-Independent Speaker VerificationLantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang. 1542-1546 [doi]

Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker VerificationPierre-Michel Bousquet, Mickael Rouvier. 1547-1551 [doi]

Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker RecognitionAlan McCree, Gregory Sell, Daniel Garcia-Romero. 1552-1556 [doi]

Improving the Effectiveness of Speaker Verification Domain Adaptation with Inadequate In-Domain DataBengt J. Borgström, Elliot Singer, Douglas A. Reynolds, Seyed Omid Sadjadi. 1557-1561 [doi]

i-Vector DNN Scoring and Calibration for Noise Robust Speaker VerificationZhili Tan, Man-Wai Mak. 1562-1566 [doi]

Analysis of Score Normalization in Multilingual Speaker RecognitionPavel Matejka, Ondrej Novotný, Oldrich Plchot, Lukás Burget, Mireia Díez Sánchez, Jan Cernocký. 1567-1571 [doi]

Alternative Approaches to Neural Network Based Speaker VerificationAnna Silnova, Lukás Burget, Jan Cernocký. 1572-1575 [doi]

A Distribution Free Formulation of the Total Variability ModelRuchir Travadi, Shrikanth S. Narayanan. 1576-1580 [doi]

Domain Mismatch Modeling of Out-Domain i-Vectors for PLDA Speaker VerificationMd. Hafizur Rahman, Ivan Himawan, David Dean, Sridha Sridharan. 1581-1585 [doi]

An Exploration of Dropout with LSTMsGaofeng Cheng, Vijayaditya Peddinti, Daniel Povey, Vimal Manohar, Sanjeev Khudanpur, YongHong Yan. 1586-1590 [doi]

Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech RecognitionJaeyoung Kim, Mostafa El-Khamy, Jungwon Lee. 1591-1595 [doi]

Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic ModelingDung T. Tran, Marc Delcroix, Shigeki Karita, Michael Hentschel, Atsunori Ogawa, Tomohiro Nakatani. 1596-1600 [doi]

Forward-Backward Convolutional LSTM for Acoustic ModelingShigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani. 1601-1605 [doi]

Convolutional Recurrent Neural Networks for Small-Footprint Keyword SpottingSercan Ömer Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates. 1606-1610 [doi]

Deep Activation Mixture Model for Speech RecognitionChunyang Wu, Mark J. F. Gales. 1611-1615 [doi]

Ensembles of Multi-Scale VGG Acoustic ModelsMichael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura 0001. 1616-1620 [doi]

Training Context-Dependent DNN Acoustic Models Using Probabilistic SamplingTamás Grósz, Gábor Gosztolya, László Tóth. 1621-1625 [doi]

A Comparative Evaluation of GMM-Free State Tying Methods for ASRTamás Grósz, Gábor Gosztolya, László Tóth. 1626-1630 [doi]

Backstitch: Counteracting Finite-Sample Bias via Negative StepsYiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur. 1631-1635 [doi]

Node Pruning Based on Entropy of Weights and Node Activity for Small-Footprint Acoustic Model Based on Deep Neural NetworksRyu Takeda, Kazuhiro Nakadai, Kazunori Komatani. 1636-1640 [doi]

End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlowEhsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani. 1641-1645 [doi]

An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix MultiplicationKhe Chai Sim, Arun Narayanan. 1646-1650 [doi]

Parallel Neural Network Features for Improved Tandem Acoustic ModelingZoltán Tüske, Wilfried Michel, Ralf Schlüter, Hermann Ney. 1651-1655 [doi]

Acoustic Feature Learning via Deep Variational Canonical Correlation AnalysisQingming Tang, Weiran Wang, Karen Livescu. 1656-1660 [doi]

Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential NetworksRyo Masumura, Taichi Asami, Hirokazu Masataki, Ryo Ishii, Ryuichiro Higashinaka. 1661-1665 [doi]

Improving Prediction of Speech Activity Using Multi-Participant Respiratory StateMarcin Wlodarczak, Kornel Laskowski, Mattias Heldner, Kätlin Aare. 1666-1670 [doi]

Turn-Taking Offsets and Dialogue ContextPeter A. Heeman, Rebecca Lunsford. 1671-1675 [doi]

Towards Deep End-of-Turn Prediction for Situated Spoken Dialogue SystemsAngelika Maier, Julian Hough, David Schlangen. 1676-1680 [doi]

End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese SpeechYuichi Ishimoto, Takehiro Teraoka, Mika Enomoto. 1681-1685 [doi]

Turn-Taking Estimation Model Based on Joint Embedding of Lexical and Prosodic ContentsChaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro. 1686-1690 [doi]

Social Signal Detection in Spontaneous Dialogue Using Bidirectional LSTM-CTCHirofumi Inaguma, Koji Inoue, Masato Mimura, Tatsuya Kawahara. 1691-1695 [doi]

Entrainment in Multi-Party Spoken Dialogues at Multiple Linguistic LevelsZahra Rahimi, Anish Kumar, Diane J. Litman, Susannah Paletz, Mingzhi Yu. 1696-1700 [doi]

Measuring Synchrony in Task-Based DialoguesJustine Reverdy, Carl Vogel. 1701-1705 [doi]

Sequence to Sequence Modeling for User Simulation in Dialog SystemsPaul A. Crook, Alex Marin. 1706-1710 [doi]

Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human-Machine Spoken Dialog InteractionsVikram Ramanarayanan, Patrick L. Lange, Keelan Evanini, Hillary R. Molloy, David Suendermann-Oeft. 1711-1715 [doi]

Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center CallsAtsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono. 1716-1720 [doi]

Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy LearningStefan Ultes, Pawel Budzianowski, Iñigo Casanueva, Nikola Mrksic, Lina Maria Rojas-Barahona, Pei-hao Su, Tsung-Hsien Wen, Milica Gasic, Steve J. Young. 1721-1725 [doi]

Analysis of the Relationship Between Prosodic Features of Fillers and its Forms or Occurrence PositionsShizuka Nakamura, Ryosuke Nakanishi, Katsuya Takanashi, Tatsuya Kawahara. 1726-1730 [doi]

Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic InteractionsSyeda Narjis Fatima, Engin Erzin. 1731-1735 [doi]

An Automatically Aligned Corpus of Child-Directed SpeechMicha Elsner, Kiwako Ito. 1736-1740 [doi]

A Comparison of Danish Listeners' Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English SentencesOcke-Schwen Bohn, Trine Askjær-Jørgensen. 1741-1744 [doi]

On the Role of Temporal Variability in the Acquisition of the German Vowel Length ContrastFelicitas Kleber. 1745-1749 [doi]

A Data-Driven Approach for Perceptually Validated Acoustic Features for Children's Sibilant Fricative ProductionsPatrick F. Reidy, Mary E. Beckman, Jan Edwards, Benjamin Munson. 1750-1754 [doi]

Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as ReferenceYujia Xiao, Frank K. Soong. 1755-1759 [doi]

Mechanisms of Tone Sandhi Rule Application by Non-Native SpeakersSi Chen, Yunjuan He, Chun Wah Yuen, Bei Li, Yike Yang. 1760-1764 [doi]

Changes in Early L2 Cue-Weighting of Non-Native Speech: Evidence from Learners of Mandarin ChineseSeth Wiener. 1765-1769 [doi]

Directing Attention During Perceptual Training: A Preliminary Study of Phonetic Learning in Southern Min by Mandarin SpeakersYing Chen, Eric Pederson. 1770-1774 [doi]

Prosody Analysis of L2 English for Naturalness Evaluation Through Speech ModificationDean Luo, Ruxin Luo, Lixin Wang. 1775-1778 [doi]

Measuring Encoding Efficiency in Swedish and English Language Learner Speech ProductionGintare Grigonyte, Gerold Schneider. 1779-1783 [doi]

Lexical Adaptation to a Novel Accent in German: A Comparison Between German, Swedish, and Finnish ListenersAdriana Hanulíková, Jenny Ekström. 1784-1788 [doi]

Qualitative Differences in L3 Learners' Neurophysiological Response to L1 versus L2 TransferAlejandra Keidel Fernández, Thomas Hörberg. 1789-1793 [doi]

Articulation Rate in Swedish Child-Directed Speech Increases as a Function of the Age of the Child Even When Surprisal is Controlled forJohan Sjons, Thomas Hörberg, Robert Östling, Johannes Bjerva. 1794-1798 [doi]

The Relationship Between the Perception and Production of Non-Native TonesKaile Zhang, Gang Peng. 1799-1803 [doi]

MMN Responses in Adults After Exposure to Bimodal and Unimodal Frequency Distributions of Rotated SpeechEllen Marklund, Elísabet Eir Cortes, Johan Sjons. 1804-1808 [doi]

Float Like a Butterfly Sting Like a Bee: Changes in Speech Preceded Parkinsonism Diagnosis for Muhammad AliVisar Berisha, Julie Liss, Timothy Huston, Alan Wisler, Yishan Jiao, Jonathan Eig. 1809-1813 [doi]

Cepstral and Entropy Analyses in Vowels Excerpted from Continuous Speech of Dysphonic and Control SpeakersAntonella Castellana, Andreas Selamtzis, Giampiero Salvi, Alessio Carullo, Arianna Astolfi. 1814-1818 [doi]

Classification of Bulbar ALS from Kinematic Features of the Jaw and Lips: Towards Computer-Mediated AssessmentAndrea Bandini, Jordan R. Green, Lorne Zinman, Yana Yunusova. 1819-1823 [doi]

Zero Frequency Filter Based Analysis of Voice DisordersNagaraj Adiga, Vikram C. M., Keerthi Pullela, S. R. Mahadeva Prasanna. 1824-1828 [doi]

Hypernasality Severity Analysis in Cleft Lip and Palate Speech Using Vowel Space AreaNikitha K., Sishir Kalita, C. M. Vikram, M. Pushpavathi, S. R. Mahadeva Prasanna. 1829-1833 [doi]

Automatic Prediction of Speech Evaluation Metrics for Dysarthric SpeechImed Laaridh, Waad Ben Kheder, Corinne Fredouille, Christine Meunier. 1834-1838 [doi]

Apkinson - A Mobile Monitoring Solution for Parkinson's DiseasePhilipp Klumpp, Thomas Janu, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth. 1839-1843 [doi]

Dysprosody Differentiate Between Parkinson's Disease, Progressive Supranuclear Palsy, and Multiple System AtrophyJan Hlavnicka, Tereza Tykalová, Roman Cmejla, Jirí Klempír, Evzen Ruzicka, Jan Rusz. 1844-1848 [doi]

Interpretable Objective Assessment of Dysarthric Speech Based on Deep Neural NetworksMing Tu, Visar Berisha, Julie Liss. 1849-1853 [doi]

Deep Autoencoder Based Speech Features for Improved Dysarthric Speech RecognitionBhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu. 1854-1858 [doi]

Prediction of Speech Delay from Acoustic MeasurementsJason Lilley, Madhavi Vedula Ratnagiri, H. Timothy Bunnell. 1859-1863 [doi]

The Frequency Range of "The Ling Six Sounds" in Standard ChineseAijun Li, Hua Zhang, Wen Sun. 1864-1868 [doi]

Production of Sustained Vowels and Categorical Perception of Tones in Mandarin Among Cochlear-Implanted ChildrenWentao Gu, Jiao Yin, James J. Mahshie. 1869-1873 [doi]

Audio Content Based Geotagging in MultimediaAnurag Kumar 0003, Benjamin Elizalde, Bhiksha Raj. 1874-1878 [doi]

Time Delay Histogram Based Speech Source Separation Using a Planar ArrayZhaoqiong Huang, Zhanzhong Cao, Dongwen Ying, Jielin Pan, Yonghong Yan 0002. 1879-1883 [doi]

Excitation Source Features for Improving the Detection of Vowel Onset and Offset Points in a Speech SequenceGayadhar Pradhan, Avinash Kumar, Syed Shahnawazuddin. 1884-1888 [doi]

A Contrast Function and Algorithm for Blind Separation of Audio SignalsWei Gao, Roberto Togneri, Victor Sreeram. 1889-1893 [doi]

Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech SourceChenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li. 1894-1898 [doi]

Speaker Direction-of-Arrival Estimation Based on Frequency-Independent BeampatternFeng Guo, Yuhang Cao, Zheng Liu, Jiaen Liang, Baoqing Li, Xiaobing Yuan. 1899-1903 [doi]

A Mask Estimation Method Integrating Data Field Model for Speech EnhancementXianyun Wang, Changchun Bao, Feng Bao 0003. 1904-1908 [doi]

Improved End-of-Query Detection for Streaming Speech RecognitionMatt Shannon, Gabor Simko, Shuo-Yiin Chang, Carolina Parada. 1909-1913 [doi]

Using Approximated Auditory Roughness as a Pre-Filtering Feature for Human Screaming and Affective Speech AEDDi He, Zuofu Cheng, Mark Hasegawa-Johnson, Deming Chen. 1914-1918 [doi]

Improving Source Separation via Multi-Speaker RepresentationsJeroen Zegers, Hugo Van Hamme. 1919-1923 [doi]

Multiple Sound Source Counting and Localization Based on Spatial Principal EigenvectorBing Yang, Hong Liu, Cheng Pang. 1924-1928 [doi]

Subband Selection for Binaural Speech Source LocalizationGirija Ramesan Karthik, Prasanta Kumar Ghosh. 1929-1933 [doi]

Unmixing Convolutive Mixtures by Exploiting Amplitude Co-Modulation: Methods and Evaluation on Mandarin Speech RecordingsBo-Rui Chen, Huang-Yi Lee, Yi-Wen Liu. 1934-1937 [doi]

Bimodal Recurrent Neural Network for Audiovisual Voice Activity DetectionFei Tao, Carlos Busso. 1938-1942 [doi]

Domain-Specific Utterance End-Point Detection for Speech RecognitionRoland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Björn Hoffmeister. 1943-1947 [doi]

Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant EnvironmentsVinay Kothapally, John H. L. Hansen. 1948-1952 [doi]

A Post-Filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech EnhancementYi-Chiao Wu, Hsin-Te Hwang, Syu-Siang Wang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang. 1953-1957 [doi]

Multi-Target Ensemble Learning for Monaural Speech SeparationHui Zhang, Xueliang Zhang, Guanglai Gao. 1958-1962 [doi]

Improved Example-Based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example SearchAtsunori Ogawa, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani. 1963-1967 [doi]

Subjective Intelligibility of Deep Neural Network-Based Speech EnhancementFemke B. Gelderblom, Tron V. Tronstad, Erlend Magnus Viggen. 1968-1972 [doi]

Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech IntelligibilityMaria Koutsogiannaki, Holly Francois, Kihyun Choo, Eunmi Oh. 1973-1977 [doi]

On the Influence of Modifying Magnitude and Phase Spectrum to Enhance Noisy Speech SignalsHans-Günter Hirsch, Michael Gref. 1978-1982 [doi]

MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech EnhancementRobert Rehr, Timo Gerkmann. 1983-1987 [doi]

Binary Mask Estimation Strategies for Constrained Imputation-Based Speech EnhancementRicard Marxer, Jon Barker. 1988-1992 [doi]

A Fully Convolutional Neural Network for Speech EnhancementSe Rim Park, Jinwon Lee. 1993-1997 [doi]

Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral RegularizationLi Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino. 1998-2002 [doi]

A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech SeparationDanny Websdale, Ben Milner. 2003-2007 [doi]

Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker VerificationDaniel Michelsanti, Zheng-Hua Tan. 2008-2012 [doi]

Speech Enhancement Using Bayesian WavenetKaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florêncio, Mark Hasegawa-Johnson. 2013-2017 [doi]

Binaural Reverberant Speech Separation Based on Deep Neural NetworksXueliang Zhang, DeLiang Wang. 2018-2022 [doi]

On the Quality and Intelligibility of Noisy Speech Processed for Near-End Listening EnhancementTudor-Catalin Zorila, Yannis Stylianou. 2023-2027 [doi]

Applications of the BBN Sage Speech Processing PlatformRalf Meermeier, Sean Colbath. 2028-2029 [doi]

Bob Speaks KaldiMilos Cernak, Alain Komaty, Amir Mohammadi, André Anjos, Sébastien Marcel. 2030-2031 [doi]

Real Time Pitch Shifting with Formant Structure Preservation Using the Phase VocoderMichal Lenarczyk. 2032-2033 [doi]

A Signal Processing Approach for Speaker Separation Using SFF AnalysisNivedita Chennupati, B. H. V. S. Narayana Murthy, B. Yegnanarayana. 2034-2035 [doi]

Speech Recognition and Understanding on Hardware-Accelerated DSPGeorg Stemmer, Munir Georges, Joachim Hofer, Piotr Rozen, Josef G. Bauer, Jakub Nowicki, Tobias Bocklet, Hannah R. Colett, Ohad Falik, Michael Deisher, Sylvia J. Downing. 2036-2037 [doi]

MetaLab: A Repository for Meta-Analyses on Language Development, and MoreSho Tsuji, Christina Bergmann, Molly Lewis, Mika Braginsky, Page Piccinini, Michael C. Frank, Alejandrina Cristià. 2038-2039 [doi]

Evolving Recurrent Neural Networks That Process and Classify Raw Audio in a Streaming FashionAdrien Daniel. 2040-2041 [doi]

Combining Gaussian Mixture Models and Segmental Feature Models for Speaker RecognitionMilana Milosevic, Ulrike Glavitsch. 2042-2043 [doi]

"Did you laugh enough today?" - Deep Neural Networks for Mobile and Wearable Laughter TrackersGerhard Hagerer, Nicholas Cummins, Florian Eyben, Björn W. Schuller. 2044-2045 [doi]

Low-Frequency Ultrasonic Communication for Speech Broadcasting in Public TransportationKwang Myung Jeon, Nam-Kyun Kim, Chan Woong Kwak, Jung-Min Moon, Hong Kook Kim. 2046-2047 [doi]

Real-Time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA JetsonSean U. N. Wood, Jean Rouat. 2048-2049 [doi]

Reading Validation for Pronunciation Evaluation in the Digitala ProjectAku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo. 2050-2051 [doi]

Conversing with Social Agents That Smile and LaughCatherine Pelachaud. 2052 [doi]

Team ELISA System for DARPA LORELEI Speech Evaluation 2016Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Ondrej Glembek, Murali Karthick Baskar, Martin Karafiát, Lukás Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan May, Kevin Knight, Shrikanth S. Narayanan. 2053-2057 [doi]

First Results in Developing a Medieval Latin Language Charter Dictation System for the East-Central Europe RegionPéter Mihajlik, Lili Szabó, Balázs Tarján, András Balog, Krisztina Rábai. 2058-2062 [doi]

The Motivation and Development of MPAi, a Māori Pronunciation AidCatherine I. Watson, Peter Keegan, Margaret Maclagan, Ray Harlow, J. King. 2063-2067 [doi]

On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic ModelingSiyuan Feng, Tan Lee. 2068-2072 [doi]

Deep Auto-Encoder Based Multi-Task Learning Using Probabilistic TranscriptionsAmit Das, Mark Hasegawa-Johnson, Karel Veselý. 2073-2077 [doi]

Areal and Phylogenetic Features for Multilingual Speech SynthesisAlexander Gutkin, Richard Sproat. 2078-2082 [doi]

SLPAnnotator: Tools for Implementing Sign Language Phonetic AnnotationKathleen Currie Hall, Scott Mackie, Michael Fry, Oksana Tkachman. 2083-2087 [doi]

The LENA System Applied to Swedish: Reliability of the Adult Word Count EstimateIris-Corinna Schwarz, Noor Botros, Alekzandra Lord, Amelie Marcusson, Henrik Tidelius, Ellen Marklund. 2088-2092 [doi]

What do Babies Hear? Analyses of Child- and Adult-Directed SpeechMarisa Casillas, Andrei Amatuni, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Elika Bergelson. 2093-2097 [doi]

A New Workflow for Semi-Automatized Annotations: Tests with Long-Form Naturalistic Recordings of Children's Language EnvironmentsMarisa Casillas, Elika Bergelson, Anne S. Warlaumont, Alejandrina Cristià, Melanie Soderstrom, Mark VanDam, Han Sloetjes. 2098-2102 [doi]

Top-Down versus Bottom-Up Theories of Phonological Acquisition: A Big Data ApproachChristina Bergmann, Sho Tsuji, Alejandrina Cristià. 2103-2107 [doi]

Which Acoustic and Phonological Factors Shape Infants' Vowel Discrimination? Exploiting Natural Variation in InPhonDBSho Tsuji, Alejandrina Cristià. 2108-2112 [doi]

The ABAIR Initiative: Bringing Spoken Irish into the Digital SpaceAilbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl. 2113-2117 [doi]

Very Low Resource Radio Browsing for Agile Developmental and Humanitarian MonitoringArmin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John Quinn, Thomas Niesler. 2118-2122 [doi]

Extracting Situation Frames from Non-English Speech: Evaluation Framework and Pilot ResultsNikolaos Malandrakis, Ondrej Glembek, Shrikanth S. Narayanan. 2123-2127 [doi]

Eliciting Meaningful Units from SpeechDaniil Kocharov, Tatiana Kachkovskaia, Pavel A. Skrelin. 2128-2132 [doi]

Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech ApplicationsSaurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty. 2133-2137 [doi]

Machine Assisted Analysis of Vowel Length Contrasts in WolofElodie Gauthier, Laurent Besacier, Sylvie Voisin. 2138-2142 [doi]

Leveraging Text Data for Word Segmentation for Underresourced LanguagesThomas Glarner, Benedikt T. Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach. 2143-2147 [doi]

Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual InitializationXiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu. 2148-2152 [doi]

Joint Estimation of Articulatory Features and Acoustic Models for Low-Resource LanguagesBasil Abraham, S. Umesh, Neethu Mariam Joy. 2153-2157 [doi]

Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource LanguagesBasil Abraham, Tejaswi Seeram, S. Umesh. 2158-2162 [doi]

Building an ASR Corpus Using Althingi's Parliamentary SpeechesInga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jón Guðnason. 2163-2167 [doi]

Implementation of a Radiology Speech Recognition System for Estonian Using Open Source SoftwareTanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister. 2168-2172 [doi]

Building ASR Corpora Using EyraJón Guðnason, Matthías Pétursson, Róbert Kjaran, Simon Klüpfel, Anna Björk Nikulásdóttir. 2173-2177 [doi]

Rapid Development of TTS Corpora for Four South African LanguagesDaniel R. van Niekerk, Charl Johannes van Heerden, Marelie H. Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha. 2178-2182 [doi]

Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced LanguagesAlexander Gutkin. 2183-2187 [doi]

Nativization of Foreign Names in TTS for Automatic Reading of World News in SwahiliJoseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King. 2188-2192 [doi]

Multi-Task Learning for Mispronunciation Detection on Singapore Children's Mandarin SpeechRong Tong, Nancy F. Chen, Bin Ma. 2193-2197 [doi]

Relating Unsupervised Word Segmentation to Reported Vocabulary AcquisitionElin Larsen, Alejandrina Cristià, Emmanuel Dupoux. 2198-2202 [doi]

Modelling the Informativeness of Non-Verbal Cues in Parent-Child InteractionMats Wirén, Kristina N. Björkenstam, Robert Östling. 2203-2207 [doi]

Computational Simulations of Temporal Vocalization Behavior in Adult-Child InteractionEllen Marklund, David Pagmar, Tove Gerholm, Lisa Gustavsson. 2208-2212 [doi]

Approximating Phonotactic Input in Children's Linguistic Environments from Orthographic TranscriptsSofia Strömbergsson, Jens Edlund, Jana Götze, Kristina Nilsson Björkenstam. 2213-2217 [doi]

Learning Weakly Supervised Multimodal Phoneme EmbeddingsRahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux. 2218-2222 [doi]

Personalized Quantification of Voice Attractiveness in Multidimensional Merit SpaceYasunari Obuchi. 2223-2227 [doi]

The Role of Temporal Amplitude Modulations in the Political Arena: Hillary Clinton vs. Donald TrumpHans Rutger Bosker. 2228-2232 [doi]

Perceptual Ratings of Voice Likability Collected Through In-Lab Listening Tests vs. Mobile-Based CrowdsourcingLaura Fernández Gallardo, Rafael Zequeira Jiménez, Sebastian Möller. 2233-2237 [doi]

Attractiveness of French Voices for German Listeners - Results from Native and Non-Native Read SpeechJürgen Trouvain, Frank Zimmerer. 2238-2242 [doi]

Social Attractiveness in DialogsAntje Schweitzer, Natalie Lewandowski, Daniel Duran. 2243-2247 [doi]

A Gender Bias in the Acoustic-Melodic Features of Charismatic Speech?Eszter Novák-Tót, Oliver Niebuhr, Aoju Chen. 2248-2252 [doi]

Pitch Convergence as an Effect of Perceived Attractiveness and LikabilityJan Michalsky, Heike Schoormann. 2253-2256 [doi]

Does Posh English Sound Attractive?Li Jiao, Chengxia Wang, Cristiane Hsu, Peter Birkholz, Yi Xu. 2257-2261 [doi]

Large-Scale Speaker Ranking from Crowdsourced Pairwise Listener RatingsTimo Baumann. 2262-2266 [doi]

Aerodynamic Features of French FricativesRosario Signorello, Sergio Hassid, Didier Demolin. 2267-2271 [doi]

Inter-Speaker Variability: Speaker Normalisation and Quantitative Estimation of Articulatory Invariants in Speech Production for FrenchAntoine Serrurier, Pierre Badin, Louis-Jean Boë, Laurent Lamalle, Christiane Neuschaefer-Rube. 2272-2276 [doi]

Comparison of Basic Beatboxing Articulations Between Expert and Novice Artists Using Real-Time Magnetic Resonance ImagingNimisha Patil, Timothy Greer, Reed Blaylock, Shrikanth S. Narayanan. 2277-2281 [doi]

Speaker-Specific Biomechanical Model-Based Investigation of a Simple Speech Task Based on Tagged-MRIKeyi Tang, Negar M. Harandi, Jonghye Woo, Georges El Fakhri, Maureen Stone, Sidney Fels. 2282-2286 [doi]

Sounds of the Human Vocal TractReed Blaylock, Nimisha Patil, Timothy Greer, Shrikanth S. Narayanan. 2287-2291 [doi]

A Simulation Study on the Effect of Glottal Boundary Conditions on Vocal Tract FormantsYasufumi Uezu, Tokihiko Kaburagi. 2292-2296 [doi]

A Robust and Alternative Approach to Zero Frequency Filtering Method for Epoch ExtractionP. Gangamohan, B. Yegnanarayana. 2297-2300 [doi]

Improving YANGsaf F0 Estimator with Adaptive Kalman FilterKanru Hua. 2301-2305 [doi]

A Spectro-Temporal Demodulation Technique for Pitch EstimationJitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula. 2306-2310 [doi]

Robust Method for Estimating F0 of Complex Tone Based on Pitch Perception of Amplitude Modulated SignalKenichiro Miwa, Masashi Unoki. 2311-2315 [doi]

Low-Complexity Pitch Estimation Based on Phase Differences Between Low-Resolution SpectraSimon Graf, Tobias Herbig, Markus Buck, Gerhard Schmidt. 2316-2320 [doi]

Harvest: A High-Performance Fundamental Frequency Estimator from Speech SignalsMasanori Morise. 2321-2325 [doi]

Prosodic Event Recognition Using Convolutional Neural Networks with Context InformationSabrina Stehwien, Ngoc Thang Vu. 2326-2330 [doi]

Prosodic Facilitation and Interference While Judging on the Veracity of Synthesized StatementsRamiro H. Gálvez, Stefan Benus, Agustín Gravano, Marián Trnka. 2331-2335 [doi]

An Investigation of Pitch Matching Across Adjacent Turns in a Corpus of Spontaneous GermanMargaret Zellers, Antje Schweitzer. 2336-2340 [doi]

The Relationship Between F0 Synchrony and Speech Convergence in Dyadic InteractionSankar Mukherjee, Alessandro D'Ausilio, Noël Nguyen, Luciano Fadiga, Leonardo Badino. 2341-2345 [doi]

The Role of Linguistic and Prosodic Cues on the Prediction of Self-Reported Satisfaction in Contact Centre Phone CallsJordi Luque, Carlos Segura, Ariadna Sánchez, Martí Umbert, Luis Angel Galindo. 2346-2350 [doi]

Cross-Linguistic Study of the Production of Turn-Taking Cues in American English and Argentine SpanishPablo Brusco, Juan Manuel Pérez, Agustín Gravano. 2351-2355 [doi]

Emotional Features for Speech Overlaps ClassificationOlga Egorow, Andreas Wendemuth. 2356-2360 [doi]

Computing Multimodal Dyadic Behaviors During Spontaneous Diagnosis Interviews Toward Automatic Categorization of Autism Spectrum DisorderChin-Po Chen, Xian-Hong Tseng, Susan Shur-Fen Gau, Chi-Chun Lee. 2361-2365 [doi]

Deriving Dyad-Level Interaction Representation Using Interlocutors Structural and Expressive Multimodal Behavior FeaturesYun-Shao Lin, Chi-Chun Lee. 2366-2370 [doi]

Spotting Social Signals in Conversational Speech over IP: A Deep Learning PerspectiveRaymond Brueckner, Maximilian Schmitt, Maja Pantic, Björn W. Schuller. 2371-2375 [doi]

Optimized Time Series Filters for Detecting Laughter and Filler EventsGábor Gosztolya. 2376-2380 [doi]

Visual, Laughter, Applause and Spoken Expression Features for Predicting Engagement Within TED TalksFasih Haider, Fahim A. Salim, Saturnino Luz, Carl Vogel, Owen Conlan, Nick Campbell. 2381-2385 [doi]

Large-Scale Domain Adaptation via Teacher-Student LearningJinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong. 2386-2390 [doi]

Improving Children's Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram InversionW. Ahmad, Syed Shahnawazuddin, Hemant Kumar Kathania, Gayadhar Pradhan, A. B. Samaddar. 2391-2395 [doi]

RNN-LDA Clustering for Feature Based DNN AdaptationXurong Xie, Xunying Liu, Tan Lee, Lan Wang. 2396-2400 [doi]

Robust Online i-Vectors for Unsupervised Adaptation of DNN Acoustic Models: A Study in the Context of Digital Voice AssistantsHarish Arsikere, Sri Garimella. 2401-2405 [doi]

Semi-Supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic ControlAjay Srinivasamurthy, Petr Motlícek, Ivan Himawan, György Szaszák, Youssef Oualil, Hartmut Helmke. 2406-2410 [doi]

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech RecognitionTaesup Kim, Inchul Song, Yoshua Bengio. 2411-2415 [doi]

An Entrained Rhythm's Frequency, Not Phase, Influences Temporal Sampling of SpeechHans Rutger Bosker, Anne Kösem. 2416-2420 [doi]

Context Regularity Indexed by Auditory N1 and P2 Event-Related PotentialsXiao Wang, Yanhui Zhang, Gang Peng. 2421-2425 [doi]

Discovering Language in Marmoset VocalizationSakshi Verma, K. L. Prateek, Karthik Pandia, Nauman Dawalatabad, Rogier Landman, Jitendra Sharma, Mriganka Sur, Hema A. Murthy. 2426-2430 [doi]

Subject-Independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response During Speech PerceptionHiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura. 2431-2435 [doi]

The Phonological Status of the French Initial Accent and its Role in Semantic Processing: An Event-Related Potentials StudyNoémie te Rietmolen, Radouane El Yagoubi, Alain Ghio, Corine Astésano. 2436-2440 [doi]

A Neuro-Experimental Evidence for the Motor Theory of Speech PerceptionBin Zhao, Jianwu Dang, Gaoyan Zhang. 2441-2445 [doi]

Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASRPurvi Agrawal, Sriram Ganapathy. 2446-2450 [doi]

Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech RecognitionMasato Mimura, Yoshiaki Bando, Kazuki Shimada, Shinsuke Sakai, Kazuyoshi Yoshii, Tatsuya Kawahara. 2451-2455 [doi]

Recognizing Multi-Talker Speech with Permutation Invariant TrainingDong Yu, Xuankai Chang, Yanmin Qian. 2456-2460 [doi]

Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral InformationYuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken'ichi Furuya, Shinji Watanabe, Jonathan Le Roux. 2461-2465 [doi]

Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASRErfan Loweimi, Jon Barker, Thomas Hain. 2466-2470 [doi]

Robust Speech Recognition via Anchor Word RepresentationsBrian King, I-Fan Chen, Yonatan Vaizman, Yuzong Liu, Roland Maas, Sree Hari Krishnan Parthasarathi, Björn Hoffmeister. 2471-2475 [doi]

Towards Zero-Shot Frame Semantic Parsing for Domain ScalingAnkur Bapna, Gökhan Tür, Dilek Hakkani-Tür, Larry P. Heck. 2476-2480 [doi]

ClockWork-RNN Based Architectures for Slot FillingDespoina Georgiadou, Vassilios Diakoloukas, Vassilios Tsiaras, Vassilios Digalakis. 2481-2485 [doi]

Investigating the Effect of ASR Tuning on Named Entity RecognitionMohamed Ameur Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset. 2486-2490 [doi]

Label-Dependency Coding in Simple Recurrent Networks for Spoken Language UnderstandingMarco Dinarelli, Vedran Vukotic, Christian Raymond. 2491-2495 [doi]

Minimum Semantic Error Cost Training of Deep Long Short-Term Memory Networks for Topic Spotting on Conversational SpeechZhong Meng, Biing-Hwang Juang. 2496-2500 [doi]

Topic Identification for Speech Without ASRChunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur. 2501-2505 [doi]

An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented DialogBing Liu, Ian Lane. 2506-2510 [doi]

Deep Reinforcement Learning of Dialogue Policies with Less Weight UpdatesHeriberto Cuayáhuitl, Seunghak Yu. 2511-2515 [doi]

Towards End-to-End Spoken Dialogue Systems with Turn EmbeddingsAli Orkan Bayer, Evgeny A. Stepanov, Giuseppe Riccardi. 2516-2520 [doi]

Speech and Text Analysis for Multimodal Addressee Detection in Human-Human-Computer InteractionOleg Akhtiamov, Maxim Sidorov, Alexey A. Karpov, Wolfgang Minker. 2521-2525 [doi]

Rushing to Judgement: How do Laypeople Rate Caller Engagement in Thin-Slice Videos of Human-Machine Dialog?Vikram Ramanarayanan, Chee Wee Leong, David Suendermann-Oeft. 2526-2530 [doi]

Hyperarticulation of Corrections in Multilingual Dialogue SystemsIvan Kraljevski, Diane Hirschfeld. 2531-2535 [doi]

Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme ConversionBenjamin Milde, Christoph Schmidt, Joachim Köhler. 2536-2540 [doi]

Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection FrameworkXiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur. 2541-2545 [doi]

Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and TextTakahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig. 2546-2550 [doi]

Improved Subword Modeling for WFST-Based Speech RecognitionPeter Smit, Sami Virpioja, Mikko Kurimo. 2551-2555 [doi]

Pronunciation Learning with RNN-TransducersAntoine Bruguier, Danushen Gnanapragasam, Leif Johnson, Kanishka Rao, Françoise Beaufays. 2556-2560 [doi]

Learning Similarity Functions for Pronunciation VariationsEinat Naaman, Yossi Adi, Joseph Keshet. 2561-2565 [doi]

Spoken Language Identification Using LSTM-Based Angular ProximityGregory Gelly, Jean-Luc Gauvain. 2566-2570 [doi]

End-to-End Language Identification Using High-Order Utterance Representation with Bilinear PoolingMa Jin, Yan Song, Ian Vince McLoughlin, Wu Guo, Li-Rong Dai. 2571-2575 [doi]

Dialect Recognition Based on Unsupervised Bottleneck FeaturesQian Zhang, John H. L. Hansen. 2576-2580 [doi]

Investigating Scalability in Hierarchical Language Identification SystemSaad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li. 2581-2585 [doi]

Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English SpeechYao Qian, Keelan Evanini, Xinhao Wang, David Suendermann-Oeft, Robert A. Pugh, Patrick L. Lange, Hillary R. Molloy, Frank K. Soong. 2586-2590 [doi]

QMDIS: QCRI-MIT Advanced Dialect Identification SystemSameer Khurana, Maryam Najafian, Ahmed M. Ali, Tuka Al Hanai, Yonatan Belinkov, James R. Glass. 2591-2595 [doi]

Detection of Replay Attacks Using Single Frequency Filtering Cepstral CoefficientsK. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala. 2596-2600 [doi]

Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech DetectionHardik B. Sailor, Madhu R. Kamble, Hemant A. Patil. 2601-2605 [doi]

Independent Modelling of High and Low Energy Speech Frames for Spoofing DetectionGajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah. 2606-2610 [doi]

Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed DataAchintya Kr. Sarkar, Md. Sahidullah, Zheng-Hua Tan, Tomi Kinnunen. 2611-2615 [doi]

VoxCeleb: A Large-Scale Speaker Identification DatasetArsha Nagrani, Joon Son Chung, Andrew Zisserman. 2616-2620 [doi]

Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition TechnologyKaren Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright. 2621-2624 [doi]

Sequence-to-Sequence Models Can Directly Translate Foreign SpeechRon J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen. 2625-2629 [doi]

Structured-Based Curriculum Learning for End-to-End English-Japanese Speech TranslationTakatomo Kano, Sakriani Sakti, Satoshi Nakamura. 2630-2634 [doi]

Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition ErrorsNicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico. 2635-2639 [doi]

Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and EmphasisQuoc Truong Do, Sakriani Sakti, Satoshi Nakamura. 2640-2644 [doi]

NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language TranslationEunah Cho, Jan Niehues, Alex Waibel. 2645-2649 [doi]

Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering EmbeddingsLukas Drude, Reinhold Haeb-Umbach. 2650-2654 [doi]

Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech MixturesKaterina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani. 2655-2659 [doi]

Eigenvector-Based Speech Mask Estimation Using Logistic RegressionLukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf. 2660-2664 [doi]

Real-Time Speech Enhancement with GCC-NMFSean U. N. Wood, Jean Rouat. 2665-2669 [doi]

Coherence-Based Dual-Channel Noise Reduction Algorithm in a Complex Noisy EnvironmentYouna Ji, Jun Byun, Young-Cheol Park. 2670-2674 [doi]

Glottal Model Based Speech Beamforming for ad-hoc Microphone ArraysYang Zhang, Dinei Florêncio, Mark Hasegawa-Johnson. 2675-2679 [doi]

Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior FeaturesYuanyuan Liu, Tan Lee, P. C. Ching, Thomas K. T. Law, Kathy Y. S. Lee. 2680-2684 [doi]

Multi-Stage DNN Training for Automatic Recognition of Dysarthric SpeechEmre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik. 2685-2689 [doi]

Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult SpeechDaniel Smith, Alex Sneddon, Lauren Ward, Andreas Duenser, Jill Freyne, David Silvera Tawil, Angela Morgan. 2690-2694 [doi]

On Improving Acoustic Models for TORGO Dysarthric Speech DatabaseNeethu Mariam Joy, S. Umesh, Basil Abraham. 2695-2699 [doi]

Glottal Source Features for Automatic Speech-Based Depression AssessmentOlympia Simantiraki, Paulos Charonyktakis, Anastasia Pampouchidou, Manolis Tsiknakis, Martin Cooke. 2700-2704 [doi]

Speech Processing Approach for Diagnosing Dementia in an Early StageRoozbeh Sadeghian, J. David Schaffer, Stephen A. Zahorian. 2705-2709 [doi]

Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic SignalsFadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro. 2710-2714 [doi]

Semi-Supervised Adaptation of RNNLMs by Fine-Tuning with Domain-Specific Auxiliary FeaturesSalil Deena, Raymond W. M. Ng, Pranava Swaroop Madhyastha, Lucia Specia, Thomas Hain. 2715-2719 [doi]

Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech RecognitionMittul Singh, Youssef Oualil, Dietrich Klakow. 2720-2724 [doi]

Sparse Non-Negative Matrix Language Modeling: Maximum Entropy Flexibility on the CheapCiprian Chelba, Diamantino Caseiro, Fadi Biadsy. 2725-2729 [doi]

Multi-Scale Context Adaptation for Improving Child Automatic Speech Recognition in Child-Adult Spoken InteractionsManoj Kumar, Daniel Bone, Kelly McWilliams, Shanna Williams, Thomas D. Lyon, Shrikanth S. Narayanan. 2730-2734 [doi]

Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech RecognitionWeiwu Zhu. 2735-2738 [doi]

Developing On-Line Speaker Diarization SystemDimitrios Dimitriadis, Petr Fousek. 2739-2743 [doi]

Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech ProcessingShreyas Seshadri, Ulpu Remes, Okko Räsänen. 2744-2748 [doi]

Automatic Evaluation of Children Reading Aloud on Sentences and PseudowordsJorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão. 2749-2753 [doi]

Off-Topic Spoken Response Detection with Word EmbeddingsSu-Youn Yoon, Chong Min Lee, Ikkyu Choi, Xinhao Wang, Matthew Mulholland, Keelan Evanini. 2754-2758 [doi]

Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep ModelsWei Li, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee. 2759-2763 [doi]

Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture SlidesShoko Tsujimura, Kazumasa Yamamoto, Seiichi Nakagawa. 2764-2768 [doi]

Multiview Representation Learning via Deep CCA for Silent Speech RecognitionMyung Jong Kim, Beiming Cao, Ted Mau, Jun Wang. 2769-2773 [doi]

Use of Graphemic Lexicons for Spoken Language AssessmentKate M. Knill, Mark J. F. Gales, K. Kyriakopoulos, Anton Ragni, Yu Wang. 2774-2778 [doi]

Distilling Knowledge from an Ensemble of Models for Punctuation PredictionJiangyan Yi, Jianhua Tao, Zhengqi Wen, Ya Li. 2779-2783 [doi]

A Mostly Data-Driven Approach to Inverse Text NormalizationErnest Pusateri, Bharat Ram Ambati, Elizabeth Brooks, Ondrej Plátek, Donald McAllaster, Venki Nagesha. 2784-2788 [doi]

Mismatched Crowdsourcing from Multiple Annotator Languages for Recognizing Zero-Resourced Languages: A Nullspace Clustering ApproachWenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen, Boon Pang Lim. 2789-2793 [doi]

Experiments in Character-Level Neural Network Models for PunctuationWilliam Gale, Sarangarajan Parthasarathy. 2794-2798 [doi]

Multi-Channel Apollo Mission Speech Transcripts CalibrationLakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen. 2799-2803 [doi]

Calibration Approaches for Language DetectionMitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson. 2804-2808 [doi]

Bidirectional Modelling for Short Duration Language IdentificationSarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps. 2809-2813 [doi]

Conditional Generative Adversarial Nets Classifier for Spoken Language IdentificationPeng Shen, Xugang Lu, Sheng Li, Hisashi Kawai. 2814-2818 [doi]

Tied Hidden Factors in Neural Networks for End-to-End Speaker RecognitionAntonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida. 2819-2823 [doi]

Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster LabelsSungrack Yun, Hye Jin Jang, Taesu Kim. 2824-2828 [doi]

Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker ClusteringIgnacio Viñals, Alfonso Ortega, Jesús Antonio Villalba López, Antonio Miguel, Eduardo Lleida. 2829-2833 [doi]

LSTM Neural Network-Based Speaker Segmentation Using Acoustic and Language ModellingMiquel India, José A. R. Fonollosa, Javier Hernando. 2834-2838 [doi]

Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game LocalizationAdrien Gresse, Mickael Rouvier, Richard Dufour, Vincent Labatut, Jean-François Bonastre. 2839-2843 [doi]

Homogeneity Measure Impact on Target and Non-Target Trials in Forensic Voice ComparisonMoez Ajili, Jean-François Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn. 2844-2848 [doi]

Null-Hypothesis LLR: A Proposal for Forensic Automatic Speaker RecognitionYosef A. Solewicz, Michael Jessen, David van der Vloed. 2849-2853 [doi]

The Opensesame NIST 2016 Speaker Recognition Evaluation SystemGang Liu, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao. 2854-2858 [doi]

IITG-Indigo System for NIST 2016 SRE ChallengeNagendra Kumar, Rohan Kumar Das, Sarfaraz Jelil, Dhanush B. K, H. Kashyap, K. Sri Rama Murty, Sriram Ganapathy, Rohit Sinha, S. R. Mahadeva Prasanna. 2859-2863 [doi]

Locally Weighted Linear Discriminant Analysis for Robust Speaker VerificationAbhinav Misra, Shivesh Ranjan, John H. L. Hansen. 2864-2868 [doi]

Recursive Whitening Transformation for Speaker Recognition on Language Mismatched ConditionSuwon Shon, Seongkyu Mun, Hanseok Ko. 2869-2873 [doi]

Query-by-Example Search with Discriminative Neural Acoustic Word EmbeddingsShane Settle, Keith Levin, Herman Kamper, Karen Livescu. 2874-2878 [doi]

Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term DetectionDaisuke Kaneko, Ryota Konno, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh. 2879-2883 [doi]

Fast and Accurate OOV Decoder on High-Level FeaturesYuri Y. Khokhlov, Natalia A. Tomashenko, Ivan Medennikov, Aleksei Romanenko. 2884-2888 [doi]

Exploring the Use of Significant Words Language Modeling for Spoken Document RetrievalYing-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen. 2889-2893 [doi]

Incorporating Acoustic Features for Spontaneous Speech Driven Content RetrievalHiroto Tasaki, Tomoyosi Akiba. 2894-2898 [doi]

Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal ClassificationBo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-yi Lee, Lin-Shan Lee. 2899-2903 [doi]

Automatic Alignment Between Classroom Lecture Utterances and Slide ComponentsMasatoshi Tsuchiya, Ryo Minamiguchi. 2904-2908 [doi]

Compensating Gender Variability in Query-by-Example Search on Speech Using Voice ConversionPaula Lopez-Otero, Laura Docío Fernández, Carmen García-Mateo. 2909-2913 [doi]

Zero-Shot Learning Across Heterogeneous Overlapping DomainsAnjishnu Kumar, Pavankumar Reddy Muddireddy, Markus Dreyer, Björn Hoffmeister. 2914-2918 [doi]

Hierarchical Recurrent Neural Network for Story SegmentationEmiru Tsunoo, Peter Bell 0001, Steve Renals. 2919-2923 [doi]

Evaluating Automatic Topic Segmentation as a Segment Retrieval TaskAbdessalam Bouchekif, Delphine Charlet, Géraldine Damnati, Nathalie Camelin, Yannick Estève. 2924-2928 [doi]

Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle TimestampsJeong-Uk Bang, Mu Yeol Choi, Sang-hun Kim, Oh-Wook Kwon. 2929-2933 [doi]

A Relevance Score Estimation for Spoken Term Detection Based on RNN-Generated Pronunciation EmbeddingsJan Svec, Josef V. Psutka, Lubos Smídl, Jan Trmal. 2934-2938 [doi]

Predicting Automatic Speech Recognition Performance Over Communication Channels from Instrumental Speech Quality and Intelligibility ScoresLaura Fernández Gallardo, Sebastian Möller, John Beerends. 2939-2943 [doi]

Speech Intelligibility in Cars: The Effect of Speaking Style, Noise and Listener AgeCassia Valentini-Botinhao, Junichi Yamagishi. 2944-2948 [doi]

Predicting Speech Intelligibility Using a Gammachirp Envelope Distortion Index Based on the Signal-to-Distortion RatioKatsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani. 2949-2953 [doi]

Intelligibilities of Mandarin Chinese Sentences with Spectral "Holes"Yafan Chen, Yong Xu, Jun Yang. 2954-2957 [doi]

The Effect of Situation-Specific Non-Speech Acoustic Cues on the Intelligibility of Speech in NoiseLauren Ward, Ben Shirley, Yan Tang, William J. Davies. 2958-2962 [doi]

On the Use of Band Importance Weighting in the Short-Time Objective Intelligibility MeasureAsger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen 0001. 2963-2967 [doi]

Listening in the Dips: Comparing Relevant Features for Speech Recognition in Humans and MachinesConstantin Spille, Bernd T. Meyer. 2968-2972 [doi]

Mental Representation of Japanese Mora; Focusing on its Intrinsic DurationKosuke Sugai. 2973-2977 [doi]

Temporal Dynamics of Lateral Channel Formation in /l/: 3D EMA Data from Australian EnglishJia-ying, Christopher Carignan, Jason A. Shaw, Michael I. Proctor, Donald Derrick, Catherine T. Best. 2978-2982 [doi]

Vowel and Consonant Sequences in three Bavarian Dialects of AustriaNicola Klingler, Sylvia Moosmüller, Hannes Scheutz. 2983-2987 [doi]

Acoustic Cues to the Singleton-Geminate Contrast: The Case of Libyan Arabic SonorantsAmel Issa. 2988-2992 [doi]

Mel-Cepstral Distortion of German Vowels in Different Information Density ContextsErika Brandt, Frank Zimmerer, Bistra Andreeva, Bernd Möbius. 2993-2997 [doi]

Effect of Formant and F0 Discontinuity on Perceived Vowel Duration: Impacts for Concatenative Speech SynthesisTomás Boril, Pavel Sturm, Radek Skarnitzl, Jan Volín. 2998-3002 [doi]

An Ultrasound Study of Alveolar and Retroflex Consonants in Arrernte: Stressed and Unstressed SyllablesMarija Tabain, Richard Beare. 3003-3007 [doi]

Reshaping the Transformed LF Model: Generating the Glottal Source from the Waveshape Parameter RdChrister Gobl. 3008-3012 [doi]

Kinematic Signatures of Prosody in Lombard SpeechStefan Benus, Juraj Simko, Mona Lehtinen. 3013-3017 [doi]

What do Finnish and Central Bavarian Have in Common? Towards an Acoustically Based Quantity TypologyMarkus Jochim, Felicitas Kleber. 3018-3022 [doi]

Locating Burst Onsets Using SFF Envelope and Phase InformationBhanu Teja Nellore, RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. Yegnanarayana. 3023-3027 [doi]

A Preliminary Phonetic Investigation of Alphabetic Words in Mandarin ChineseHongwei Ding, Yuanyuan Zhang, Hongchao Liu, Chu-Ren Huang. 3028-3032 [doi]

A Quantitative Measure of the Impact of Coarticulation on Phone DiscriminabilityThomas Schatz, Rory Turnbull, Francis Bach, Emmanuel Dupoux. 3033-3037 [doi]

Sinusoidal Partials Tracking for Singing Analysis Using the Heuristic of the Minimal Frequency and Magnitude DifferenceKin Wah Edward Lin, Hans Anderson, Clifford So, Simon Lui. 3038-3042 [doi]

Audio Scene Classification with Deep Recurrent Neural NetworksHuy Phan, Philipp Koch, Fabrice Katzberg, Marco Maaß, Radoslaw Mazur, Alfred Mertins. 3043-3047 [doi]

Automatic Time-Frequency Analysis of Echolocation Signals Using the Matched Gaussian Multitaper SpectrogramMaria Sandsten, Isabella Reinhold, Josefin Starkhammar. 3048-3052 [doi]

Classification-Based Detection of Glottal Closure Instants from Speech SignalsJindrich Matousek, Daniel Tihelka. 3053-3057 [doi]

A Domain Knowledge-Assisted Nonlinear Model for Head-Related Transfer Functions Based on Bottleneck Deep Neural NetworkXiaoke Qi, Jianhua Tao. 3058-3062 [doi]

Laryngeal Articulation During Trumpet Performance: An Exploratory StudyLuis M. T. Jesus, Bruno Rocha, Andreia Hall. 3063-3067 [doi]

Matrix of Polynomials Model Based Polynomial Dictionary Learning Method for Acoustic Impulse Response ModelingJian Guan, Xuan Wang, Pengming Feng, Jing Dong, Wenwu Wang. 3068-3072 [doi]

Acoustic Scene Classification Using a CNN-SuperVector System Trained with Auditory and Spectrogram Image FeaturesRakib Hyder, Shabnam Ghaffarzadegan, Zhe Feng, John H. L. Hansen, Taufiq Hasan. 3073-3077 [doi]

An Environmental Feature Representation for Robust Speech Recognition and for Environment IdentificationXue Feng, Brigitte Richardson, Scott Amman, James Glass. 3078-3082 [doi]

Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio TaggingYong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley. 3083-3087 [doi]

An Audio Based Piano Performance Evaluation Method Using Deep Neural Network Based Acoustic ModelingJing Pan, Ming Li, Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi, Manman Zhu. 3088-3092 [doi]

Music Tempo Estimation Using Sub-Band SynchronyShreyan Chowdhury, Tanaya Guha, Rajesh M. Hegde. 3093-3096 [doi]

A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal ClassificationYun Wang, Florian Metze. 3097-3101 [doi]

A Note Based Query By Humming System Using Convolutional Neural NetworkNaziba Mostafa, Pascale Fung. 3102-3106 [doi]

Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound ClassificationHardik B. Sailor, Dharmesh M. Agrawal, Hemant A. Patil. 3107-3111 [doi]

Novel Shifted Real Spectrum for Exact Signal ReconstructionMeet H. Soni, Rishabh Tak, Hemant A. Patil. 3112-3116 [doi]

Manual and Automatic Transcriptions in Dementia Detection from SpeechJochen Weiner, Mathis Engelbart, Tanja Schultz. 3117-3121 [doi]

An Affect Prediction Approach Through Depression Severity Parameter Incorporation in Neural NetworksRahul Gupta, Saurabh Sahu, Carol Espy Wilson, Shrikanth S. Narayanan. 3122-3126 [doi]

Cross-Database Models for the Classification of Dysarthria PresenceStephanie Gillespie, Yash-Yee Logan, Elliot Moore, Jacqueline Laures-Gore, Scott Russell, Rupal Patel. 3127-3131 [doi]

Acoustic Evaluation of Nasality in Cerebellar SyndromesMichal Novotný, Jan Rusz, K. Spálenka, Jirí Klempír, D. Horáková, Evzen Ruzicka. 3132-3136 [doi]

Emotional Speech of Mentally and Physically Disabled Individuals: Introducing the EmotAsS Database and First FindingsSimone Hantke, Hesam Sagha, Nicholas Cummins, Björn W. Schuller. 3137-3141 [doi]

Phonological Markers of Oxytocin and MDMA IngestionCarla Agurto, Raquel Norel, Rachel Ostrand, Gillinder Bedi, Harriet de Wit, Matthew J. Baggott, Matthew G. Kirkpatrick, Margaret Wardle, Guillermo A. Cecchi. 3142-3146 [doi]

An Avatar-Based System for Identifying Individuals Likely to Develop DementiaBahman Mirheidari, Daniel Blackburn, Kirsty Harkness, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen. 3147-3151 [doi]

Cross-Domain Classification of Drowsiness in Speech: The Case of Alcohol Intoxication and Sleep DeprivationYue Zhang 0014, Felix Weninger, Björn W. Schuller. 3152-3156 [doi]

Depression Detection Using Automatic Transcriptions of De-Identified SpeechPaula Lopez-Otero, Laura Docío Fernández, Alberto Abad, Carmen García-Mateo. 3157-3161 [doi]

An N-Gram Based Approach to the Automatic Diagnosis of Alzheimer's Disease from Spoken LanguageSebastian Wankerl, Elmar Nöth, Stefan Evert. 3162-3166 [doi]

Exploiting Intra-Annotator Rating Consistency Through Copeland's Method for Estimation of Ground Truth Labels in Couples' TherapyKarel Mundnich, Md. Nasir, Panayiotis G. Georgiou, Shrikanth S. Narayanan. 3167-3171 [doi]

Rhythmic Characteristics of Parkinsonian Speech: A Study on Mandarin and PolishMassimo Pettorino, Wentao Gu, Pawel Pólrola, Ping Fan. 3172-3176 [doi]

Trisyllabic Tone 3 Sandhi Patterns in Mandarin Produced by Cantonese SpeakersJung-Yueh Tu, Janice Wing Sze Wong, Jih-Ho Cha. 3177-3180 [doi]

Intonation of Contrastive Topic in EstonianHeete Sahkai, Meelis Mihkla. 3181-3185 [doi]

Reanalyze Fundamental Frequency Peak Delay in MandarinLixia Hao, Wei Zhang, Yanlu Xie, Jinsong Zhang. 3186-3190 [doi]

How Does the Absence of Shared Knowledge Between Interlocutors Affect the Production of French Prosodic Forms?Amandine Michelas, Cecile Cau, Maud Champagne-Lavau. 3191-3195 [doi]

Three Dimensions of Sentence Prosody and Their (Non-)InteractionsMichael Wagner, Michael McAuliffe. 3196-3200 [doi]

Using Prosody to Classify Discourse RelationsJanine Kleinhans, Mireia Farrús, Agustín Gravano, Juan Manuel Pérez, Catherine Lai, Leo Wanner. 3201-3205 [doi]

Canonical Correlation Analysis and Prediction of Perceived Rhythmic Prominences and Pitch Tones in SpeechElizabeth Godoy, James R. Williamson, Thomas F. Quatieri. 3206-3210 [doi]

Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise ConditionsSofoklis Kakouros, Okko Räsänen, Paavo Alku. 3211-3215 [doi]

Creaky Voice as a Function of Tonal Categories and Prosodic BoundariesJianjing Kuang. 3216-3220 [doi]

The Acoustics of Word Stress in Czech as a Function of Speaking StyleRadek Skarnitzl, Anders Eriksson. 3221-3225 [doi]

What You See is What You Get Prosodically Less - Visibility Shapes Prosodic Prominence Production in Spontaneous InteractionPetra Wagner, Nataliya Bryhadyr. 3226-3230 [doi]

Focus Acoustics in Mandarin NominalsYu-Yin Hsu, Anqi Xu. 3231-3235 [doi]

Exploring Multidimensionality: Acoustic and Articulatory Correlates of Swedish Word AccentsMalin Svensson Lundmark, Gilbert Ambrazaitis, Otto Ewald. 3236-3240 [doi]

The Perception of English Intonation Patterns by German L2 Speakers of EnglishKarin Puga, Robert Fuchs, Jane Setter, Peggy Mok. 3241-3245 [doi]

The Perception of Emotions in Noisified Nonsense SpeechEmilia Parada-Cabaleiro, Alice Baird, Anton Batliner, Nicholas Cummins, Simone Hantke, Björn W. Schuller. 3246-3250 [doi]

Attention Networks for Modeling Behaviors in Addiction CounselingJames Gibson, Dogan Can, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan. 3251-3255 [doi]

Computational Analysis of Acoustic Descriptors in Psychotic PatientsTorsten Wörtwein, Tadas Baltrusaitis, Eugene Laksana, Luciana Pennant, Elizabeth S. Liebson, Dost Öngür, Justin T. Baker, Louis-Philippe Morency. 3256-3260 [doi]

Modeling Perceivers Neural-Responses Using Lobe-Dependent Convolutional Neural Network to Improve Speech Emotion RecognitionYa-Tse Wu, Hsuan-Yu Chen, Yu-Hsien Liao, Li-Wei Kuo, Chi-Chun Lee. 3261-3265 [doi]

Implementing Gender-Dependent Vowel-Level Analysis for Boosting Speech-Based Depression RecognitionBogdan Vlasenko, Hesam Sagha, Nicholas Cummins, Björn W. Schuller. 3266-3270 [doi]

Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural NetsFarhad Bin Siddique, Pascale Fung. 3271-3275 [doi]

Emotion Category Mapping to Emotional Space by Cross-Corpus Emotion LabelingYoshiko Arimoto, Hiroki Mori. 3276-3280 [doi]

Big Five vs. Prosodic Features as Cues to Detect Abnormality in SSPNET-Personality CorpusCedric Fayet, Arnaud Delhay, Damien Lolive, Pierre-François Marteau. 3281-3285 [doi]

Speech Rate Comparison When Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map TaskHayakawa Akira, Carl Vogel, Saturnino Luz, Nick Campbell 0001. 3286-3290 [doi]

Approaching Human Performance in Behavior Estimation in Couples Therapy Using Deep Sentence EmbeddingsShao-Yen Tseng, Brian R. Baucom, Panayiotis G. Georgiou. 3291-3295 [doi]

Complexity in Speech and its Relation to Emotional Bond in Therapist-Patient Interactions During Suicide Risk Assessment InterviewsMd. Nasir, Brian R. Baucom, Craig J. Bryan, Shrikanth S. Narayanan, Panayiotis G. Georgiou. 3296-3300 [doi]

An Investigation of Emotion Dynamics and Kalman Filtering for Speech-Based Emotion PredictionZhaocheng Huang, Julien Epps. 3301-3305 [doi]

Zero-Shot Learning for Natural Language Understanding Using Domain-Independent Sequential Structure and Question TypesKugatsu Sadamitsu, Yukinori Homma, Ryuichiro Higashinaka, Yoshihiro Matsuo. 3306-3310 [doi]

Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document ClassificationNaoki Sawada, Ryo Masumura, Hiromitsu Nishizaki. 3311-3315 [doi]

Internal Memory Gate for Recurrent Neural Networks with Application to Spoken Language UnderstandingMohamed Morchid. 3316-3319 [doi]

Character-Based Embedding Models and Reranking Strategies for Understanding Natural Language Meal DescriptionsMandy Korpusik, Zachary Collins, James Glass. 3320-3324 [doi]

Quaternion Denoising Encoder-Decoder for Theme Identification of Telephone ConversationsTitouan Parcollet, Mohamed Morchid, Georges Linarès. 3325-3328 [doi]

ASR Error Management for Improving Spoken Language UnderstandingEdwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato de Mori. 3329-3333 [doi]

Jointly Trained Sequential Labeling and Classification by Sparse Attention Neural NetworksMingbo Ma, Kai Zhao, Liang Huang, Bing Xiang, Bowen Zhou. 3334-3338 [doi]

To Plan or not to Plan? Discourse Planning in Slot-Value Informed Sequence to Sequence Models for Language GenerationNeha Nayak, Dilek Hakkani-Tür, Marilyn A. Walker, Larry P. Heck. 3339-3343 [doi]

Online Adaptation of an Attention-Based Neural Network for Natural Language GenerationMatthieu Riou, Bassam Jabaian, Stéphane Huet, Fabrice Lefèvre. 3344-3348 [doi]

Spanish Sign Language Recognition with Different Topology Hidden Markov ModelsCarlos D. Martínez-Hinarejos, Zuzanna Parcheta. 3349-3353 [doi]

OpenMM: An Open-Source Multimodal Feature Extraction ToolMichelle Renee Morales, Stefan Scherer, Rivka Levitan. 3354-3358 [doi]

Speaker Dependency Analysis, Audiovisual Fusion Cues and a Multimodal BLSTM for Conversational Engagement RecognitionYuyun Huang, Emer Gilmartin, Nick Campbell 0001. 3359-3363 [doi]

Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial NetworksChin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang. 3364-3368 [doi]

CAB: An Energy-Based Speaker Clustering Model for Rapid Adaptation in Non-Parallel Voice ConversionToru Nakashika. 3369-3373 [doi]

Phoneme-Discriminative Features for Dysarthric Speech ConversionRyo Aihara, Tetsuya Takiguchi, Yasuo Ariki. 3374-3378 [doi]

Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice ConversionJie Wu, D.-Y. Huang, Lei Xie, Haizhou Li. 3379-3383 [doi]

Speaker Dependent Approach for Enhancing a Glossectomy Patient's Speech via GMM-Based Voice ConversionKei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi. 3384-3388 [doi]

Generative Adversarial Network-Based Postfilter for STFT SpectrogramsTakuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi. 3389-3393 [doi]

Generative Adversarial Network-Based Glottal Waveform Model for Statistical Parametric Speech SynthesisBajibabu Bollepalli, Lauri Juvela, Paavo Alku. 3394-3398 [doi]

Emotional Voice Conversion with Adaptive Scales F0 Based on Wavelet Transform Using Limited Amount of Emotional DataZhaojie Luo, Jinhui Chen, Tetsuya Takiguchi, Yasuo Ariki. 3399-3403 [doi]

Speaker Adaptation in DNN-Based Speech Synthesis Using d-VectorsRama Doddipatla, Norbert Braunschweiler, Ranniery Maia. 3404-3408 [doi]

Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice ConversionRunnan Li, Zhiyong Wu, Yishuang Ning, Lifa Sun, Helen Meng, Lianhong Cai. 3409-3413 [doi]

Segment Level Voice Conversion with Recurrent Neural NetworksMiguel Varela Ramos, Alan W. Black, Ramón Fernández Astudillo, Isabel Trancoso, Nuno Fonseca. 3414-3418 [doi]

Creating a Voice for MiRo, the World's First Commercial Biomimetic RobotRoger K. Moore, Ben Mitchinson. 3419-3420 [doi]

A Thematicity-Based Prosody Enrichment Tool for CTSMónica Domínguez, Mireia Farrús, Leo Wanner. 3421-3422 [doi]

WebSubDub - Experimental System for Creating High-Quality Alternative Audio Track for TV BroadcastingMartin Gruber, Jindrich Matousek, Zdenek Hanzlícek, Jakub Vít, Daniel Tihelka. 3423-3424 [doi]

Voice Conservation and TTS System for People Facing Total LaryngectomyMarkéta Juzová, Daniel Tihelka, Jindrich Matousek, Zdenek Hanzlícek. 3425-3426 [doi]

TBT (Toolkit to Build TTS): A High Performance Framework to Build Multiple Language HTS VoiceAtish Shankar Ghone, Rachana Nerpagar, Pranaw Kumar, Arun Baby, S. Aswin Shanmugam, Sasikumar M., Hema A. Murthy. 3427-3428 [doi]

SIAK - A Game for Foreign Language Pronunciation LearningReima Karhila, Sari Ylinen, Seppo Enarvi, Kalle J. Palomäki, Aleksander Nikulin, Olli Rantula, Vertti Viitanen, Krupakar Dhinakaran, Anna-Riikka Smolander, Heini Kallio, Katja Junttila, Maria Uther, Perttu Hämäläinen, Mikko Kurimo. 3429-3430 [doi]

Integrating the Talkamatic Dialogue Manager with AlexaStaffan Larsson, Alexander Berman, Andreas Krona, Fredrik Kronlid. 3431-3432 [doi]

A Robust Medical Speech-to-Speech/Speech-to-Sign PhraselatorFarhia Ahmed, Pierrette Bouillon, Chelle Destefano, Johanna Gerlach, Sonia Halimi, Angela Hooper, Manny Rayner, Hervé Spechbach, Irene Strasly, Nikos Tsourakis. 3433-3434 [doi]

Towards an Autarkic Embedded Cognitive User InterfaceFrank Duckhorn, Markus Huber, Werner Meyer, Oliver Jokisch, Constanze Tschöpe, Matthias Wolff. 3435-3436 [doi]

Nora the Empathetic PsychologistGenta Indra Winata, Onno Kampman, Yang Yang, Anik Dey, Pascale Fung. 3437-3438 [doi]

Modifying Amazon's Alexa ASR Grammar and Lexicon - A Case StudyHassan Alam, Aman Kumar, Manan Vyas, Tina Werner, Rachmat Hartono. 3439-3440 [doi]

Re-Inventing Speech - The Biological WayBjörn Lindblom. 3441 [doi]

The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, Cold & SnoringBjörn W. Schuller, Stefan Steidl, Anton Batliner, Elika Bergelson, Jarek Krajewski, Christoph Janott, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Guillermo Hidalgo, Sebastian Schnieder, Clemens Heiser, Winfried Hohenhorst, Michael Herzog, Maximilian Schmitt, Kun Qian, Yue Zhang 0014, George Trigeorgis, Panagiotis Tzirakis, Stefanos Zafeiriou. 3442-3446 [doi]

It Sounds Like You Have a Cold! Testing Voice Features for the Interspeech 2017 Computational Paralinguistics Cold ChallengeMark Huckvale, András Beke. 3447-3451 [doi]

End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware SpectrumDanwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, Ming Li. 3452-3456 [doi]

Infected Phonemes: How a Cold Impairs Speech on a Phonetic LevelJohannes Wagner 0001, Thiago Fraga-Silva, Yvan Josse, Dominik Schiller, Andreas Seiderer, Elisabeth André. 3457-3461 [doi]

Phoneme State Posteriorgram Features for Speech Based Automatic Classification of Speakers in Cold and Healthy ConditionAkshay Kalkunte Suresh, Srinivasa Raghavan K. M., Prasanta Kumar Ghosh. 3462-3466 [doi]

An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance Based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNNTin Lay Nwe, Tran Huy Dat, Wen Zheng Terence Ng, Bin Ma. 3467-3471 [doi]

Acoustic Analysis of Detailed Three-Dimensional Shape of the Human Nasal Cavity and Paranasal SinusesTatsuya Kitamura, Hironori Takemoto, Hisanori Makinae, Tetsutaro Yamaguchi, Koutaro Maki. 3472-3476 [doi]

A Semi-Polar Grid Strategy for the Three-Dimensional Finite Element Simulation of Vowel-Vowel SequencesMarc Arnela, Saeed Dabbaghchian, Oriol Guasch, Olov Engwall. 3477-3481 [doi]

A Fast Robust 1D Flow Model for a Self-Oscillating Coupled 2D FEM Vocal Fold SimulationArvind Vasudevan, Victor Zappi, Peter Anderson, Sidney Fels. 3482-3486 [doi]

Waveform Patterns in Pitch Glides Near a Vocal Tract ResonanceTiina Murtola, Jarmo Malinen. 3487-3491 [doi]

A Unified Numerical Simulation of Vowel Production That Comprises Phonation and the Emitted SoundNiyazi Cem Degirmenci, Johan Jansson, Johan Hoffman, Marc Arnela, Patricia Sánchez-Martín, Oriol Guasch, Sten Ternström. 3492-3496 [doi]

Synthesis of VV Utterances from Muscle Activation to Sound with a 3D ModelSaeed Dabbaghchian, Marc Arnela, Olov Engwall, Oriol Guasch. 3497-3501 [doi]

A Dual Source-Filter Model of Snore Audio for Snorer Group ClassificationM. V. Achuth Rao, Shivani Yadav, Prasanta Kumar Ghosh. 3502-3506 [doi]

An 'End-to-Evolution' Hybrid Approach for Snore Sound ClassificationMichael Freitag, Shahin Amiriparian, Nicholas Cummins, Maurice Gerczuk, Björn W. Schuller. 3507-3511 [doi]

Snore Sound Classification Using Image-Based Deep Spectrum FeaturesShahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Nicholas Cummins, Michael Freitag, Sergey Pugachevskiy, Alice Baird, Björn W. Schuller. 3512-3516 [doi]

Exploring Fusion Methods and Feature Space for the Classification of Paralinguistic InformationDavid Tavarez, Xabier Sarasola, Agustín Alonso, Jon Sánchez, Luis Serrano, Eva Navas, Inma Hernáez. 3517-3521 [doi]

DNN-Based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring IdentificationGábor Gosztolya, Róbert Busa-Fekete, Tamás Grósz, László Tóth. 3522-3526 [doi]

Introducing Weighted Kernel Classifiers for Handling Imbalanced Paralinguistic Corpora: Snoring, Addressee and ColdHeysem Kaya, Alexey A. Karpov. 3527-3531 [doi]

Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech RecognitionShubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu. 3532-3536 [doi]

Optimizing Expected Word Error Rate via Sampling for Speech RecognitionMatt Shannon. 3537-3541 [doi]

Annealed f-Smoothing as a Mechanism to Speed up Neural Network TrainingTara N. Sainath, Vijayaditya Peddinti, Olivier Siohan, Arun Narayanan. 3542-3546 [doi]

Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword SpottingZhong Meng, Biing-Hwang Juang. 3547-3551 [doi]

Exploiting Eigenposteriors for Semi-Supervised Training of DNN Acoustic Models with Sequence DiscriminationPranay Dighe, Afsaneh Asaei, Hervé Bourlard. 3552-3556 [doi]

Discriminative Autoencoders for Acoustic ModelingMing-Han Yang, Hung-Shin Lee, Yu-Ding Lu, Kuan-Yu Chen, Yu Tsao, Berlin Chen, Hsin-Min Wang. 3557-3561 [doi]

Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation RefinementZbynek Zajíc, Marek Hrúz, Ludek Müller. 3562-3566 [doi]

Speaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold Using Deep Neural Networks with an Evaluation on Speaker SegmentationArindam Jati, Panayiotis G. Georgiou. 3567-3571 [doi]

A Triplet Ranking-Based Neural Network for Speaker Diarization and LinkingGaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier. 3572-3576 [doi]

Estimating Speaker Clustering Quality Using Logistic RegressionYishai Cohen, Itshak Lapidot. 3577-3581 [doi]

Combining Speaker Turn Embedding and Incremental Structure Prediction for Low-Latency Speaker DiarizationGuillaume Wisniewski, Hervé Bredin, Gregory Gelly, Claude Barras. 3582-3586 [doi]

pyannote.metrics: A Toolkit for Reproducible Evaluation, Diagnostic, and Error Analysis of Speaker Diarization SystemsHervé Bredin. 3587-3591 [doi]

A Rescoring Approach for Keyword Search Using Lattice Context InformationZhipeng Chen, Ji Wu. 3592-3596 [doi]

The Kaldi OpenKWS System: Improving Low Resource Keyword SearchJan Trmal, Matthew Wiesner, Vijayaditya Peddinti, Xiaohui Zhang, Pegah Ghahremani, Yiming Wang, Vimal Manohar, Hainan Xu, Daniel Povey, Sanjeev Khudanpur. 3597-3601 [doi]

The STC Keyword Search System for OpenKWS 2016 EvaluationYuri Y. Khokhlov, Ivan Medennikov, Aleksei Romanenko, Valentin Mendelev, Maxim Korenevsky, Alexey Prudnikov, Natalia A. Tomashenko, Alexander Zatvornitsky. 3602-3606 [doi]

Compressed Time Delay Neural Network for Small-Footprint Keyword SpottingMing Sun, David Snyder, Yixin Gao, Varun Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Vitaladevuni. 3607-3611 [doi]

Symbol Sequence Search from Telephone ConversationMasayuki Suzuki, Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, Kenneth Ward Church, Mark Drake. 3612-3616 [doi]

Similarity Learning Based Query Modeling for Keyword SearchBatuhan Gündogdu, Murat Saraclar. 3617-3621 [doi]

Deep Recurrent Neural Network Based Monaural Speech Separation Using Recurrent Temporal Restricted Boltzmann MachinesSuman Samui, Indrajit Chakrabarti, Soumya K. Ghosh. 3622-3626 [doi]

Improved Codebook-Based Speech Enhancement Based on MBE ModelQizheng Huang, Changchun Bao, Xianyun Wang. 3627-3631 [doi]

Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual ConnectionZhuo Chen, Yan Huang, Jinyu Li, Yifan Gong. 3632-3636 [doi]

Exploring Low-Dimensional Structures of Modulation Spectra for Robust Speech RecognitionBi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen. 3637-3641 [doi]

SEGAN: Speech Enhancement Generative Adversarial NetworkSantiago Pascual, Antonio Bonafonte, Joan Serrà. 3642-3646 [doi]

Concatenative Resynthesis Using Twin NetworksSoumi Maiti, Michael I. Mandel. 3647-3651 [doi]

Combining Residual Networks with LSTMs for LipreadingThemos Stafylakis, Georgios Tzimiropoulos. 3652-3656 [doi]

Improving Computer Lipreading via DNN Sequence Discriminative Training TechniquesKwanchiva Thangthai, Richard W. Harvey. 3657-3661 [doi]

Improving Speaker-Independent Lipreading with Domain-Adversarial TrainingMichael Wand, Jürgen Schmidhuber. 3662-3666 [doi]

Turbo Decoders for Audio-Visual Continuous Speech RecognitionAhmed Hussen Abdelaziz. 3667-3671 [doi]

DNN-Based Ultrasound-to-Speech Conversion for a Silent Speech InterfaceTamás Gábor Csapó, Tamás Grósz, Gábor Gosztolya, László Tóth, Alexandra Markó. 3672-3676 [doi]

Visually Grounded Learning of Keyword Prediction from Untranscribed SpeechHerman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu. 3677-3681 [doi]

Deep Neural Factorization for Speech RecognitionJen-Tzung Chien, Chen Shen. 3682-3686 [doi]

Semi-Supervised DNN Training with Word Selection for ASRKarel Veselý, Lukás Burget, Jan Cernocký. 3687-3691 [doi]

Gaussian Prediction Based Attention for Online End-to-End Speech RecognitionJunfeng Hou, Shiliang Zhang, Li-Rong Dai. 3692-3696 [doi]

Efficient Knowledge Distillation from an Ensemble of TeachersTakashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran. 3697-3701 [doi]

An Analysis of "Attention" in Sequence-to-Sequence ModelsRohit Prabhavalkar, Tara N. Sainath, Bo Li, Kanishka Rao, Navdeep Jaitly. 3702-3706 [doi]

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech RecognitionHagen Soltau, Hank Liao, Hasim Sak. 3707-3711 [doi]

CNN-Based Joint Mapping of Short and Long Utterance i-Vectors for Speaker Verification Using Short UtterancesJinxi Guo, Usha Amrutha Nookala, Abeer Alwan. 3712-3716 [doi]

Curriculum Learning Based Probabilistic Linear Discriminant Analysis for Noise Robust Speaker RecognitionShivesh Ranjan, Abhinav Misra, John H. L. Hansen. 3717-3721 [doi]

i-Vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-Robust Speaker RecognitionShivangi Mahto, Hitoshi Yamamoto, Takafumi Koshinaka. 3722-3726 [doi]

Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker VerificationQiongqiong Wang, Takafumi Koshinaka. 3727-3731 [doi]

Speaker Verification Under Adverse Conditions Using i-Vector Adaptation and Neural NetworksMd. Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Marcel Kockmann. 3732-3736 [doi]

Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled DataDiego Castán, Mitchell McLaren, Luciana Ferrer, Aaron Lawson, Alicia Lozano-Diez. 3737-3741 [doi]

CALYOU: A Comparable Spoken Algerian Corpus Harvested from YouTubeK. Abidi, Mohamed Amine Menacer, Kamel Smaïli. 3742-3746 [doi]

PRAV: A Phonetically Rich Audio Visual CorpusAbhishek Narwekar, Prasanta Kumar Ghosh. 3747-3751 [doi]

NTCD-TIMIT: A New Database and Baseline for Noise-Robust Audio-Visual Speech RecognitionAhmed Hussen Abdelaziz. 3752-3756 [doi]

The Extended SPaRKy Restaurant Corpus: Designing a Corpus with Variable Information DensityDavid M. Howcroft, Dietrich Klakow, Vera Demberg. 3757-3761 [doi]

Automatic Construction of the Finnish Parliament Speech CorpusAndré Mansikkaniemi, Peter Smit, Mikko Kurimo. 3762-3766 [doi]

Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to SpeechOmnia Abdo, Sherif M. Abdou, Mervat Fashal. 3767-3771 [doi]

What is the Relevant Population? Considerations for the Computation of Likelihood Ratios in Forensic Voice ComparisonVincent Hughes, Paul Foulkes. 3772-3776 [doi]

Voice Disguise vs. Impersonation: Acoustic and Perceptual Measurements of Vocal Flexibility in Non ExpertsVéronique Delvaux, Lise Caucheteux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies. 3777-3781 [doi]

Schwa Realization in French: Using Automatic Speech Processing to Study Phonological and Socio-Linguistic Factors in Large CorporaYaru Wu, Martine Adda-Decker, Cécile Fougeron, Lori Lamel. 3782-3786 [doi]

The Social Life of Setswana EjectivesDaniel Duran 0001, Jagoda Bruni, Grzegorz Dogil, Justus Roux. 3787-3791 [doi]

How Long is Too Long? How Pause Features After Requests Affect the Perceived Willingness of Affirmative AnswersLea S. Kohtz, Oliver Niebuhr. 3792-3796 [doi]

Shadowing Synthesized Speech - Segmental Analysis of Phonetic ConvergenceIona Gessinger, Eran Raveh, Sébastien Le Maguer, Bernd Möbius, Ingmar Steiner. 3797-3801 [doi]

Occupancy Detection in Commercial and Residential Environments Using Audio SignalShabnam Ghaffarzadegan, Attila Reiss, Mirko Ruhs, Robert Dürichen, Zhe Feng. 3802-3806 [doi]

Data Augmentation, Missing Feature Mask and Kernel Classification for Through-the-Wall Acoustic SurveillanceTran Huy Dat, Wen Zheng Terence Ng, Yi Ren Leng. 3807-3811 [doi]

Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech RecognitionShuo-Yiin Chang, Bo Li, Tara N. Sainath, Gabor Simko, Carolina Parada. 3812-3816 [doi]

Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian LanguagesArun Baby, Jeena J. Prakash, S. Rupak Vignesh, Hema A. Murthy. 3817-3821 [doi]

Gate Activation Signal Analysis for Gated Recurrent Neural Networks and its Correlation with Phoneme BoundariesYu-Hsuan Wang, Cheng-Tao Chung, Hung-yi Lee. 3822-3826 [doi]

Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory NetworksRuiqing Yin, Hervé Bredin, Claude Barras. 3827-3831 [doi]

Improved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-Delay Neural Network Denoising AutoencoderCong-Thanh Do, Yannis Stylianou. 3832-3836 [doi]

Factored Deep Convolutional Neural Networks for Noise Robust Speech RecognitionMasakiyo Fujimoto. 3837-3841 [doi]

Global SNR Estimation of Speech Signals for Unknown Noise Conditions Using Noise Adapted Non-Linear RegressionPavlos Papadopoulos, Ruchir Travadi, Shrikanth S. Narayanan. 3842-3846 [doi]

Joint Training of Multi-Channel-Condition Dereverberation and Acoustic Modeling of Microphone Array Speech for Robust Distant Speech RecognitionFengpei Ge, Kehuang Li, Bo Wu 0011, Sabato Marco Siniscalchi, Yonghong Yan 0002, Chin-Hui Lee. 3847-3851 [doi]

Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic ModelingDung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani. 3852-3856 [doi]

Attention-Based LSTM with Multi-Task Learning for Distant Speech RecognitionYu Zhang, Pengyuan Zhang, Yonghong Yan 0002. 3857-3861 [doi]

To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-Order Feedback from Multiple HistoriesHengguan Huang, Brian Mak. 3862-3866 [doi]

End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech RecognitionSuyoun Kim, Ian Lane. 3867-3871 [doi]

Robust Speech Recognition Based on Binaural Auditory ProcessingAnjali Menon, Chanwoo Kim, Richard M. Stern. 3872-3876 [doi]

Adaptive Multichannel Dereverberation for Automatic Speech RecognitionJoe Caroselli, Izhak Shafran, Arun Narayanan, Richard Rose. 3877-3881 [doi]

The Effects of Real and Placebo Alcohol on DeaffricationUrban Zihlmann. 3882-3886 [doi]

Polyglot and Speech Corpus Tools: A System for Representing, Integrating, and Querying Speech CorporaMichael McAuliffe, Elias Stengel-Eskin, Michaela Socolof, Morgan Sonderegger. 3887-3891 [doi]

Mapping Across Feature Spaces in Forensic Voice Comparison: The Contribution of Auditory-Based Voice Quality to (Semi-)Automatic System TestingVincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh, Eugenia San Segundo. 3892-3896 [doi]

Effect of Language, Speaking Style and Speaker on Long-Term F0 EstimationPablo Arantes, Anders Eriksson, Suska Gutzeit. 3897-3901 [doi]

Stability of Prosodic Characteristics Across Age and Gender GroupsJan Volín, Tereza Tykalová, Tomás Boril. 3902-3906 [doi]

Electrophysiological Correlates of Familiar Voice RecognitionJulien Plante-Hébert, Victor J. Boucher, Boutheina Jemel. 3907-3910 [doi]

Developing an Embosi (Bantu C25) Speech Variant Dictionary to Model Vowel Elision and Morpheme DeletionJamison Cooper-Leavitt, Lori Lamel, Annie Rialland, Martine Adda-Decker, Gilles Adda. 3911-3915 [doi]

Rd as a Control Parameter to Explore Affective Correlates of the Tense-Lax ContinuumAndy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl. 3916-3920 [doi]

Cross-Linguistic Distinctions Between Professional and Non-Professional Speaking StylesPlínio Almeida Barbosa, Sandra Madureira, Philippe Boula de Mareüil. 3921-3925 [doi]

Perception and Production of Word-Final /ʁ/ in FrenchCédric Gendrot. 3926-3930 [doi]

Glottal Source Estimation from Coded Telephone Speech Using a Deep Neural NetworkN. P. Narendra, Manu Airaksinen, Paavo Alku. 3931-3935 [doi]

Automatic Labelling of Prosodic Prominence, Phrasing and Disfluencies in French Speech by Simulating the Perception of Naïve and Expert ListenersGeorge Christodoulides, Mathieu Avanzi, Anne-Catherine Simon. 3936-3940 [doi]

Don't Count on ASR to Transcribe for You: Breaking Bias with Two CrowdsMichael Levit, Yan Huang, Shuangyu Chang, Yifan Gong. 3941-3945 [doi]

Effects of Training Data Variety in Generating Glottal Pulses from Acoustic Features with DNNsManu Airaksinen, Paavo Alku. 3946-3950 [doi]

Towards Intelligent Crowdsourcing for Audio Data Annotation: Integrating Active Learning in the Real WorldSimone Hantke, Zixing Zhang 0001, Björn W. Schuller. 3951-3955 [doi]

Principles for Learning Controllable TTS from Annotated and Latent VariationGustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi. 3956-3960 [doi]

Sampling-Based Speech Parameter Generation Using Moment-Matching NetworksShinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari. 3961-3965 [doi]

Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural NetsVincent Pollet, Enrico Zovato, Sufian Irhimeh, Pier Domenico Batzu. 3966-3970 [doi]

Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR DataErica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg. 3971-3975 [doi]

Bias and Statistical Significance in Evaluating Speech Synthesis with Mean Opinion ScoresAndrew Rosenberg, Bhuvana Ramabhadran. 3976-3980 [doi]

Phase Modeling Using Integrated Linear Prediction Residual for Statistical Parametric Speech SynthesisNagaraj Adiga, S. R. Mahadeva Prasanna. 3981-3985 [doi]

Evaluation of a Silent Speech Interface Based on Magnetic Sensing and Deep Learning for a Phonetically Rich VocabularyJosé A. González 0001, Lam Aun Cheah, Phil D. Green, James M. Gilbert, Stephen R. Ell, Roger K. Moore, Ed Holdsworth. 3986-3990 [doi]

Predicting Head Pose from Speech with a Conditional Variational AutoencoderDavid Greenwood, Stephen D. Laycock, Iain Matthews. 3991-3995 [doi]

Real-Time Reactive Speech Synthesis: Incorporating InterruptionsMirjam Wester, David A. Braude, Blaise Potard, Matthew P. Aylett, Francesca Shaw. 3996-4000 [doi]

A Neural Parametric Singing SynthesizerMerlijn Blaauw, Jordi Bonada. 4001-4005 [doi]

Tacotron: Towards End-to-End Speech SynthesisYuxuan Wang, R. J. Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc V. Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous. 4006-4010 [doi]

Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech SystemTim Capes, Paul Coles, Alistair Conkie, Ladan Golipour, Abie Hadjitarkhani, Qiong Hu, Nancy Huddleston, Melvyn Hunt, Jiangchuan Li, Matthias Neeracher, Kishore Prahallad, Tuomo Raitio, Ramya Rasipuram, Greg Townsend, Becci Williamson, David Winarsky, Zhizheng Wu, Hepeng Zhang. 4011-4015 [doi]

An Expanded Taxonomy of Semiotic Classes for Text NormalizationDaan van Esch, Richard Sproat. 4016-4020 [doi]

Complex-Valued Restricted Boltzmann Machine for Direct Learning of Frequency SpectraToru Nakashika, Shinji Takaki, Junichi Yamagishi. 4021-4025 [doi]

Soundtracing for Realtime Speech Adjustment to Environmental Conditions in 3D SimulationsBartosz Ziólko, Tomasz Pedzimaz, Szymon Piotr Palka. 4026-4027 [doi]

Vocal-Tract Model with Static Articulators: Lips, Teeth, Tongue, and MoreTakayuki Arai. 4028-4029 [doi]

Remote Articulation Test System Based on WebRTCIkuyo Masuda-Katsuse. 4030-4031 [doi]

The ModelTalker Project: A Web-Based Voice Banking Pipeline for ALS/MND PatientsH. Timothy Bunnell, Jason Lilley, Kathleen McGrath. 4032-4033 [doi]

Visible Vowels: A Tool for the Visualization of Vowel VariationWilbert Heeringa, Hans Van de Velde. 4034-4035 [doi]
