Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016

Nelson Morgan, editor, Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016. ISCA, 2016. [doi]

Conference: interspeech2016

DiscussionNaomi Harte, Peter Jancovic, Karl-L. Schuchmann. [doi]

Closing RemarksNaomi Harte, Peter Jancovic, Karl-L. Schuchmann. [doi]

Speaker Comparison for Forensic and Investigative Applications IIJean-François Bonastre, Joseph P. Campbell, Anders Eriksson, Hirotaka Nakasone, Reva Schwartz. [doi]

DiscussionDayana Ribas, Emmanuel Vincent, John H. L. Hansen, Emma Jokinen, Mirco Ravanelli, Hannes Gamper, Fred Richardson. [doi]

Computational Approaches to Linguistic Code SwitchingMona T. Diab, Pascale Fung, Julia Hirschberg, Thamar Solorio. [doi]

IntroductionNaomi Harte, Peter Jancovic, Karl-L. Schuchmann. [doi]

Poster Overview PresentationsNaomi Harte, Peter Jancovic, Karl-L. Schuchmann. [doi]

Mindfulness Special EventNikki Mirghafori. [doi]

DiscussionBjörn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang 0014, Eduardo Coutinho, Keelan Evanini. [doi]

The Native Language Sub-Challenge: The DataBjörn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang 0014, Eduardo Coutinho, Keelan Evanini. [doi]

The INTERSPEECH 2016 Computational Paralinguistics Challenge: A Summary of ResultsBjörn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang 0014, Eduardo Coutinho, Keelan Evanini. [doi]

Introduction to Poster Presentation of Part IIJeesun Kim, Gérard Bailly. [doi]

The Deception Sub-Challenge: The DataBjörn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron C. Elkins, Yue Zhang 0014, Eduardo Coutinho, Keelan Evanini. [doi]

Speech VenturesNicolas Scheffer, Korbinian Riedhammer, Alexandre Lebrun, David Suendermann-Oeft. [doi]

The Sincerity Sub-Challenge: The DataBjörn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang 0014, Eduardo Coutinho, Keelan Evanini. [doi]

A 50-Year Retrospective on Speech and Language ProcessingJohn Makhoul. 1 [doi]

Improving English Conversational Telephone Speech RecognitionIvan Medennikov, Alexey Prudnikov, Alexander Zatvornitskiy. 2-6 [doi]

The IBM 2016 English Conversational Telephone Speech Recognition SystemGeorge Saon, Tom Sercu, Steven J. Rennie, Hong-Kwang Jeff Kuo. 7-11 [doi]

Small-Footprint Deep Neural Networks with Highway Connections for Speech RecognitionLiang Lu, Steve Renals. 12-16 [doi]

Deep Convolutional Neural Networks with Layer-Wise Context Expansion and AttentionDong Yu, Wayne Xiong, Jasha Droppo, Andreas Stolcke, Guoli Ye, Jinyu Li, Geoffrey Zweig. 17-21 [doi]

Lower Frame Rate Neural Network Acoustic ModelsGolan Pundak, Tara N. Sainath. 22-26 [doi]

Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic ModelingGakuto Kurata, Brian Kingsbury. 27-31 [doi]

Automatic Scoring of Monologue Video Interviews Using Multimodal CuesLei Chen 0004, Gary Feng, Michelle Martin-Raugh, Chee Wee Leong, Christopher Kitchen, Su-Youn Yoon, Blair Lehman, Harrison Kell, Chong Min Lee. 32-36 [doi]

The Sound of Disgust: How Facial Expression May Influence Speech ProductionChee Seng Chong, Jeesun Kim, Chris Davis. 37-41 [doi]

Analyzing Temporal Dynamics of Dyadic Synchrony in Affective InteractionsZhaojun Yang, Shrikanth S. Narayanan. 42-46 [doi]

Audiovisual Speech Scene Analysis in the Context of Competing SourcesAttigodu C. Ganesh, Frédéric Berthommier, Jean-Luc Schwartz. 47-51 [doi]

Head Motion Generation with Synthetic Speech: A Data Driven ApproachNajmeh Sadoughi, Carlos Busso. 52-56 [doi]

The Consistency and Stability of Acoustic and Visual Cues for Different Prosodic AttitudesJeesun Kim, Chris Davis. 57-61 [doi]

The Unit of Speech Encoding: The Case of RomanianIrene Vogel, Laura Spinu. 62-66 [doi]

The Perceptual Effect of L1 Prosody Transplantation on L2 Speech: The Case of French Accented GermanJeanin Jügler, Frank Zimmerer, Jürgen Trouvain, Bernd Möbius. 67-71 [doi]

Organizing Syllables into Sandhi Domains - Evidence from F0 and Duration Patterns in Shanghai ChineseBijun Ling, Jie Liang. 72-76 [doi]

Automatic Analysis of Phonetic Speech Style DimensionsNeville Ryant, Mark Liberman. 77-81 [doi]

The Acoustic Manifestation of Prominence in Stressless LanguagesAngeliki Athanasopoulou, Irene Vogel. 82-86 [doi]

The Rhythmic Constraint on Prosodic Boundaries in Mandarin Chinese Based on Corpora of Silent Reading and Speech PerceptionWei Lai, Jiahong Yuan, Ya Li, Xiaoying Xu, Mark Liberman. 87-91 [doi]

Toward Development and Evaluation of Pain Level-Rating Scale for Emergency Triage based on Vocal Characteristics and Facial ExpressionsFu-Sheng Tsai, Ya-Ling Hsu, Wei-Chen Chen, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee. 92-96 [doi]

Predicting Severity of Voice Disorder from DNN-HMM Acoustic PosteriorsTan Lee, Yuanyuan Liu, Yu Ting Yeung, Thomas K. T. Law, Kathy Y. S. Lee. 97-101 [doi]

Long-Term Stability of Tracheoesophageal VoicesKlaske E. van Sluis, Michiel W. M. van den Brekel, Frans J. M. Hilgers, Rob J. J. H. van Son. 102-106 [doi]

Detecting Mild Cognitive Impairment from Spontaneous Speech by Correlation-Based Phonetic Feature SelectionGábor Gosztolya, László Tóth, Tamás Grósz, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Magdolna Pákáski, János Kálmán. 107-111 [doi]

Towards an Automated Screening Tool for Developmental Speech and Language ImpairmentsJen J. Gong, Maryann Gong, Dina Levy-Lambert, Jordan R. Green, Tiffany P. Hogan, John V. Guttag. 112-116 [doi]

Spectral Enhancement of Cleft Lip and Palate SpeechVikram C. M., Nagaraj Adiga, S. R. Mahadeva Prasanna. 117-121 [doi]

Assessing Level-Dependent Segmental Contribution to the Intelligibility of Speech Processed by Single-Channel Noise-Suppression AlgorithmsTian Guan, Guangxing Chu, Fei Chen, Feng Yang. 122-125 [doi]

Effectiveness of Near-End Speech Enhancement Under Equal-Loudness and Equal-Level ConstraintsTudor-Catalin Zorila, Sheila Flanagan, Brian C. J. Moore, Yannis Stylianou. 126-130 [doi]

Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant ProminenceBidisha Sharma, S. R. Mahadeva Prasanna. 131-135 [doi]

Relative Contributions of Amplitude and Phase to the Intelligibility Advantage of Ideal Binary Masked SentencesLei Wang, Shufeng Zhu, Diliang Chen, Yong Feng, Fei Chen. 136-139 [doi]

Predicting Binaural Speech Intelligibility from Signals Estimated by a Blind Source Separation AlgorithmQingju Liu, Yan Tang, Philip J. B. Jackson, Wenwu Wang. 140-144 [doi]

Automated Pause Insertion for Improved Intelligibility Under ReverberationPetko N. Petkov, Norbert Braunschweiler, Yannis Stylianou. 145-149 [doi]

Automatic Classification of Phonation Modes in Singing Voice: Towards Singing Style Characterisation and Application to Ethnomusicological RecordingsJean-Luc Rouas, Leonidas Ioannidis. 150-154 [doi]

Novel Nonlinear Prediction Based Features for Spoofed Speech DetectionHimanshu N. Bhavsar, Tanvina B. Patel, Hemant A. Patil. 155-159 [doi]

Robust Vowel Landmark Detection Using Epoch-Based FeaturesSri Harsha Dumpala, Bhanu Teja Nellore, Raghu Ram Nevali, Suryakanth V. Gangashetty, B. Yegnanarayana. 160-164 [doi]

Sensitivity of Quantitative RT-MRI Metrics of Vocal Tract Dynamics to Image Reconstruction SettingsJohannes Töger, Yongwan Lim, Sajan Goud Lingala, Shrikanth S. Narayanan, Krishna S. Nayak. 165-169 [doi]

Sound Pattern Matching for Automatic Prosodic Event DetectionMilos Cernak, Afsaneh Asaei, Pierre-Edouard Honnet, Philip N. Garner, Hervé Bourlard. 170-174 [doi]

Automatic Classification of Lexical Stress in English and Arabic Languages Using Deep LearningMostafa Ali Shahin, Julien Epps, Beena Ahmed. 175-179 [doi]

Development of Mandarin Onset-Rime Detection in Relation to Age and Pinyin InstructionFei Chen, Nan Yan, Xunan Huang, Hao Zhang, Lan Wang, Gang Peng. 180-184 [doi]

Joint Effect of Dialect and Mandarin on English Vowel Production: A Case Study in Changsha EFL LearnersXinyi Wen, Yuan Jia. 185-189 [doi]

Effects of L1 Phonotactic Constraints on L2 Word Segmentation StrategiesTamami Katayama. 190-194 [doi]

Putting German [ʃ] and [ç] in Two Different Boxes: Native German vs L2 German of French LearnersJane Wottawa, Martine Adda-Decker, Frédéric Isel. 195-199 [doi]

Naturalness Judgement of L2 English Through Dubbing PracticeDean Luo, Ruxin Luo, Lixin Wang. 200-203 [doi]

Audiovisual Training Effects for Japanese Children Learning English /r/-/l/Yasuaki Shinohara. 204-207 [doi]

L2 Acquisition and Production of the English Rhotic Pharyngeal GestureSarah Harper, Louis Goldstein, Shrikanth S. Narayanan. 208-212 [doi]

Auditory-Visual Perception of VCVs Produced by People with Down Syndrome: Preliminary ResultsAlexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen. 213-217 [doi]

Combining Non-Pathological Data of Different Language Varieties to Improve DNN-HMM Performance on Pathological SpeechEmre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik. 218-222 [doi]

Evaluation of a Phone-Based Anomaly Detection Approach for Dysarthric SpeechImed Laaridh, Corinne Fredouille, Christine Meunier. 223-227 [doi]

Recognition of Dysarthric Speech Using Voice Parameters for Speaker Adaptation and Multi-Taper Spectral EstimationChitralekha Bhat, Bhavik Vachhani, Sunil Kumar Kopparapu. 228-232 [doi]

Impaired Categorical Perception of Mandarin Tones and its Relationship to Language Ability in Autism Spectrum DisordersFei Chen, Nan Yan, Xiaojie Pan, Feng Yang, Zhuanzhuan Ji, Lan Wang, Gang Peng. 233-237 [doi]

Perceived Naturalness of Electrolaryngeal Speech Produced Using sEMG-Controlled vs. Manual Pitch ModulationKathleen F. Nagle, James T. Heaton. 238-242 [doi]

Identifying Hearing Loss from Learned Speech KernelsShamima Najnin, Bonny Banerjee, Lisa Lucks Mendel, Masoumeh Heidari Kapourchali, Jayanta Kumar Dutta, Sungmin Lee, Chhayakanta Patro, Monique Pousson. 243-247 [doi]

Differential Effects of Velopharyngeal Dysfunction on Speech Intelligibility During Early and Late Stages of Amyotrophic Lateral SclerosisPanying Rong, Yana Yunusova, Jordan R. Green. 248-252 [doi]

The Production of Intervocalic Glides in Non Dysarthric Parkinsonian SpeechVéronique Delvaux, V. Roland, Kathy Huet, Myriam Piccaluga, M. C. Haelewyck, Bernard Harmegnies. 253-256 [doi]

Auditory Processing Impairments Under Background Noise in Children with Non-Syndromic Cleft Lip and/or PalateYang Feng, Zhang Lu. 257-261 [doi]

Modulation Spectral Features for Predicting Vocal Emotion Recognition by Simulated Cochlear ImplantsZhi Zhu, Ryota Miyauchi, Yukiko Araki, Masashi Unoki. 262-266 [doi]

Automatic Discrimination of Soft Voice Onset Using Acoustic Features of Breathy VoicingKeiko Ochi, Koichi Mori, Naomi Sakai, Nobutaka Ono. 267-271 [doi]

Effect of Noise on Lexical Tone Perception in Cantonese-Speaking AmusicsJing Shao, Caicai Zhang, Gang Peng, Yike Yang, William S.-Y. Wang. 272-276 [doi]

Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing LossYuki Takashima, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki, Nobuyuki Mitani, Kiyohiro Omori, Kaoru Nakazono. 277-281 [doi]

Perception of Tone in Whispered Mandarin Sentences: The Case for Singapore MandarinYuling Gu, Boon Pang Lim, Nancy F. Chen. 282-286 [doi]

A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training SentencesFeng-Long Xie, Frank K. Soong, Haifeng Li. 287-291 [doi]

Parallel Dictionary Learning for Voice Conversion Using Discriminative Graph-Embedded Non-Negative Matrix FactorizationRyo Aihara, Tetsuya Takiguchi, Yasuo Ariki. 292-296 [doi]

Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural NetworksYu Gu, Zhen-Hua Ling, Li-Rong Dai. 297-301 [doi]

Voice Conversion Based on Matrix Variate Gaussian Mixture Model Using Multiple Frame FeaturesYi Yang, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu. 302-306 [doi]

Voice Conversion Based on Trajectory Model Training of Neural Networks Considering Global VarianceNaoki Hosaka, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda. 307-311 [doi]

Comparing Articulatory and Acoustic Strategies for Reducing Non-Native AccentsSandesh Aryal, Ricardo Gutierrez-Osuna. 312-316 [doi]

Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited DataSeyyed Saeed Sarfjoo, Cenk Demiroglu. 317-321 [doi]

Personalized, Cross-Lingual TTS Using Phonetic PosteriorgramsLifa Sun, Hao Wang, Shiyin Kang, Kun Li, Helen M. Meng. 322-326 [doi]

Acoustic Analysis of Syllables Across Indian LanguagesAnusha Prakash, Jeena J. Prakash, Hema A. Murthy. 327-331 [doi]

Objective Evaluation Methods for Chinese Text-To-Speech SystemsTeng Zhang, Zhipeng Chen, Ji Wu, Sam Lai, Wenhui Lei, Carsten Isert. 332-336 [doi]

Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech SynthesisYusuke Ijima, Taichi Asami, Hideyuki Mizuno. 337-341 [doi]

A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural NetworksTakenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, Keiichi Tokuda. 342-346 [doi]

Text-to-Speech for Individuals with Vision Loss: A User StudyMonika Podsiadlo, Shweta Chahar. 347-351 [doi]

Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural NetworksCassia Valentini-Botinhao, Xin Wang, Shinji Takaki, Junichi Yamagishi. 352-356 [doi]

Data Selection and Adaptation for Naturalness in HMM-Based Speech SynthesisErica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg. 357-361 [doi]

A Portable Automatic PA-TA-KA Syllable Detection System to Derive Biomarkers for Neurological DisordersFei Tao, Louis Daudet, Christian Poellabauer, Sandra L. Schneider, Carlos Busso. 362-366 [doi]

Deep Neural Networks for i-Vector Language Identification of Short Utterances in CarsOmid Ghahabi, Antonio Bonafonte, Javier Hernando, Asunción Moreno. 367-371 [doi]

Improving i-Vector and PLDA Based Speaker Clustering with Long-Term FeaturesAbraham Woubie, Jordi Luque, Javier Hernando. 372-376 [doi]

Open Language Interface for Voice Exploitation (OLIVE)Aaron Lawson, Mitchell McLaren, Harry Bratt, Martin Graciarena, Horacio Franco, Christopher George, Allen R. Stauffer, Chris Bartels, Julien van Hout. 377-378 [doi]

A Multimodal Dialogue System for Air Traffic Control Trainees Based on Discrete-Event SimulationLubos Smídl, Adam Chýlek, Jan Svec. 379-380 [doi]

Lig-Aikuma: A Mobile App to Collect Parallel Speech for Under-Resourced Language StudiesElodie Gauthier, David Blachon, Laurent Besacier, Guy-Noël Kouarata, Martine Adda-Decker, Annie Rialland, Gilles Adda, Grégoire Bachman. 381-382 [doi]

ARET - Automatic Reading of Educational Texts for Visually Impaired StudentsMartin Gruber, Jindrich Matousek, Zdenek Hanzlícek, Zdenek Krnoul, Zbynek Zajíc. 383-384 [doi]

Segmental Recurrent Neural Networks for End-to-End Speech RecognitionLiang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals. 385-389 [doi]

Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional UnitsMarkus Nußbaum-Thom, Jia Cui, Bhuvana Ramabhadran, Vaibhava Goel. 390-394 [doi]

Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech RecognitionWei-Ning Hsu, Yu Zhang, Ann Lee, James R. Glass. 395-399 [doi]

Stimulated Deep Neural Network for Speech RecognitionChunyang Wu, Penny Karanasou, Mark J. F. Gales, Khe Chai Sim. 400-404 [doi]

Phonetic Context Embeddings for DNN-HMM Phone RecognitionLeonardo Badino. 405-409 [doi]

Towards End-to-End Speech Recognition with Deep Convolutional Neural NetworksYing Zhang, Mohammad Pezeshki, Philémon Brakel, Saizheng Zhang, César Laurent, Yoshua Bengio, Aaron C. Courville. 410-414 [doi]

Joint Speaker and Lexical Modeling for Short-Term Characterization of SpeakerGuangsen Wang, Kong-Aik Lee, Trung Hieu Nguyen, Hanwu Sun, Bin Ma. 415-419 [doi]

Tandem Features for Text-Dependent Speaker Verification on the RedDots CorpusMd. Jahangir Alam, Patrick Kenny, Vishwa Gupta. 420-424 [doi]

Text Dependent Speaker Verification Using Un-Supervised HMM-UBM and Temporal GMM-UBMAchintya Kumar Sarkar, Zheng-Hua Tan. 425-429 [doi]

Utterance Verification for Text-Dependent Speaker Recognition: A Comparative Assessment Using the RedDots CorpusTomi Kinnunen, Md. Sahidullah, Ivan Kukanov, Héctor Delgado, Massimiliano Todisco, Achintya Kumar Sarkar, Nicolai Bæk Thomsen, Ville Hautamäki, Nicholas W. D. Evans, Zheng-Hua Tan. 430-434 [doi]

Parallel Speaker and Content Modelling for Text-Dependent Speaker VerificationJianbo Ma, Saad Irtza, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah. 435-439 [doi]

i-Vector/HMM Based Text-Dependent Speaker Verification System for RedDots ChallengeHossein Zeinali, Hossein Sameti, Lukás Burget, Jan Cernocký, Nooshin Maghsoodi, Pavel Matejka. 440-444 [doi]

Exploring Session Variability and Template Aging in Speaker Verification for Fixed Phrase Short UtterancesRohan Kumar Das, Sarfaraz Jelil, S. R. Mahadeva Prasanna. 445-449 [doi]

Prediction of the Articulatory Movements of Unseen Phonemes of a Speaker Using the Speech Structure of Another SpeakerHidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu. 450-454 [doi]

Vocal Tract Length Normalization for Speaker Independent Acoustic-to-Articulatory Speech InversionGanesh Sivaraman, Vikramjit Mitra, Hosung Nam, Mark K. Tiede, Carol Y. Espy-Wilson. 455-459 [doi]

Investigation of Speed-Accuracy Tradeoffs in Speech Production Using Real-Time Magnetic Resonance ImagingAdam C. Lammert, Christine H. Shadle, Shrikanth S. Narayanan, Thomas F. Quatieri. 460-464 [doi]

Characterizing Vocal Tract Dynamics Across Speakers Using Real-Time MRITanner Sorensen, Asterios Toutios, Louis Goldstein, Shrikanth S. Narayanan. 465-469 [doi]

Tracking Contours of Orofacial Articulators from Real-Time MRI of SpeechMathieu Labrunie, Pierre Badin, Dirk Voit, Arun A. Joseph, Laurent Lamalle, Coriandre Vilain, Louis-Jean Boë, Jens Frahm. 470-474 [doi]

State-of-the-Art MRI Protocol for Comprehensive Assessment of Vocal Tract Structure and FunctionSajan Goud Lingala, Asterios Toutios, Johannes Töger, Yongwan Lim, Yinghua Zhu, Yoon-Chul Kim, Colin Vaz, Shrikanth S. Narayanan, Krishna S. Nayak. 475-479 [doi]

DBN-ivector Framework for Acoustic Emotion RecognitionRui Xia, Yang Liu. 480-484 [doi]

An Investigation of Emotional Speech in Depression ClassificationBrian Stasak, Julien Epps, Nicholas Cummins, Roland Goecke. 485-489 [doi]

Retrieving Categorical Emotions Using a Probabilistic Framework to Define Preference Learning SamplesReza Lotfian, Carlos Busso. 490-494 [doi]

At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in SpeechMaximilian Schmitt, Fabien Ringeval, Björn W. Schuller. 495-499 [doi]

Speech Emotion Recognition Using Affective SaliencyArodami Chorianopoulou, Polychronis Koutsakis, Alexandros Potamianos. 500-504 [doi]

Laughter Valence Prediction in Motivational Interviewing Based on Lexical and Acoustic CuesRahul Gupta, Nishant Nath, Taruna Agrawal, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan. 505-509 [doi]

Respiratory Belts and Whistles: A Preliminary Study of Breathing Acoustics for Turn-TakingMarcin Wlodarczak, Mattias Heldner. 510-514 [doi]

/r/ as Language Marker in Bilingual Speech Production and PerceptionConstantijn Kaland, Vincenzo Galatà, Lorenzo Spreafico, Alessandro Vietti. 515-519 [doi]

Evaluation of Phonatory Behavior of German and French Speakers in Native and Non-Native SpeechManfred Pützer, Frank Zimmerer, Wolfgang Wokurek, Jeanin Jügler. 520-524 [doi]

Today's Most Frequently Used F0 Estimation Methods, and Their Accuracy in Estimating Male and Female Pitch in Clean SpeechSofia Strömbergsson. 525-529 [doi]

A Praat-Based Algorithm to Extract the Amplitude Envelope and Temporal Fine Structure Using the Hilbert TransformLei He, Volker Dellwo. 530-534 [doi]

Likelihood Ratio Calculation in Acoustic-Phonetic Forensic Voice Comparison: Comparison of Three Statistical Modelling ApproachesEwald Enzinger. 535-539 [doi]

A Sparse Spherical Harmonic-Based Model in Subbands for Head-Related Transfer FunctionsXiaoke Qi, Jianhua Tao. 540-544 [doi]

Single-Channel Multi-Speaker Separation Using Deep ClusteringYusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey. 545-549 [doi]

Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech SeparationHao Li, Shuai Nie, Xueliang Zhang, Hui Zhang. 550-554 [doi]

A Feature Study for Masking-Based Reverberant Speech SeparationMasood Delfarah, DeLiang Wang. 555-559 [doi]

Discriminative Layered Nonnegative Matrix Factorization for Speech SeparationChung-Chien Hsu, Tai-Shih Chi, Jen-Tzung Chien. 560-564 [doi]

On Discriminative Framework for Single Channel Audio Source SeparationArpita Gang, Pravesh Biyani. 565-569 [doi]

Generating Natural Video Descriptions via Multimodal ProcessingQin Jin, Junwei Liang, Xiaozhu Lin. 570-574 [doi]

Feature-Level Decision Fusion for Audio-Visual Word Prominence DetectionMartin Heckmann. 575-579 [doi]

Acoustic and Visual Analysis of Expressive Speech: A Case Study of French Acted SpeechSlim Ouni, Vincent Colotte, Sara Dahmani, Soumaya Azzi. 580-584 [doi]

Characterization of Audiovisual Dramatic AttitudesAdela Barbulescu, Rémi Ronfard, Gérard Bailly. 585-589 [doi]

Conversational Engagement Recognition Using Auditory and Visual CuesYuyun Huang, Emer Gilmartin, Nick Campbell. 590-594 [doi]

An Acoustic Analysis of Child-Child and Child-Robot Interactions for Understanding Engagement during Speech-Controlled Computer GamesTheodora Chaspari, Jill Fain Lehman. 595-599 [doi]

Auditory-Visual Lexical Tone Perception in Thai Elderly Listeners with and without Hearing ImpairmentBenjawan Kasisopa, Chutamanee Onsuwan, Charturong Tantibundhit, Nittayapa Klangpornkun, Suparak Techacharoenrungrueang, Sudaporn Luksaneeyanawin, Denis Burnham. 600-604 [doi]

Use of Agreement/Disagreement Classification in Dyadic Interactions for Continuous Emotion RecognitionHossein Khaki, Engin Erzin. 605-609 [doi]

Microscopic Multilingual Matrix Test Predictions Using an ASR-Based Speech Recognition ModelMarc René Schädler, David Hülsmeier, Anna Warzybok, Sabine Hochmuth, Birger Kollmeier. 610-614 [doi]

DNN-Based Automatic Speech Recognition as a Model for Human Phoneme PerceptionMats Exter, Bernd T. Meyer. 615-619 [doi]

Undoing Misperceptions: A Microscopic Analysis of Consistent Confusions Through Signal ModificationsMáté Attila Tóth, Martin Cooke. 620-624 [doi]

Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMsMahdie Karbasi, Ahmed Hussen Abdelaziz, Hendrik Meutzner, Dorothea Kolossa. 625-629 [doi]

Misperceptions Arising from Speech-in-Babble InteractionsMáté Attila Tóth, Martin Cooke, Jon Barker. 630-634 [doi]

Introducing Temporal Rate Coding for Speech in Cochlear Implants: A Microscopic Evaluation in Humans and ModelsAnja Eichenauer, Mathias Dietz, Bernd T. Meyer, Tim Jürgens. 635-639 [doi]

Language Effects in Noise-Induced Word MisperceptionsMaria Luisa Garcia Lecumberri, Jon Barker, Ricard Marxer, Martin Cooke. 640-644 [doi]

Speech Reductions Cause a De-Weighting of Secondary Acoustic CuesLéo Varnet, Fanny Meunier, Michel Hoen. 645-649 [doi]

Using Phonologically Weighted Levenshtein Distances for the Prediction of Microscopic IntelligibilityLionel Fontan, Isabelle Ferrané, Jérôme Farinas, Julien Pinquier, Xavier Aumont. 650-654 [doi]

The Impact of Manner of Articulation on the Intelligibility of Voicing Contrast in Noise: Cross-Linguistic ImplicationsMayuki Matsui. 655-659 [doi]

Directly Comparing the Listening Strategies of Humans and MachinesMichael I. Mandel. 660-664 [doi]

LSTM-Based NeuroCRFs for Named Entity RecognitionMarc-Antoine Rondeau, Yi Su. 665-669 [doi]

Exploring Word Mover's Distance and Semantic-Aware Embedding Techniques for Extractive Broadcast News SummarizationShih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu. 670-674 [doi]

Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech RecognitionImran A. Sheikh, Irina Illina, Dominique Fohr, Georges Linarès. 675-679 [doi]

Beyond Utterance Extraction: Summary Recombination for Speech SummarizationJérémy Trione, Benoît Favre, Frédéric Béchet. 680-684 [doi]

Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot FillingBing Liu, Ian Lane. 685-689 [doi]

Domain Adaptation of Recurrent Neural Networks for Natural Language UnderstandingAaron Jaech, Larry Heck, Mari Ostendorf. 690-694 [doi]

LatticeRnn: Recurrent Neural Networks Over LatticesFaisal Ladhak, Ankur Gandhe, Markus Dreyer, Lambert Mathias, Ariya Rastrow, Björn Hoffmeister. 695-699 [doi]

Learning Document Representations Using Subspace Multinomial ModelSantosh Kesiraju, Lukás Burget, Igor Szöke, Jan Cernocký. 700-704 [doi]

Attention-Based Convolutional Neural Networks for Sentence ClassificationZhiwei Zhao, Youzheng Wu. 705-709 [doi]

Spoken Language Understanding in a Latent Topic-Based SubspaceMohamed Morchid, Mohamed Bouaziz, Waad Ben Kheder, Killian Janod, Pierre-Michel Bousquet, Richard Dufour, Georges Linarès. 710-714 [doi]

Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTMDilek Hakkani-Tür, Gökhan Tür, Asli Çelikyilmaz, Yun-Nung Chen, Jianfeng Gao, Li Deng, Ye-Yi Wang. 715-719 [doi]

Deep Stacked Autoencoders for Spoken Language UnderstandingKillian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato de Mori. 720-724 [doi]

Labeled Data Generation with Encoder-Decoder LSTM for Semantic Slot FillingGakuto Kurata, Bing Xiang, Bowen Zhou. 725-729 [doi]

Exploring the Correlation of Pitch Accents and Semantic Slots for Spoken Language UnderstandingSabrina Stehwien, Ngoc Thang Vu. 730-734 [doi]

Analysis on Gated Recurrent Unit Based Question Detection ApproachYaodong Tang, Zhiyong Wu, Helen M. Meng, Mingxing Xu, Lianhong Cai. 735-739 [doi]

Combining State-Level Spotting and Posterior-Based Acoustic Match for Improved Query-by-Example Spoken Term DetectionShuji Oishi, Tatsuya Matsuba, Mitsuaki Makino, Atsuhiko Kai. 740-744 [doi]

A Novel Discriminative Score Calibration Method for Keyword SearchZhiqiang Lv, Meng Cai, Wei-Qiang Zhang, Jia Liu. 745-749 [doi]

Segmented Dynamic Time Warping for Spoken Query-by-Example SearchJorge Proença, Fernando Perdigão. 750-754 [doi]

Generating Complementary Acoustic Model Spaces in DNN-Based Sequence-to-Frame DTW Scheme for Out-of-Vocabulary Spoken Term DetectionShi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh 0001. 755-759 [doi]

Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword SpottingSankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Björn Hoffmeister, Shiv Vitaladevuni. 760-764 [doi]

Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence AutoencoderYu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-yi Lee, Lin-Shan Lee. 765-769 [doi]

Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword SpottingZhong Meng, Biing-Hwang Juang. 770-774 [doi]

Language Model Data Augmentation for Keyword Spotting in Low-Resourced Training ConditionsArseniy Gorin, Rasa Lileikyte, Guangpu Huang, Lori Lamel, Jean-Luc Gauvain, Antoine Laurent. 775-779 [doi]

STON: Efficient Subtitling in Dutch Using State-of-the-Art ToolsLyan Verwimp, Brecht Desplanques, Kris Demuynck, Joris Pelemans, Marieke Lycke, Patrick Wambacq. 780-781 [doi]

An Automatic Training Tool for Air Traffic Control TrainingPetr Stanislav, Lubos Smídl, Jan Svec. 782-783 [doi]

Digitala: An Augmented Test and Review Process Prototype for High-Stakes Spoken Foreign Language ExaminationReima Karhila, Aku Rouhe, Peter Smit, André Mansikkaniemi, Heini Kallio, Erik Lindroos, Raili Hildén, Martti Vainio, Mikko Kurimo. 784-785 [doi]

Exploring Collections of Multimedia Archives Through Innovative Interfaces in the Context of Digital HumanitiesGéraldine Damnati, Delphine Charlet, Marc Denjean. 786-787 [doi]

Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair InformationYougen Yuan, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li. 788-792 [doi]

Novel Front-End Features Based on Neural Graph Embeddings for DNN-HMM and LSTM-CTC Acoustic ModelingYuzong Liu, Katrin Kirchhoff. 793-797 [doi]

Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech RecognitionBasil Abraham, Srinivasan Umesh, Neethu Mariam Joy. 798-802 [doi]

On the Role of Nonlinear Transformations in Deep Neural Network Acoustic ModelsTasha Nagamine, Michael L. Seltzer, Nima Mesgarani. 803-807 [doi]

Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic ModelingEhsan Variani, Tara N. Sainath, Izhak Shafran, Michiel Bacchiani. 808-812 [doi]

Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR TasksTara N. Sainath, Bo Li. 813-817 [doi]

The Speakers in the Wild (SITW) Speaker Recognition DatabaseMitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson. 818-822 [doi]

The 2016 Speakers in the Wild Speaker Recognition EvaluationMitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson. 823-827 [doi]

Analysis of Speaker Recognition Systems in Realistic Scenarios of the SITW 2016 ChallengeOndrej Novotný, Pavel Matejka, Oldrich Plchot, Ondrej Glembek, Lukás Burget, Jan Cernocký. 828-832 [doi]

A Speaker Recognition System for the SITW ChallengeOleg Kudashev, Sergey Novoselov, Konstantin Simonchik, Alexander Kozlov. 833-837 [doi]

Speakers In The Wild (SITW): The QUT Speaker Recognition SystemHouman Ghaemmaghami, Md. Hafizur Rahman, Ivan Himawan, David Dean, Ahilan Kanagasundaram, Sridha Sridharan, Clinton Fookes. 838-842 [doi]

AUT System for SITW Speaker Recognition ChallengeAbbas Khosravani, Mohammad Mehdi Homayounpour. 843-847 [doi]

LIA System for the SITW Speaker Recognition ChallengeWaad Ben Kheder, Moez Ajili, Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre. 848-852 [doi]

Investigating Various Diarization Algorithms for Speaker in the Wild (SITW) Speaker Recognition ChallengeYi Liu, Yao Tian, Liang He, Jia Liu. 853-857 [doi]

Does the Importance of Word-Initial and Word-Final Information Differ in Native versus Non-Native Spoken-Word Recognition?Odette Scharenborg, Juul Coumans, Sofoklis Kakouros, Roeland Van Hout. 858-862 [doi]

The Effect of Sentence Accent on Non-Native Speech Perception in NoiseOdette Scharenborg, Elea Kolkman, Sofoklis Kakouros, Brechtje Post. 863-867 [doi]

The Effects of Modified Speech Styles on Intelligibility for Non-Native ListenersMartin Cooke, Maria Luisa Garcia Lecumberri. 868-872 [doi]

The Influence of Language Experience on the Categorical Perception of Vowels: Evidence from Mandarin and KoreanHao Zhang, Fei Chen, Nan Yan, Lan Wang, Feng Shi, Manwa L. Ng. 873-877 [doi]

Multiple Influences on Vocabulary Acquisition: Parental Input DominatesDominic W. Massaro. 878-882 [doi]

Can Intensive Exposure to Foreign Language Sounds Affect the Perception of Native Sounds?Jian Gong, Maria Luisa Garcia Lecumberri, Martin Cooke. 883-887 [doi]

Privacy-Preserving Speech Analytics for Automatic Assessment of Student CollaborationNikoletta Bassiou, Andreas Tsiartas, Jennifer Smith, Harry Bratt, Colleen Richey, Elizabeth Shriberg, Cynthia D'Angelo, Nonye Alozie. 888-892 [doi]

Complexity in Prosody: A Nonlinear Dynamical Systems Approach for Dyadic Conversations; Behavior and Outcomes in Couples TherapyMd. Nasir, Brian R. Baucom, Shrikanth S. Narayanan, Panayiotis G. Georgiou. 893-897 [doi]

Couples Behavior Modeling and Annotation Using Low-Resource LSTM Language ModelsShao-Yen Tseng, Sandeep Nallan Chakravarthula, Brian G. Baucom, Panayiotis G. Georgiou. 898-902 [doi]

Speech Likability and Personality-Based Social Relations: A Round-Robin Analysis over Communication ChannelsLaura Fernández Gallardo, Benjamin Weiss. 903-907 [doi]

Behavioral Coding of Therapist Language in Addiction Counseling Using Recurrent Neural NetworksBo Xiao, Dogan Can, James Gibson, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan. 908-912 [doi]

Factor Analysis Based Speaker Normalisation for Continuous Emotion PredictionTing Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah. 913-917 [doi]

Subspace Detection of DNN Posterior Probabilities via Sparse Representation for Query by Example Spoken Term DetectionDhananjay Ram, Afsaneh Asaei, Hervé Bourlard. 918-922 [doi]

Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term DetectionHongjie Chen, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li. 923-927 [doi]

A Nonparametric Bayesian Approach for Spoken Term Detection by Example QueryAmir Hossein Harati Nejad Torbati, Joseph Picone. 928-932 [doi]

Rescoring Hypothesized Detections of Out-of-Vocabulary Keywords Using Subword SamplesVan Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li. 933-937 [doi]

Unrestricted Vocabulary Keyword Spotting Using LSTM-CTCYimeng Zhuang, Xuankai Chang, Yanmin Qian, Kai Yu. 938-942 [doi]

Interactive Spoken Content Retrieval by Deep Reinforcement LearningYen-Chen Wu, Tzu-Hsiang Lin, Yang-de Chen, Hung-yi Lee, Lin-Shan Lee. 943-947 [doi]

Relating Estimated Cyclic Spectral Peak Frequency to Measured Epilarynx Length Using Magnetic Resonance ImagingElizabeth Godoy, Andrew Dumas, Jennifer Melot, Nicolas Malyska, Thomas F. Quatieri. 948-952 [doi]

Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture ModelPatrick Lumban Tobing, Tomoki Toda, Hirokazu Kameoka, Satoshi Nakamura. 953-957 [doi]

Formant Estimation and Tracking Using Deep LearningYehoshua Dissen, Joseph Keshet. 958-962 [doi]

Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series DataColin Vaz, Asterios Toutios, Shrikanth S. Narayanan. 963-967 [doi]

Majorisation-Minimisation Based Optimisation of the Composite Autoregressive System with Application to Glottal Inverse FilteringLauri Juvela, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku. 968-972 [doi]

F0 Contour Analysis Based on Empirical Mode Decomposition for DNN Acoustic Modeling in Mandarin Speech RecognitionXiaoyun Wang, Xugang Lu, Hisashi Kawai, Seiichi Yamamoto. 973-977 [doi]

Vowels and Diphthongs in Cangnan Southern Min Chinese DialectFang Hu, Chunyu Ge. 978-982 [doi]

Diphthongization of Nuclear Vowels and the Emergence of a Tetraphthong in Hetang CantoneseWenqi Hu, Fang Hu, Jian Jin. 983-987 [doi]

PhonVoc: A Phonetic and Phonological Vocoding ToolkitMilos Cernak, Philip N. Garner. 988-992 [doi]

Vowels and Diphthongs in the Taiyuan Jin Chinese DialectLiping Xia, Fang Hu. 993-997 [doi]

The Effects of Prosody on French V-to-V Coarticulation: A Corpus-Based StudyGiuseppina Turco, Cécile Fougeron, Nicolas Audibert. 998-1001 [doi]

An Acoustic Analysis of /r/ in TyroleanVincenzo Galatà, Lorenzo Spreafico, Alessandro Vietti, Constantijn Kaland. 1002-1006 [doi]

Hyperarticulated Production of Korean Glides by Age GroupSeung-Eun Chang, Minsook Kim. 1007-1010 [doi]

Coda Stop and Taiwan Min Checked Tone Sound ChangesHo-hsien Pan, Hsiao-tung Huang, Shao-Ren Lyu. 1011-1015 [doi]

The Influence of Modality and Speaking Style on the Assimilation Type and Categorization Consistency of Non-Native SpeechSarah E. Fenwick, Catherine T. Best, Chris Davis, Michael D. Tyler. 1016-1020 [doi]

Prosodic Convergence with Spoken Stimuli in Laboratory DataMargaret Zellers. 1021-1025 [doi]

Effects of Stress on Fricatives: Evidence from Standard Modern GreekCharalambos Themistocleous, Angelandria Savva, Andrie Aristodemou. 1026-1029 [doi]

Analysis of Chinese Syllable Durations in Running Speech of Japanese L2 LearnersYue Sun, Shudon Hsiao, Yoshinori Sagisaka, Jin-Song Zhang. 1030-1033 [doi]

Automatic Paragraph Segmentation with Lexical and Prosodic FeaturesCatherine Lai, Mireia Farrús, Johanna D. Moore. 1034-1038 [doi]

Automatic Glottal Inverse Filtering with Non-Negative Matrix FactorizationManu Airaksinen, Lauri Juvela, Tom Bäckström, Paavo Alku. 1039-1043 [doi]

Speaker Identity and Voice Quality: Modeling Human Responses and Automatic Speaker RecognitionSoo-Jin Park, Caroline Sigouin, Jody Kreiman, Patricia A. Keating, Jinxi Guo, Gary Yeung, Fang-Yu Kuo, Abeer Alwan. 1044-1048 [doi]

Analysis of Glottal Stop in Assam Sora LanguageSishir Kalita, Luke Horo, Priyankoo Sarmah, S. R. Mahadeva Prasanna, Samarendra Dandapat. 1049-1053 [doi]

Acoustic Differences Between English /t/ Glottalization and Phrasal CreakMarc Garellek, Scott Seyfarth. 1054-1058 [doi]

The Acoustics of Lexical Stress in Italian as a Function of Stress Level and Speaking StyleAnders Eriksson, Pier Marco Bertinetto, Mattias Heldner, Rosalba Nodari, Giovanna Lenoci. 1059-1063 [doi]

Cross-Gender and Cross-Dialect Tone Recognition for VietnameseAntje Schweitzer, Ngoc Thang Vu. 1064-1068 [doi]

Prosody Modification Using Allpass Residual of Speech SignalsKarthika Vijayan, K. Sri Rama Murty. 1069-1073 [doi]

Analyzing the Contribution of Top-Down Lexical and Bottom-Up Acoustic Cues in the Detection of Sentence ProminenceSofoklis Kakouros, Joris Pelemans, Lyan Verwimp, Patrick Wambacq, Okko Räsänen. 1074-1078 [doi]

A Longitudinal Study of Children's Intonation in Narrative SpeechJeffrey Kallay, Melissa A. Redford. 1079-1083 [doi]

Velum Control for Oral SoundsReed Blaylock, Louis Goldstein, Shrikanth S. Narayanan. 1084-1088 [doi]

F0 Development in Acquiring Korean Stop DistinctionGayeon Son. 1089-1093 [doi]

Phonetic Reduction Can Lead to Lengthening, and Enhancement Can Lead to ShorteningClara Cohen, Matt Carlson. 1094-1098 [doi]

Mechanical Production of [b], [m] and [w] Using Controlled Labial and Velopharyngeal GesturesTakayuki Arai. 1099-1103 [doi]

An Improved 3D Geometric Tongue ModelQiang Fang, Yun Chen, Haibo Wang, Jianguo Wei, Jianrong Wang, Xiyu Wu, Aijun Li. 1104-1107 [doi]

Congruency Effect Between Articulation and Grasping in Native English SpeakersMikko Tiainen, Fatima M. Felisberti, Kaisa Tiippana, Martti Vainio, Juraj Simko, Jirí Lukavský, Lari Vainio. 1108-1112 [doi]

Emergence of Vocal Developmental Sequences in a Predictive Coding Model of Speech AcquisitionShamima Najnin, Bonny Banerjee. 1113-1117 [doi]

Categorization of Natural Spanish Whistled Vowels by Naïve Spanish ListenersJulien Meyer, Laure Dentel, Fanny Meunier. 1118-1121 [doi]

Between- and Within-Speaker Effects of Bilingualism on F0 VariationRob Voigt, Dan Jurafsky, Meghan Sumner. 1122-1126 [doi]

Vowel Characteristics in the Assessment of L2 English PronunciationCalbert Graham, Paula Buttery, Francis Nolan. 1127-1131 [doi]

Kulning (Swedish Cattle Calls): Acoustic, EGG, Stroboscopic and High-Speed Video Analyses of an Unusual Singing StyleAhmed Geneid, Anne-Maria Laukkanen, Anita McAllister, Robert Eklund. 1132-1135 [doi]

Glottal Squeaks in VC SequencesMísa Hejná, Pertti Palo, Scott Moisik. 1136-1140 [doi]

Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural NetworksNaoya Takahashi, Tofigh Naghibi, Beat Pfister. 1141-1145 [doi]

Personalized Natural Language UnderstandingXiaohu Liu, Ruhi Sarikaya, Liang Zhao, Yong Ni, Yi-Cheng Pan. 1146-1150 [doi]

A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue SystemsLayla El Asri, Jing He, Kaheer Suleman. 1151-1155 [doi]

Root Cause Analysis of Miscommunication Hotspots in Spoken Dialogue SystemsSpiros Georgiladakis, Georgia Athanasopoulou, Raveesh Meena, José Lopes, Arodami Chorianopoulou, Elisavet Palogiannidi, Elias Iosif, Gabriel Skantze, Alexandros Potamianos. 1156-1160 [doi]

Making Personal Digital Assistants Aware of What They Do Not KnowOmar Zia Khan, Ruhi Sarikaya. 1161-1165 [doi]

Implementing Acoustic-Prosodic Entrainment in a Conversational AvatarRivka Levitan, Stefan Benus, Ramiro H. Gálvez, Agustín Gravano, Florencia Savoretti, Marián Trnka, Andreas Weise, Julia Hirschberg. 1166-1170 [doi]

Perceived Usability and Cognitive Demand of Secondary Tasks in Spoken Versus Visual-Manual Automotive InteractionAnnika Silvervarg, Sofia Lindvall, Jonatan Andersson, Ida Esberg, Christian Jernberg, Filip Frumerie, Arne Jönsson. 1171-1175 [doi]

Zara: An Empathetic Interactive Virtual AgentPascale Fung, Anik Dey, Farhad Bin Siddique, Ruixi Lin, Yang Yang, Yan Wan, Ricky Ho Yin Chan. 1176-1177 [doi]

Measuring Pronunciation Improvement in Users of CAPT Tool TipTopTalk!Cristian Tejedor García, David Escudero Mancebo, Enrique Cámara Arenas, César González Ferreras, Valentín Cardeñoso-Payo. 1178-1179 [doi]

SparkNG: Interactive MATLAB Tools for Introduction to Speech Production, Perception and Processing Fundamentals and Application of the Aliasing-Free L-F Model ComponentHideki Kawahara. 1180-1181 [doi]

Real-Time Tracking of Speakers' Emotions, States, and Traits on Mobile PlatformsErik Marchi, Florian Eyben, Gerhard Hagerer, Björn W. Schuller. 1182-1183 [doi]

The Human Speech CortexEdward Chang. 1184 [doi]

Acoustic-Prosodic and Turn-Taking Features in Interactions with Children with Neurodevelopmental DisordersDaniel Bone, Somer Bishop, Rahul Gupta, Sungbok Lee, Shrikanth S. Narayanan. 1185-1189 [doi]

Automatic Detection of Parkinson's Disease Based on Modulated VowelsDaria Hemmerling, Juan Rafael Orozco-Arroyave, Andrzej Skalski, Janusz Gajda, Elmar Nöth. 1190-1194 [doi]

Towards Automatic Detection of Amyotrophic Lateral Sclerosis from Speech Acoustic and Articulatory SamplesJun Wang, Prasanna V. Kothalkar, Beiming Cao, Daragh Heitzman. 1195-1199 [doi]

Neurophysiological Vocal Source Modeling for Biomarkers of DiseaseGregory Ciccarelli, Thomas F. Quatieri, Satrajit S. Ghosh. 1200-1204 [doi]

Relation of Automatically Extracted Formant Trajectories with Intelligibility Loss and Speaking Rate Decline in Amyotrophic Lateral SclerosisRachelle L. Horwitz-Martin, Thomas F. Quatieri, Adam C. Lammert, James R. Williamson, Yana Yunusova, Elizabeth Godoy, Daryush D. Mehta, Jordan R. Green. 1205-1209 [doi]

Automatic Analysis of Typical and Atypical Encoding of Spontaneous Emotion in the Voice of ChildrenFabien Ringeval, Erik Marchi, Charline Grossard, Jean Xavier, Mohamed Chetouani, David Cohen, Björn W. Schuller. 1210-1214 [doi]

Recognition of Depression in Bipolar Disorder: Leveraging Cohort and Person-Specific KnowledgeSoheil Khorram, John Gideon, Melvin G. McInnis, Emily Mower Provost. 1215-1219 [doi]

Diagnosing People with Dementia Using Automatic Conversation AnalysisBahman Mirheidari, Daniel Blackburn, Markus Reuber, Traci Walker, Heidi Christensen. 1220-1224 [doi]

SERAPHIM: A Wavetable Synthesis System with 3D Lip Animation for Real-Time Speech and Singing Applications on Mobile PlatformsPaul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li. 1225-1229 [doi]

Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016Jordi Bonada, Martí Umbert, Merlijn Blaauw. 1230-1234 [doi]

Vocal Effort Modification for Singing SynthesisOlivier Perrotin, Christophe d'Alessandro. 1235-1239 [doi]

Bertsokantari: a TTS Based Singing Synthesis SystemEder del Blanco, Inma Hernáez, Eva Navas, Xabier Sarasola, Daniel Erro. 1240-1244 [doi]

Evaluation of Singing Synthesis: Methodology and Case Study with Concatenative and Performative SystemsLionel Feugère, Christophe d'Alessandro, Samuel Delalez, Luc Ardaillon, Axel Roebel. 1245-1249 [doi]

Expressive Control of Singing Voice Synthesis Using Musical Contexts and a Parametric F0 ModelLuc Ardaillon, Celine Chabot-Canet, Axel Roebel. 1250-1254 [doi]

Optimal Unit Stitching in a Unit Selection Singing Synthesis SystemMarius Cotescu. 1255-1259 [doi]

The Perception of Overlapping Speech: Effects of Speaker Prosody and Listener AttitudesKatherine Hilton. 1260-1264 [doi]

Who Do You Think Will Speak Next? Perception of Turn-Taking Cues in Slovak and Argentine SpanishAgustín Gravano, Pablo Brusco, Stefan Benus. 1265-1269 [doi]

Disentrainment may be a Positive Thing: A Novel Measure of Unsigned Acoustic-Prosodic Synchrony, and its Relation to Speaker EngagementJuan M. Pérez, Ramiro H. Gálvez, Agustín Gravano. 1270-1274 [doi]

Respiratory Turn-Taking CuesMarcin Wlodarczak, Mattias Heldner. 1275-1279 [doi]

The Discourse Marker "so" in Turn-Taking and Turn-Releasing BehaviorEmma Rennie, Rebecca Lunsford, Peter A. Heeman. 1280-1284 [doi]

Acoustic Properties of Formality in Conversational JapaneseEthan Sherr-Ziarko. 1285-1289 [doi]

Inferring Phonemic Classes from CNN Activation Maps Using Clustering TechniquesThomas Pellegrini, Sandrine Mouysset. 1290-1294 [doi]

Joint Learning of Speaker and Phonetic Similarities with Siamese NetworksNeil Zeghidour, Gabriel Synnaeve, Nicolas Usunier, Emmanuel Dupoux. 1295-1299 [doi]

Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen NetsVikramjit Mitra, Dimitra Vergyri, Horacio Franco. 1300-1304 [doi]

Learning Multiscale Features Directly from WaveformsZhenyao Zhu, Jesse H. Engel, Awni Y. Hannun. 1305-1309 [doi]

Supervised Learning of Acoustic Models in a Zero Resource Setting to Improve DPGMM ClusteringMichael Heck, Sakriani Sakti, Satoshi Nakamura. 1310-1314 [doi]

Semi-Supervised and Cross-Lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models Under Low-Resource ConditionsHaihua Xu, Hang Su, Chongjia Ni, Xiong Xiao, Hao Huang, Eng Siong Chng, Haizhou Li. 1315-1319 [doi]

Recurrent Out-of-Vocabulary Word Detection Using Distribution of FeaturesTaichi Asami, Ryo Masumura, Yushi Aono, Koichi Shinoda. 1320-1324 [doi]

Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural NetworksNaoyuki Kanda, Shoji Harada, Xugang Lu, Hisashi Kawai. 1325-1329 [doi]

Acoustic Word Embeddings for ASR Error DetectionSahar Ghannay, Yannick Estève, Nathalie Camelin, Paul Deléglise. 1330-1334 [doi]

Combining Semantic Word Classes and Sub-Word Unit Speech Recognition for Robust OOV DetectionAxel Horndasch, Anton Batliner, Caroline Kaufhold, Elmar Nöth. 1335-1339 [doi]

Web Data Selection Based on Word Embedding for Low-Resource Speech RecognitionChuandong Xie, Wu Guo, Guoping Hu, Junhua Liu. 1340-1344 [doi]

Colloquialising Modern Standard Arabic Text for Improved Speech RecognitionSarah Al-Shareef, Thomas Hain. 1345-1349 [doi]

Pitch-Range Perception: The Dynamic Interaction Between Voice Quality and Fundamental FrequencyJianjing Kuang, Mark Liberman. 1350-1354 [doi]

Comparing the Contributions of Amplitude and Phase to Speech Intelligibility in a Vocoder-Based Speech Synthesis ModelFei Chen, Benson C. L. Chiao. 1355-1358 [doi]

Modeling Noise Influence to Speech Intelligibility Non-Intrusively by Reduced Speech Dynamic RangeFei Chen. 1359-1362 [doi]

Do GMM Phoneme Classifiers Perceive Synthetic Sibilants as Humans Do?Gábor Pintér, Hiroki Watanabe. 1363-1367 [doi]

Neural Responses to Speech-Specific Modulations Derived from a Spectro-Temporal Filter BankMarina Frye, Cristiano Micheli, Inga M. Schepers, Gerwin Schalk, Jochem W. Rieger, Bernd T. Meyer. 1368-1372 [doi]

Comparing Different Methods for Analyzing ERP SignalsKimberley Mulder, Louis ten Bosch, Lou Boves. 1373-1377 [doi]

Supplementary Motor Area Activation in Disfluency Perception: An fMRI Study of Listener Neural Responses to Spontaneously Produced Unfilled and Filled PausesRobert Eklund, Martin Ingvar. 1378-1381 [doi]

Vowel Fundamental and Formant Frequency Contributions to English and Mandarin Sentence IntelligibilityDaniel Fogerty, Fei Chen. 1382-1386 [doi]

Attention Assisted Discovery of Sub-Utterance Structure in Speech Emotion RecognitionChe-Wei Huang, Shrikanth S. Narayanan. 1387-1391 [doi]

Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate CompetitionLinchuan Li, Zhiyong Wu, Mingxing Xu, Helen M. Meng, Lianhong Cai. 1392-1396 [doi]

Inter-Speech Clicks in an Interspeech KeynoteJürgen Trouvain, Zofia Malisz. 1397-1401 [doi]

Speaker Age Classification and Regression Using i-VectorsJoanna Grzybowska, Stanislaw Kacprzak. 1402-1406 [doi]

Sparsely Connected and Disjointly Trained Deep Neural Networks for Low Resource Behavioral Annotation: Acoustic Classification in Couples' TherapyHaoQi Li, Brian G. Baucom, Panayiotis G. Georgiou. 1407-1411 [doi]

Automatically Classifying Self-Rated Personality Scores from SpeechGuozhen An, Sarah Ita Levitan, Rivka Levitan, Andrew Rosenberg, Michelle Levine, Julia Hirschberg. 1412-1416 [doi]

Estimation of Children's Physical Characteristics from Their VoicesJill Fain Lehman, Rita Singh. 1417-1421 [doi]

Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map TaskHayakawa Akira, Saturnino Luz, Nick Campbell. 1422-1426 [doi]

Predicting Affective Dimensions Based on Self Assessed Depression SeverityRahul Gupta, Shrikanth S. Narayanan. 1427-1431 [doi]

Enhancement of Automatic Oral Presentation Assessment System Using Latent N-Grams Word Representation and Part-of-Speech InformationWen-Yu Huang, Shan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Chi-Chun Lee. 1432-1436 [doi]

Use of Vowels in Discriminating Speech-Laugh from Laughter and Neutral SpeechSri Harsha Dumpala, P. Gangamohan, Suryakanth V. Gangashetty, B. Yegnanarayana. 1437-1441 [doi]

A Convex Model for Linguistic Influence in Group ConversationsKan Kawabata, Visar Berisha, Anna Scaglione, Amy LaCross. 1442-1446 [doi]

A Deep Learning Approach to Modeling Empathy in Addiction CounselingJames Gibson, Dogan Can, Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan. 1447-1451 [doi]

Unipolar Depression vs. Bipolar Disorder: An Elicitation-Based Approach to Short-Term Detection of Mood DisorderKun-Yi Huang, Chung-Hsien Wu, Yu-Ting Kuo, Fong-Lin Jang. 1452-1456 [doi]

Conditional Random Fields for the Tunisian Dialect Grapheme-to-Phoneme ConversionAbir Masmoudi, Mariem Ellouze, Fethi Bougares, Yannick Estève, Lamia Hadrich Belguith. 1457-1461 [doi]

Efficient Thai Grapheme-to-Phoneme Conversion Using CRF-Based Joint Sequence ModelingSittipong Saychum, Sarawoot Kongyoung, Anocha Rugchatjaroen, Patcharika Chootrakool, Sawit Kasuriya, Chai Wutiwiwatchai. 1462-1466 [doi]

An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips ImagingAurore Jaumard-Hakoun, Kele Xu, Clémence Leboullenger, Pierre Roussel-Ragot, Bruce Denby. 1467-1471 [doi]

Phoneme Embedding and its Application to Speech Driven Talking Avatar SynthesisXu Li, Zhiyong Wu, Helen M. Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai. 1472-1476 [doi]

Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal DataXu Li, Zhiyong Wu, Helen M. Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai. 1477-1481 [doi]

Audio-to-Visual Speech Conversion Using Deep Neural NetworksSarah Taylor, Akihiro Kato, Iain A. Matthews, Ben P. Milner. 1482-1486 [doi]

Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann MachineToru Nakashika, Yasuhiro Minami. 1487-1491 [doi]

Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging DataAsterios Toutios, Tanner Sorensen, Krishna Somandepalli, Rachel Alexander, Shrikanth S. Narayanan. 1492-1496 [doi]

Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence InformationXurong Xie, Xunying Liu, Lan Wang. 1497-1501 [doi]

Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural NetworksZheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai. 1502-1506 [doi]

Generating Gestural Scores from Acoustics Through a Sparse Anchor-Based Representation of SpeechChristopher Liberatore, Ricardo Gutierrez-Osuna. 1507-1511 [doi]

On the Suitability of Vocalic Sandwiches in a Corpus-Based TTS EngineDavid Guennec, Damien Lolive. 1512-1516 [doi]

Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech SynthesisDecha Moungsri, Tomoki Koriyama, Takao Kobayashi. 1517-1521 [doi]

Using Zero-Frequency Resonator to Extract Multilingual Intonation StructureJinfu Ni, Yoshinori Shiga, Hisashi Kawai. 1522-1526 [doi]

A DNN-HMM Approach to Story SegmentationJia Yu, Xiong Xiao, Lei Xie, Eng Siong Chng, Haizhou Li. 1527-1531 [doi]

The SIWIS Database: A Multilingual Speech Database with Acted EmphasisJean Philippe Goldman, Pierre-Edouard Honnet, Robert A. J. Clark, Philip N. Garner, Maria Ivanova, Alexandros Lazaridis, Hui Liang, Tiago Macedo, Beat Pfister, Manuel Sam Ribeiro, Eric Wehrli, Junichi Yamagishi. 1532-1535 [doi]

Open Source Speech and Language Resources for FrisianEmre Yilmaz, Henk van den Heuvel, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, David A. van Leeuwen. 1536-1540 [doi]

The SRI CLEO Speaker-State CorpusAndreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti. 1541-1544 [doi]

SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin ChineseNancy F. Chen, Rong Tong, Darren Wee, Pei Xuan Lee, Bin Ma, Haizhou Li. 1545-1549 [doi]

The SRI Speech-Based Collaborative Learning CorpusColleen Richey, Cynthia D'Angelo, Nonye Alozie, Harry Bratt, Elizabeth Shriberg. 1550-1554 [doi]

An Expectation Maximization Approach to Joint Modeling of Multidimensional Ratings Derived from Multiple AnnotatorsAnil Ramakrishna, Rahul Gupta, Ruth B. Grossman, Shrikanth S. Narayanan. 1555-1559 [doi]

Voting Detector: A Combination of Anomaly Detectors to Reveal Annotation Errors in TTS CorporaJindrich Matousek, Daniel Tihelka. 1560-1564 [doi]

The Magic Stone: A Video Game to Improve Communication Skills of People with Intellectual DisabilitiesMario Corrales-Astorgano, David Escudero Mancebo, César González Ferreras, Yurena Gutiérrez-González, Valle Flores-Lucas, Valentín Cardeñoso-Payo, Lourdes Aguilar-Cuevas. 1565-1566 [doi]

Identifying Perceptually Similar Voices with a Speaker Recognition System Using Auto-Phonetic FeaturesFinnian Kelly, Anil Alexander, Oscar Forth, Samuel Kent, Jonas Lindh, Joel Åkesson. 1567-1568 [doi]

A Real-Time Framework for Visual Feedback of Articulatory Data Using Statistical Shape ModelsKristy James, Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer. 1569-1570 [doi]

Flexible, Rapid Authoring of Goal-Orientated, Multi-Turn Dialogues Using the Task Completion PlatformAlex Marin, Paul A. Crook, Omar Zia Khan, Vasiliy Radostev, Khushboo Aggarwal, Ruhi Sarikaya. 1571-1572 [doi]

Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic ModelsMarc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Takuya Yoshioka, Dung T. Tran, Tomohiro Nakatani. 1573-1577 [doi]

Transfer Learning with Bottleneck Feature Networks for Whispered Speech RecognitionBoon Pang Lim, Faith Wong, Yuyao Li, Jia Wei Bay. 1578-1582 [doi]

Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-ActivationsTasha Nagamine, Zhuo Chen, Nima Mesgarani. 1583-1587 [doi]

Domain Adaptation of CNN Based Acoustic Models Under Limited Resource SettingsMasayuki Suzuki, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran, George Saon. 1588-1592 [doi]

Subspace LHUC for Fast Adaptation of Deep Neural Network Acoustic ModelsLahiru Samarakoon, Khe Chai Sim. 1593-1597 [doi]

Improving Children's Speech Recognition Through Out-of-Domain Data AugmentationJoachim Fainberg, Peter Bell 0001, Mike Lincoln, Steve Renals. 1598-1602 [doi]

Virtual Machines and Containers as a Platform for ExperimentationFlorian Metze, Eric Riebling, Anne S. Warlaumont, Elika Bergelson. 1603-1607 [doi]

CloudCAST - Remote Speech Technology for Speech ProfessionalsPhil D. Green, Ricard Marxer, Stuart P. Cunningham, Heidi Christensen, Frank Rudzicz, Maria Yancheva, André Coy, Massimiliano Malavasi, Lorenzo Desideri, Fabio Tamburini. 1608-1612 [doi]

webASR 2 - Improved Cloud Based Speech TechnologyThomas Hain, Jeremy Christian, Oscar Saz, Salil Deena, Madina Hasan, Raymond W. M. Ng, Rosanna Milner, Mortaza Doulaty, Yulan Liu. 1613-1617 [doi]

Sharing Speech Synthesis Software for Research and Education Within Low-Tech and Low-Resource CommunitiesAndrew R. Plummer, Mary E. Beckman. 1618-1622 [doi]

The Berkeley Phonetics MachineRonald L. Sprouse, Keith Johnson. 1623-1626 [doi]

Experiences with Shared Resources for Research and Education in Speech and Language ProcessingRebecca Bates, Eric Fosler-Lussier, Florian Metze, Martha Larson, Gina-Anne Levow, Emily Mower Provost. 1627-1631 [doi]

The Voice Conversion Challenge 2016Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi. 1632-1636 [doi]

Analysis of the Voice Conversion Challenge 2016 Evaluation ResultsMirjam Wester, Zhizheng Wu, Junichi Yamagishi. 1637-1641 [doi]

F0 ConversionLing-Hui Chen, Li-juan Liu, Zhen-Hua Ling, Yuan Jiang, Li-Rong Dai. 1642-1646 [doi]

A Voice Conversion Mapping Function Based on a Stacked Joint-AutoencoderSeyed Hamidreza Mohammadi, Alexander Kain. 1647-1651 [doi]

Locally Linear Embedding for Exemplar-Based Spectral ConversionYi-Chiao Wu, Hsin-Te Hwang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang. 1652-1656 [doi]

Applying Spectral Normalisation and Efficient Envelope Estimation and Statistical Transformation for the Voice Conversion Challenge 2016Fernando Villavicencio, Junichi Yamagishi, Jordi Bonada, Felipe Espic. 1657-1661 [doi]

ML Parameter Generation with a Reformulated MGE Training Criterion - Participation in the Voice Conversion Challenge 2016Daniel Erro, Agustín Alonso, Luis Serrano, David Tavarez, Igor Odriozola, Xabier Sarasola, Eder del Blanco, Jon Sánchez, Ibon Saratxaga, Eva Navas, Inma Hernáez. 1662-1666 [doi]

The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016Kazuhiro Kobayashi, Shinnosuke Takamichi, Satoshi Nakamura, Tomoki Toda. 1667-1671 [doi]

Release from Energetic Masking Caused by Repeated Patterns of Glimpsing WindowsMaury Lander-Portnoy. 1672-1676 [doi]

Glimpsing Predictions for Natural and Vocoded Sentence Intelligibility During Modulation Masking: Effect of the Glimpse Cutoff CriterionBobby Gibbs II, Daniel Fogerty. 1677-1681 [doi]

Temporal Envelopes in Sine-Wave Speech RecognitionLi Xu. 1682-1686 [doi]

Understanding Periodically Interrupted Mandarin SpeechJing Liu, Rosanna H. N. Tong, Fei Chen. 1687-1691 [doi]

Factors Affecting the Intelligibility of Sine-Wave SpeechFei Chen, Daniel Fogerty. 1692-1695 [doi]

Effects of Urgent Speech and Preceding Sounds on Speech Intelligibility in Noisy and Reverberant EnvironmentsNao Hodoshima. 1696-1699 [doi]

Integrated Spoofing Countermeasures and Automatic Speaker Verification: An Evaluation on ASVspoof 2015Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Hong Yu, Tomi Kinnunen, Nicholas Evans, Zheng-Hua Tan. 1700-1704 [doi]

Cross-Database Evaluation of Audio-Based Spoofing Detection SystemsPavel Korshunov, Sébastien Marcel. 1705-1709 [doi]

Investigation of Sub-Band Discriminative Information Between Spoofed and Genuine SpeechKaavya Sriskandaraja, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah. 1710-1714 [doi]

An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant ConditionsXiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li. 1715-1719 [doi]

Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone SpeechMd. Sahidullah, Rosa González Hautamäki, Dennis Alexander Lehmann Thomsen, Tomi Kinnunen, Zheng-Hua Tan, Ville Hautamäki, Robert Parts, Martti Pitkänen. 1720-1724 [doi]

Statistical Modeling of Speaker's Voice with Temporal Co-Location for Active Voice AuthenticationZhong Meng, Biing-Hwang Juang. 1725-1729 [doi]

Joint Enhancement and Coding of Speech by Incorporating Wiener Filtering in a CELP CodecJohannes Fischer, Tom Bäckström. 1730-1734 [doi]

Multi-Channel Linear Prediction Based on Binaural Coherence for Speech DereverberationHong Liu, Xiuling Wang, Miao Sun, Cheng Pang. 1735-1739 [doi]

Single-Channel Speech Enhancement Using Double SpectrumMartin Blass, Pejman Mowlaee, W. Bastiaan Kleijn. 1740-1744 [doi]

On the Appropriateness of Complex-Valued Neural Networks for Speech EnhancementLukas Drude, Bhiksha Raj, Reinhold Haeb-Umbach. 1745-1749 [doi]

Introducing the Turbo-Twin-HMM for Audio-Visual Speech EnhancementSteffen Zeiler, Hendrik Meutzner, Ahmed Hussen Abdelaziz, Dorothea Kolossa. 1750-1754 [doi]

Assessing Speech Quality in Speech-Aware Hearing Aids Based on Phoneme PosteriorgramsConstantin Spille, Hendrik Kayser, Hynek Hermansky, Bernd T. Meyer. 1755-1759 [doi]

Time-Varying Quasi-Closed-Phase Weighted Linear Prediction Analysis of Speech for Accurate Formant Detection and TrackingDhananjaya N. Gowda, Paavo Alku. 1760-1764 [doi]

Improved Depiction of Tissue Boundaries in Vocal Tract Real-Time MRI Using Automatic Off-Resonance CorrectionYongwan Lim, Sajan Goud Lingala, Asterios Toutios, Shrikanth S. Narayanan, Krishna S. Nayak. 1765-1769 [doi]

Modeling and Transforming Speech Using Variational AutoencodersMerlijn Blaauw, Jordi Bonada. 1770-1774 [doi]

Phase-Encoded Speech SpectrogramsChandra Sekhar Seelamantula. 1775-1779 [doi]

Towards Minimally Invasive Velar State Detection in Normal and Silent SpeechPeter Birkholz, Petko Bakardjiev, Steffen Kürbis, Rico Petrick. 1780-1784 [doi]

RNN-BLSTM Based Multi-Pitch EstimationJianshu Zhang, Jian Tang, Li-Rong Dai. 1785-1789 [doi]

TUSK: A Framework for Overviewing the Performance of F0 EstimatorsMasanori Morise, Hideki Kawahara. 1790-1794 [doi]

A Robust Non-Parametric and Filtering Based Approach for Glottal Closure Instant DetectionPradeep Rengaswamy, Gurunath Reddy M., K. Sreenivasa Rao, Pallab Dasgupta. 1795-1799 [doi]

Analysis of Face Mask Effect on Speaker RecognitionRahim Saeidi, Ilkka Huhtakallio, Paavo Alku. 1800-1804 [doi]

Data Selection for Within-Class Covariance EstimationElliot Singer, Tyler Campbell, Douglas A. Reynolds. 1805-1809 [doi]

Inter-Task System Fusion for Speaker RecognitionMarc Ferras, Srikanth R. Madikeri, Subhadeep Dey, Petr Motlícek, Hervé Bourlard. 1810-1814 [doi]

Mahalanobis Metric Scoring Learned from Weighted Pairwise Constraints in I-Vector Speaker Recognition SystemZhenchun Lei, Yanhong Wan, Jian Luo, Yingen Yang. 1815-1819 [doi]

Novel Subband Autoencoder Features for Detection of Spoofed SpeechMeet H. Soni, Tanvina B. Patel, Hemant A. Patil. 1820-1824 [doi]

On the Issue of Calibration in DNN-Based Speaker Recognition SystemsMitchell McLaren, Diego Castán, Luciana Ferrer, Aaron Lawson. 1825-1829 [doi]

Probabilistic Approach Using Joint Long and Short Session i-Vectors Modeling to Deal with Short Utterances for Speaker RecognitionWaad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre. 1830-1834 [doi]

Short Utterance Variance Modelling and Utterance Partitioning for PLDA Speaker VerificationAhilan Kanagasundaram, David Dean, Sridha Sridharan, Clinton Fookes, Ivan Himawan. 1835-1838 [doi]

Speaker-Dependent Dictionary-Based Speech Enhancement for Text-Dependent Speaker VerificationNicolai Bæk Thomsen, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Børge Lindberg, Søren Holdt Jensen. 1839-1843 [doi]

Text-Available Speaker Recognition System for Forensic ApplicationsChengzhu Yu, Chunlei Zhang, Finnian Kelly, Abhijeet Sangwan, John H. L. Hansen. 1844-1847 [doi]

Transfer Learning for Speaker Verification on Short UtterancesQingyang Hong, Lin Li, Lihong Wan, Jun Zhang, Feng Tong. 1848-1852 [doi]

Twin Model G-PLDA for Duration Mismatch Compensation in Text-Independent Speaker VerificationJianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee. 1853-1857 [doi]

Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker ClusteringXiao-lei Zhang. 1858-1862 [doi]

Improving Deep Neural Networks Based Speaker Verification Using Unlabeled DataYao Tian, Meng Cai, Liang He, Wei-Qiang Zhang, Jia Liu. 1863-1867 [doi]

Maximum a posteriori Based Decoding for CTC Acoustic ModelsNaoyuki Kanda, Xugang Lu, Hisashi Kawai. 1868-1872 [doi]

Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity StructuresAfsaneh Asaei, Gil Luyet, Milos Cernak, Hervé Bourlard. 1873-1877 [doi]

Model Compression Applied to Small-Footprint Keyword SpottingGeorge Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni. 1878-1882 [doi]

Why do ASR Systems Despite Neural Nets Still Depend on Robust FeaturesAngel Mario Castro Martinez, Marc René Schädler. 1883-1887 [doi]

An Adaptive Multi-Band System for Low Power Voice Command RecognitionQing He, Gregory W. Wornell, Wei Ma. 1888-1892 [doi]

Memory-Efficient Modeling and Search Techniques for Hardware ASR DecodersMichael Price, Anantha Chandrakasan, James R. Glass. 1893-1897 [doi]

Log-Linear System Combination Using Structured Support Vector MachinesJingzhou Yang, Anton Ragni, Mark J. F. Gales, Kate M. Knill. 1898-1902 [doi]

Efficient Segmental Cascades for Speech RecognitionHao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu. 1903-1907 [doi]

A WFST Framework for Single-Pass Multi-Stream DecodingSirui Xu, Eric Fosler-Lussier. 1908-1912 [doi]

Comparison of Multiple System Combination Techniques for Keyword SpottingWilliam Hartmann, Le Zhang, Kerri Barnes, Roger Hsiao, Stavros Tsakalidis, Richard M. Schwartz. 1913-1917 [doi]

Rescoring by Combination of Posteriorgram Score and Subword-Matching Score for Use in Query-by-ExampleMasato Obara, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh 0001. 1918-1922 [doi]

Phone Synchronous Decoding with CTC LatticeZhehuai Chen, Wei Deng, Tao Xu, Kai Yu. 1923-1927 [doi]

Speech Features for Depression DetectionSaurabh Sahu, Carol Y. Espy-Wilson. 1928-1932 [doi]

Parkinson's Disease Progression Assessment from Speech Using GMM-UBM T. Arias-Vergara, J. C. Vásquez-Correa, Juan Rafael Orozco-Arroyave, Jesus Francisco Vargas Bonilla, Elmar Nöth. 1933-1937 [doi]

Speech-Based Detection of Alzheimer's Disease in Conversational GermanJochen Weiner, Christian Herff, Tanja Schultz. 1938-1942 [doi]

Cross-Cultural Depression Recognition from Vocal BiomarkersSharifa Alghowinem, Roland Goecke, Julien Epps, Michael Wagner 0004, Jeffrey Cohn. 1943-1947 [doi]

Speech Recognition in Alzheimer's Disease and in its AssessmentLuke Zhou, Kathleen C. Fraser, Frank Rudzicz. 1948-1952 [doi]

Does She Speak RTT? Towards an Earlier Identification of Rett Syndrome Through Intelligent Pre-Linguistic Vocalisation AnalysisFlorian B. Pokorny, Peter B. Marschik, Christa Einspieler, Björn W. Schuller. 1953-1957 [doi]

Speech Rhythm in Parkinson's Disease: A Study on ItalianMassimo Pettorino, Maria Grazia Busa, Elisa Pellegrino. 1958-1961 [doi]

English Language Speech AssistantXavier Anguera, Vu Van. 1962-1963 [doi]

Remeeting - Deep Insights to ConversationsAllen Guo, Arlo Faria, Korbinian Riedhammer. 1964-1965 [doi]

SERAPHIM Live! - Singing Synthesis for the Performer, the Composer, and the 3D Game DeveloperPaul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li. 1966-1967 [doi]

My-Own-Voice: A Web Service That Allows You to Create a Text-to-Speech Voice From Your Own VoiceFabrice Malfrère, Olivier Deroo, Emmanuelle Franques, Jonathan Hourez, Nicolas Mazars, Vincent Pagel, Geoffrey Wilfart. 1968-1969 [doi]

Talking with Kids Really Matters: Early Language Experience Shapes Later Life ChancesAnne Fernald. 1970 [doi]

Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature ExtractionTara N. Sainath, Arun Narayanan, Ron J. Weiss, Ehsan Variani, Kevin W. Wilson, Michiel Bacchiani, Izhak Shafran. 1971-1975 [doi]

Neural Network Adaptive Beamforming for Robust Multichannel Speech RecognitionBo Li, Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Michiel Bacchiani. 1976-1980 [doi]

Improved MVDR Beamforming Using Single-Channel Mask Prediction NetworksHakan Erdogan, John R. Hershey, Shinji Watanabe, Michael I. Mandel, Jonathan Le Roux. 1981-1985 [doi]

Channel Selection for Distant Speech Recognition Exploiting Cepstral DistanceCristina Guerrero, Georgina Tryfou, Maurizio Omologo. 1986-1990 [doi]

Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched ConditionsMichael I. Mandel, Jon Barker. 1991-1995 [doi]

Far-Field ASR Without Parallel DataVijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey, Sanjeev Khudanpur. 1996-2000 [doi]

The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native LanguageBjörn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron C. Elkins, Yue Zhang 0014, Eduardo Coutinho, Keelan Evanini. 2001-2005 [doi]

Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception DetectionSarah Ita Levitan, Guozhen An, Min Ma, Rivka Levitan, Andrew Rosenberg, Julia Hirschberg. 2006-2010 [doi]

Is Deception Emotional? An Emotion-Driven Predictive ApproachShahin Amiriparian, Jouni Pohjalainen, Erik Marchi, Sergey Pugachevskiy, Björn W. Schuller. 2011-2015 [doi]

Prosodic Cues and Answer Type Detection for the Deception Sub-ChallengeClaude Montacié, Marie-José Caraty. 2016-2020 [doi]

Automatic Estimation of Perceived Sincerity from Spoken LanguageBrandon M. Booth, Rahul Gupta, Pavlos Papadopoulos, Ruchir Travadi, Shrikanth S. Narayanan. 2021-2025 [doi]

Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic AnalysisGábor Gosztolya, Tamás Grósz, György Szaszák, László Tóth. 2026-2030 [doi]

Minimization of Regression and Ranking Losses with Shallow Neural Networks on Automatic Sincerity EvaluationHung-Shin Lee, Yu Tsao, Chi-Chun Lee, Hsin-Min Wang, Wei-Cheng Lin, Wei-Chen Chen, Shan-Wen Hsiao, Shyh-Kang Jeng. 2031-2035 [doi]

Prediction of Deception and Sincerity from Speech Using Automatic Phone Recognition-Based FeaturesRobert Herms. 2036-2040 [doi]

Sincerity and Deception in Speech: Two Sides of the Same Coin? A Transfer- and Multi-Task Learning PerspectiveYue Zhang 0014, Felix Weninger, Zhao Ren, Björn W. Schuller. 2041-2045 [doi]

Fusing Acoustic Feature Representations for Computational Paralinguistics TasksHeysem Kaya, Alexey A. Karpov. 2046-2050 [doi]

A Stochastic Model for Computer-Aided Human-Human DialogueMerwan Barlier, Romain Laroche, Olivier Pietquin. 2051-2055 [doi]

Highlighting Psychological Features for Predicting Child Interjections During Story TellingGaël Lejeune, François Rioult, Bruno Crémilleux. 2056-2059 [doi]

Hybrid Dialogue State Tracking for Real World Human-to-Human DialoguesKai Sun, Su Zhu, Lu Chen, Siqiu Yao, Xueyang Wu, Kai Yu. 2060-2064 [doi]

Automatic Recognition of Social Roles Using Long Term Role Transitions in Small Group InteractionsGaurav Fotedar, Aditya Gaonkar P., Saikat Chatterjee, Prasanta Kumar Ghosh. 2065-2069 [doi]

On the Influence of Gender on Interruptions in Multiparty DialoguePaul Van Eecke, Raquel Fernández. 2070-2074 [doi]

Detection of User Escalation in Human-Computer InteractionsIan Beaver, Cynthia Freeman. 2075-2079 [doi]

Assessing Idiosyncrasies in a Bayesian Model of Speech CommunicationMarie-Lou Barnaud, Julien Diard, Pierre Bessière, Jean-Luc Schwartz. 2080-2084 [doi]

Prosodic and Linguistic Analysis of Semantic Fluency Data: A Window into Speech Production and CognitionMaria K. Wolters, Najoung Kim, Jung Ho Kim, Sarah E. MacPherson, Jong C. Park. 2085-2089 [doi]

Sensorimotor Response to Visual Imagery of Tongue DisplacementWilliam F. Katz, Divya Prabhakaran. 2090-2094 [doi]

Does Auditory-Motor Learning of Speech Transfer from the CV Syllable to the CVCV Word?Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz, Amélie Rochet-Capellan. 2095-2099 [doi]

Exemplar Dynamics in Phonetic Convergence of Speech RateAntje Schweitzer, Michael Walsh 0001. 2100-2104 [doi]

Articulation Rate in Adverse Listening Conditions in Younger and Older AdultsOuti Tuomainen, Valérie Hazan. 2105-2109 [doi]

Error Correction in Lightly Supervised Alignment of Broadcast SubtitlesJulia Olcoz, Oscar Saz, Thomas Hain. 2110-2114 [doi]

Automatic Genre and Show Identification of Broadcast MediaMortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain. 2115-2119 [doi]

Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party EnvironmentsGuan-Lin Chao, William Chan, Ian Lane. 2120-2124 [doi]

Text-Dependent Audiovisual Synchrony Detection for Spoofing Detection in Mobile Person RecognitionAmit Aides, Hagai Aronowitz. 2125-2129 [doi]

Improving Boundary Estimation in Audiovisual Speech Activity Detection Using Bayesian Information CriterionFei Tao, John H. L. Hansen, Carlos Busso. 2130-2134 [doi]

Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASRSebastian Gergen, Steffen Zeiler, Ahmed Hussen Abdelaziz, Robert M. Nickel, Dorothea Kolossa. 2135-2139 [doi]

Retrieval of Textual Song Lyrics from Sung InputsAnna M. Kruspe. 2140-2144 [doi]

Phoneme, Phone Boundary, and Tone in Automatic Scoring of Mandarin ProficiencyJiahong Yuan, Mark Liberman. 2145-2149 [doi]

Tone Classification in Mandarin Chinese Using Convolutional Neural NetworksCharles Chen, Razvan C. Bunescu, Li Xu, Chang Liu. 2150-2154 [doi]

Robust Estimation of Fundamental Frequency Using Single Frequency Filtering ApproachVishala Pannala, G. Aneeja, Sudarsana Reddy Kadiri, B. Yegnanarayana. 2155-2159 [doi]

A Fast and Accurate Fundamental Frequency Estimator Using Recursive Moving Average FiltersRyunosuke Daido, Yuji Hisaminato. 2160-2164 [doi]

Frequency Estimation from Waveforms Using Multi-Layered Neural NetworksPrateek Verma, Ronald W. Schafer. 2165-2169 [doi]

Speaker Linking and Applications Using Non-Parametric Hashing MethodsDouglas E. Sturim, William M. Campbell. 2170-2174 [doi]

Iterative PLDA Adaptation for Speaker DiarizationGaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier. 2175-2179 [doi]

A Speaker Diarization System for Studying Peer-Led Team Learning GroupsHarishchandra Dubey, Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen. 2180-2184 [doi]

DNN-Based Speaker Clustering for Speaker DiarisationRosanna Milner, Thomas Hain. 2185-2189 [doi]

On the Importance of Efficient Transition Modeling for Speaker DiarizationItshak Lapidot, Jean-François Bonastre. 2190-2193 [doi]

Priors for Speaker Counting and Diarization with AHCGregory Sell, Alan McCree, Daniel Garcia-Romero. 2194-2198 [doi]

Two-Pass IB Based Speaker Diarization System Using Meeting-Specific ANN Based FeaturesNauman Dawalatabad, Srikanth R. Madikeri, Chandra Sekhar C., Hema A. Murthy. 2199-2203 [doi]

DNN-Based Amplitude and Phase Feature Enhancement for Noise Robust Speaker IdentificationZeyan Oo, Yuta Kawakami, Longbiao Wang, Seiichi Nakagawa, Xiong Xiao, Masahiro Iwahashi. 2204-2208 [doi]

Unit-Selection Attack Detection Based on Unfiltered Frequency-Domain FeaturesUlrich Scherhag, Andreas Nautsch, Christian Rathgeb, Christoph Busch. 2209-2213 [doi]

Investigating the Impact of Dialect Prestige on Lexical DecisionMairym Lloréns Monteserín, Jason Zevin. 2214-2218 [doi]

Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic FeaturesJinxi Guo, Gary Yeung, Deepak Muralidharan, Harish Arsikere, Amber Afshan, Abeer Alwan. 2219-2222 [doi]

Factor Analysis Based Speaker Verification Using ASRHang Su, Steven Wegmann. 2223-2227 [doi]

Joint Sound Source Separation and Speaker RecognitionJeroen Zegers, Hugo Van Hamme. 2228-2232 [doi]

Robust Multichannel Gender Classification from Speech in Movie AudioNaveen Kumar, Md. Nasir, Panayiotis G. Georgiou, Shrikanth S. Narayanan. 2233-2237 [doi]

Recent Advances in Google Real-Time HMM-Driven Unit Selection SynthesizerXavi Gonzalvo, Siamak Tazari, Chun-an Chan, Markus Becker, Alexander Gutkin, Hanna Silén. 2238-2242 [doi]

First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural AttentionWenfu Wang, Shuang Xu, Bo Xu. 2243-2247 [doi]

The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech SynthesisZhengqi Wen, Ya Li, Jianhua Tao. 2248-2252 [doi]

Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech SynthesisEunwoo Song, Frank K. Soong, Hong-Goo Kang. 2253-2257 [doi]

Voice Quality Control Using Perceptual Expressions for Statistical Parametric Speech Synthesis Based on Cluster Adaptive TrainingYamato Ohtani, Koichiro Mori, Masahiro Morita. 2258-2262 [doi]

Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech SynthesisFelipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, Simon King. 2263-2267 [doi]

Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-Based Speech SynthesisYi Zhao, Daisuke Saito, Nobuaki Minematsu. 2268-2272 [doi]

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile DevicesHeiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemyslaw Szczepaniak. 2273-2277 [doi]

An Investigation of DNN-Based Speech Synthesis Using Speaker CodesNobukatsu Hojo, Yusuke Ijima, Hideyuki Mizuno. 2278-2282 [doi]

Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural NetworksLauri Juvela, Xin Wang, Shinji Takaki, Manu Airaksinen, Junichi Yamagishi, Paavo Alku. 2283-2287 [doi]

Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts FrameworkKentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai. 2288-2292 [doi]

Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNNBlaise Potard, Matthew P. Aylett, David A. Baude, Petr Motlicek. 2293-2297 [doi]

Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving ProsodyAlexandros Lazaridis, Milos Cernak, Philip N. Garner. 2298-2302 [doi]

On Smoothing and Enhancing Dynamics of Pitch Contours Represented by Discrete Orthogonal Polynomials for Prosody GenerationChen-Yu Chiang. 2303-2307 [doi]

An Investigation of Recurrent Neural Network Architectures Using Word Embeddings for Phrase Break PredictionAnandaswarup Vadapalli, Suryakanth V. Gangashetty. 2308-2312 [doi]

Model-Based Parametric Prosody Synthesis with Deep Neural NetworkHao Liu, Heng Lu, Xu Shao, Yi Xu. 2313-2317 [doi]

Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language ModelsThomas Drugman, Janne Pylkkönen, Reinhard Kneser. 2318-2322 [doi]

Learning N-Gram Language Models from Uncertain DataVitaly Kuznetsov, Hank Liao, Mehryar Mohri, Michael Riley, Brian Roark. 2323-2327 [doi]

Entropy Based Pruning for Non-Negative Matrix Based Language Models with Contextual FeaturesBarlas Oguz, Issac Alphonso, Shuangyu Chang. 2328-2332 [doi]

Unsupervised Adaptation of Recurrent Neural Network Language ModelsSiva Reddy Gangireddy, Pawel Swietojanski, Peter Bell 0001, Steve Renals. 2333-2337 [doi]

Contextual Prediction Models for Speech RecognitionYoni Halpern, Keith B. Hall, Vlad Schogol, Michael Riley, Brian Roark, Gleb Skobeltsyn, Martin Bäuml. 2338-2342 [doi]

Combining Feature and Model-Based Adaptation of RNNLMs for Multi-Genre Broadcast Speech RecognitionSalil Deena, Madina Hasan, Mortaza Doulaty, Oscar Saz, Thomas Hain. 2343-2347 [doi]

A Low Cost Desktop Robot and Tele-Presence Device for Interactive Speech ResearchMichael C. Brady. 2348-2349 [doi]

Silent-Speech Command Word Recognition Using Electro-Optical StomatographySimon Stone, Peter Birkholz. 2350-2351 [doi]

An Engine for Online Video Search in Large Archives of the Holocaust TestimoniesPetr Stanislav, Jan Svec, Pavel Ircing. 2352-2353 [doi]

Data Selection by Sequence Summarizing Neural Network in Mismatch Condition TrainingKaterina Zmolíková, Martin Karafiát, Karel Veselý, Marc Delcroix, Shinji Watanabe, Lukás Burget, Jan Cernocký. 2354-2358 [doi]

Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech RecognitionSouvik Kundu, Khe Chai Sim, Mark J. F. Gales. 2359-2363 [doi]

Robust Speech Recognition Using Generalized Distillation FrameworkKonstantin Markov, Tomoko Matsui. 2364-2368 [doi]

Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech RecognitionYusuke Shinohara. 2369-2372 [doi]

The Use of Locally Normalized Cepstral Coefficients (LNCC) to Improve Speaker Recognition Accuracy in Highly Reverberant RoomsVíctor Poblete, Juan Pablo Escudero, Josué Fredes, José Novoa, Richard M. Stern, Simon King, Néstor Becerra Yoma. 2373-2377 [doi]

Two-Stage Data Augmentation for Low-Resourced Speech RecognitionWilliam Hartmann, Tim Ng, Roger Hsiao, Stavros Tsakalidis, Richard M. Schwartz. 2378-2382 [doi]

Native Language Identification Using Spectral and Source-Based FeaturesAvni Rajpal, Tanvina B. Patel, Hardik B. Sailor, Maulik C. Madhavi, Hemant A. Patil, Hiroya Fujisaki. 2383-2387 [doi]

Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term FeaturesYishan Jiao, Ming Tu, Visar Berisha, Julie M. Liss. 2388-2392 [doi]

Convolutional Neural Networks with Data Augmentation for Classifying Speakers' Native LanguageGil Keren, Jun Deng, Jouni Pohjalainen, Björn W. Schuller. 2393-2397 [doi]

Native Language Detection Using the I-Vector FrameworkMohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro L. Koerich. 2398-2402 [doi]

Within-Speaker Features for Native Language Recognition in the Interspeech 2016 Computational Paralinguistics ChallengeMark Huckvale. 2403-2407 [doi]

Multimodal Fusion of Multirate Acoustic, Prosodic, and Lexical Speaker Characteristics for Native Language IdentificationPrashanth Gurunath Shivakumar, Sandeep Nallan Chakravarthula, Panayiotis G. Georgiou. 2408-2412 [doi]

Exploiting Phone Log-Likelihood Ratio Features for the Detection of the Native Language of Non-Native English SpeakersAlberto Abad, Eugénio Ribeiro, Fábio Kepler, Ramón Fernández Astudillo, Isabel Trancoso. 2413-2417 [doi]

Determining Native Language and Deception Using Phonetic Features and Classifier CombinationGábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth. 2418-2422 [doi]

A Preliminary Ultrasound Study of Nasal and Lateral Coronals in ArrernteMarija Tabain, Richard Beare. 2423-2427 [doi]

Illustrating the Production of the International Phonetic Alphabet Sounds Using Fast Real-Time Magnetic Resonance ImagingAsterios Toutios, Sajan Goud Lingala, Colin Vaz, Jangwon Kim, John Esling, Patricia A. Keating, Matthew Gordon, Dani Byrd, Louis Goldstein, Krishna S. Nayak, Shrikanth S. Narayanan. 2428-2432 [doi]

Marginal Contrast Among Romanian Vowels: Evidence from ASR and Functional LoadMargaret E. L. Renwick, Ioana Vasilescu, Camille Dutrey, Lori Lamel, Bianca Vieru. 2433-2437 [doi]

Effects of Subglottal-Coupling and Interdental-Space on Formant Trajectories During Front-to-Back Vowel Transitions in ChineseShuanglin Fan, Kiyoshi Honda, Jianwu Dang, Hui Feng. 2438-2442 [doi]

Perceptual Lateralization of Coda Rhotic Production in Puerto Rican SpanishMairym Lloréns Monteserín, Shrikanth S. Narayanan, Louis Goldstein. 2443-2447 [doi]

Interaction Between Lexical Tone and Intonation: An EMA StudyHao Yi, Sam Tilsen. 2448-2452 [doi]

Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice ConversionHuaiping Ming, Dong-Yan Huang, Lei Xie, Jie Wu, Minghui Dong, Haizhou Li. 2453-2457 [doi]

Visual Speech Synthesis Using Dynamic Visemes, Contextual Features and DNNsAusdang Thangthai, Ben Milner, Sarah Taylor. 2458-2462 [doi]

A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMsSrikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, Simon King. 2463-2467 [doi]

Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech SynthesisBo Li, Heiga Zen. 2468-2472 [doi]

GlottDNN - A Full-Band Glottal Vocoder for Statistical Parametric Speech SynthesisManu Airaksinen, Bajibabu Bollepalli, Lauri Juvela, Zhizheng Wu, Simon King, Paavo Alku. 2473-2477 [doi]

Singing Voice Synthesis Based on Deep Neural NetworksMasanari Nishimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda. 2478-2482 [doi]

Blind Recovery of Perceptual Models in Distributed Speech and Audio CodingTom Bäckström, Florin Ghido, Johannes Fischer. 2483-2487 [doi]

Glimpse-Based Metrics for Predicting Speech Intelligibility in Additive Noise ConditionsYan Tang, Martin Cooke. 2488-2492 [doi]

Analyzing the Relation Between Overall Quality and the Quality of Individual Phases in a Telephone ConversationFriedemann Köster, Sebastian Möller. 2493-2497 [doi]

Intelligibility Enhancement at the Receiving End of the Speech Transmission System - Effects of Far-End Noise ReductionEmma Jokinen, Paavo Alku. 2498-2502 [doi]

Intelligibility of Disordered Speech: Global and Detailed ScoresMario Ganzeboom, Marjoke Bakker, Catia Cucchiarini, Helmer Strik. 2503-2507 [doi]

Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility in NoiseMaria Koutsogiannaki, Yannis Stylianou. 2508-2512 [doi]

Dynamic Transcription for Low-Latency Speech TranslationJan Niehues, Thai Son Nguyen, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Müller, Matthias Sperber, Sebastian Stüker, Alex Waibel. 2513-2517 [doi]

Learning a Translation Model from Word LatticesOliver Adams, Graham Neubig, Trevor Cohn, Steven Bird. 2518-2522 [doi]

Disfluency Detection Using a Bidirectional LSTMVicky Zayats, Mari Ostendorf, Hannaneh Hajishirzi. 2523-2527 [doi]

Sentence Boundary Detection Based on Parallel Lexical and Acoustic ModelsXiaoyin Che, Sheng Luo, Haojin Yang, Christoph Meinel. 2528-2532 [doi]

Transferring Emphasis in Speech Translation Using Hard-Attentional Neural Network ModelsQuoc Truong Do, Sakriani Sakti, Graham Neubig, Satoshi Nakamura. 2533-2537 [doi]

Better Evaluation of ASR in Speech Translation Context Using Word EmbeddingsNgoc-Tien Le, Christophe Servan, Benjamin Lecouteux, Laurent Besacier. 2538-2542 [doi]

Entropy Coding of Spectral Envelopes for Speech and Audio Coding Using Distribution QuantizationSrikanth Korse, Tobias Jähnel, Tom Bäckström. 2543-2547 [doi]

An Objective Evaluation Methodology for Blind Bandwidth ExtensionStéphane Villette, Sen Li, Pravin Ramadas, Daniel J. Sinder. 2548-2552 [doi]

EVS Channel Aware Mode Robustness to Frame ErasuresAnssi Rämö, Antti Kurittu, Henri Toukomaa. 2553-2557 [doi]

An Interaural Magnification Algorithm for Enhancement of Naturally-Occurring Level DifferencesShadi Pirhosseinloo, Kostas Kokkinakis. 2558-2561 [doi]

Probabilistic Spatial Filter Estimation for Signal Enhancement in Multi-Channel Automatic Speech RecognitionHendrik Kayser, Niko Moritz, Jörn Anemüller. 2562-2566 [doi]

Improved a priori SAP Estimator in Complex Noisy Environment for Dual Channel Microphone SystemYouna Ji, Young-Cheol Park. 2567-2571 [doi]

A Spectral Modulation Sensitivity Weighted Pre-Emphasis Filter for Active Noise Control SystemKah-Meng Cheong, Yuh-Yuan Wang, Tai-Shih Chi. 2572-2576 [doi]

Semi-Coupled Dictionary Based Automatic Bandwidth Extension Approach for Enhancing Children's ASRGanji Sreeram, Rohit Sinha. 2577-2581 [doi]

Bird Song Synthesis Based on Hidden Markov ModelsJordi Bonada, Robert Lachlan, Merlijn Blaauw. 2582-2586 [doi]

Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase ClassificationKantapon Kaewtip, Charles E. Taylor, Abeer Alwan. 2587-2591 [doi]

A Framework for Automated Marmoset Vocalization Detection and ClassificationAlan Wisler, Laura J. Brattain, Rogier Landman, Thomas F. Quatieri. 2592-2596 [doi]

Call Alternation Between Specific Pairs of Male Frogs Revealed by a Sound-Imaging Method in Their Natural HabitatIkkyu Aihara, Takeshi Mizumoto, Hiromitsu Awano, Hiroshi G. Okuno. 2597-2601 [doi]

Sinusoidal Modelling for EcoacousticsPatrice Guyot, Alice Eldridge, Ying Chen Eyre-Walker, Alison Johnston, Thomas Pellegrini, Mika Peck. 2602-2606 [doi]

Individual Identity in Songbirds: Signal Representations and Metric Learning for Locating the Information in Complex Corvid CallsDan Stowell, Veronica Morfi, Lisa F. Gill. 2607-2611 [doi]

Recognition of Multiple Bird Species Based on Penalised Maximum Likelihood and HMM-Based Modelling of Individual Vocalisation ElementsPeter Jancovic, Münevver Köküer. 2612-2616 [doi]

Cost Effective Acoustic Monitoring of Bird SpeciesCiira Wa Maina. 2617-2620 [doi]

Feature Learning and Automatic Segmentation for Dolphin Communication AnalysisDaniel Kohlsdorf, Denise Herzing, Thad Starner. 2621-2625 [doi]

Localizing Bird Songs Using an Open Source Robot Audition System with a Microphone ArrayReiji Suzuki, Shiho Matsubayashi, Kazuhiro Nakadai, Hiroshi G. Okuno. 2626-2630 [doi]

Robust Detection of Multiple Bioacoustic Events with Repetitive StructuresFrank Kurth. 2631-2635 [doi]

A Real-Time Parametric General-Purpose Mammalian Vocal SynthesiserRoger K. Moore. 2636-2640 [doi]

YIN-Bird: Improved Pitch Tracking for Bird VocalisationsColm O'Reilly, Nicola M. Marples, David J. Kelly, Naomi Harte. 2641-2645 [doi]

Mispronunciation Detection Leveraging Maximum Performance Criterion Training of Acoustic Models and Decision FunctionsYao-Chi Hsu, Ming-Han Yang, Hsiao-Tsung Hung, Berlin Chen. 2646-2650 [doi]

Using Clinician Annotations to Improve Automatic Speech Recognition of Stuttered SpeechPeter A. Heeman, Rebecca Lunsford, Andy McMillin, J. Scott Yaruss. 2651-2655 [doi]

Deep Neural Networks for Voice Quality Assessment Based on the GRBAS ScaleSimin Xie, Nan Yan, Ping Yu, Manwa L. Ng, Lan Wang, Zhuanzhuan Ji. 2656-2660 [doi]

Automated Screening of Speech Development Issues in Children by Identifying Phonological Error PatternsLauren Ward, Alessandro Stefani, Daniel Smith, Andreas Duenser, Jill Freyne, Barbara Dodd, Angela Morgan. 2661-2665 [doi]

Automatic Pronunciation Evaluation of Non-Native Mandarin Tone by Using Multi-Level Confidence MeasuresJu Lin, Yanlu Xie, Jinsong Zhang. 2666-2670 [doi]

Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov ModelMyung Jong Kim, Jun Wang, Hoirin Kim. 2671-2675 [doi]

Detection of Total Syllables and Canonical Syllables in Infant VocalizationsAnne S. Warlaumont, Heather L. Ramsdell-Hudock. 2676-2680 [doi]

Improving Automatic Recognition of Aphasic Speech with AphasiaBankDuc Le, Emily Mower Provost. 2681-2685 [doi]

Pronunciation Assessment of Japanese Learners of French with GOP Scores and Phonetic InformationVincent Laborde, Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Halima Sahraoui, Jérôme Farinas. 2686-2690 [doi]

Pronunciation Error Detection for New Language LearnersSean Robertson, Cosmin Munteanu, Gerald Penn. 2691-2695 [doi]

L2 English Rhythm in Read Speech by Chinese StudentsHongwei Ding, Xinping Xu. 2696-2700 [doi]

Improving the Probabilistic Framework for Representing Dialogue Systems with User Response ModelMiao Li, Zhipeng Chen, Ji Wu. 2701-2705 [doi]

Dialogue Session Segmentation by Embedding-Enhanced TextTilingYiping Song, Lili Mou, Rui Yan, Li Yi, Zinan Zhu, Xiaohua Hu, Ming Zhang. 2706-2710 [doi]

Target-Based State and Tracking Algorithm for Spoken Dialogue SystemMiao Li, Zhiyang He, Ji Wu. 2711-2715 [doi]

Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act DetectionSheng-syun Shen, Hung-yi Lee. 2716-2720 [doi]

Objective Language Feature Analysis in Children with Neurodevelopmental Disorders During Autism AssessmentManoj Kumar, Rahul Gupta, Daniel Bone, Nikolaos Malandrakis, Somer Bishop, Shrikanth S. Narayanan. 2721-2725 [doi]

Improving Generalisation to New Speakers in Spoken Dialogue State TrackingIñigo Casanueva, Thomas Hain, Phil D. Green. 2726-2730 [doi]

Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by MachineBo-Hsiang Tseng, Sheng-syun Shen, Hung-yi Lee, Lin-Shan Lee. 2731-2735 [doi]

How Neural Network Depth Compensates for HMM Conditional Independence Assumptions in DNN-HMM Acoustic ModelsSuman V. Ravuri, Steven Wegmann. 2736-2740 [doi]

Jointly Learning to Locate and Classify Words Using Convolutional NetworksDimitri Palaz, Gabriel Synnaeve, Ronan Collobert. 2741-2745 [doi]

On the Efficient Representation and Execution of Deep Acoustic ModelsRaziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin. 2746-2750 [doi]

Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMIDaniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur. 2751-2755 [doi]

Virtual Adversarial Training Applied to Neural Higher-Order Factors for Phone ClassificationMartin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf. 2756-2760 [doi]

Sequence Student-Teacher Training of Deep Neural NetworksJeremy H. M. Wong, Mark J. F. Gales. 2761-2765 [doi]

Robustness in Speech, Speaker, and Language Recognition: "You've Got to Know Your Limitations"John H. L. Hansen, Hynek Boril. 2766-2770 [doi]

The Use of Read versus Conversational Lombard Speech in Spectral Tilt Modeling for Intelligibility Enhancement in Near-End Noise ConditionsEmma Jokinen, Ulpu Remes, Paavo Alku. 2771-2775 [doi]

Corpora for the Evaluation of Robust Speaker Recognition SystemsDouglas E. Sturim, Pedro A. Torres-Carrasquillo, Joseph P. Campbell. 2776-2780 [doi]

A French Corpus for Distant-Microphone Speech Processing in Real HomesNancy Bertin, Ewen Camberlein, Emmanuel Vincent, Romain Lebarbenchon, Stéphane Peillon, Éric Lamande, Sunit Sivasankaran, Frédéric Bimbot, Irina Illina, Ariane Tom, Sylvain Fleury, Éric Jamet. 2781-2785 [doi]

Realistic Multi-Microphone Data Simulation for Distant Speech RecognitionMirco Ravanelli, Piergiorgio Svaizer, Maurizio Omologo. 2786-2790 [doi]

Synthesis of Device-Independent Noise Corpora for Realistic ASR EvaluationHannes Gamper, Mark R. P. Thomas, Lyle Corbin, Ivan Tashev. 2791-2795 [doi]

Speaker Recognition Using Real vs Synthetic Parallel Data for DNN Channel CompensationFred Richardson, Michael Brandstein, Jennifer Melot, Douglas A. Reynolds. 2796-2800 [doi]

Combining Data-Oriented and Process-Oriented Approaches to Modeling Reaction Time DataLouis ten Bosch, Lou Boves, Mirjam Ernestus. 2801-2805 [doi]

Do Listeners Learn Better from Natural Speech?Michael McAuliffe, Molly Babel, Charlotte Vaughn. 2806-2810 [doi]

Processing and Adaptation to Ambiguous Sounds during the Course of Perceptual LearningPolina Drozdova, Roeland Van Hout, Odette Scharenborg. 2811-2815 [doi]

The Effect of Background Noise on the Activation of Phonological and Semantic Information During Spoken-Word RecognitionFlorian Hintz, Odette Scharenborg. 2816-2820 [doi]

Relationships Between Functional Load and Auditory Confusability Under Different Speech EnvironmentsShinae Kang, Clara Cohen. 2821-2825 [doi]

The Role of Pitch in Punjabi Word IdentificationJasmeen Kanwal, Amanda Ritchart. 2826-2830 [doi]

Improving TTS with Corpus-Specific Pronunciation AdaptationMarie Tahon, Raheel Qader, Gwénolé Lecorvé, Damien Lolive. 2831-2835 [doi]

Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks for Grapheme-to-Phoneme Conversion Utilizing Complex Many-to-Many AlignmentsAmr El-Desoky Mousa, Björn W. Schuller. 2836-2840 [doi]

Predicting Pronunciations with Syllabification and Stress with Recurrent Neural NetworksDaan van Esch, Mason Chua, Kanishka Rao. 2841-2845 [doi]

Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech SynthesisMaël Pouget, Olha Nahorna, Thomas Hueber, Gérard Bailly. 2846-2850 [doi]

Redefining the Linguistic Context Feature Set for HMM and DNN TTS Through Position and ParsingRasmus Dall, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda. 2851-2855 [doi]

Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS SystemXin Wang, Shinji Takaki, Junichi Yamagishi. 2856-2860 [doi]

Local Sparsity Based Online Dictionary Learning for Environment-Adaptive Speech Enhancement with Nonnegative Matrix FactorizationKwang Myung Jeon, Hong Kook Kim. 2861-2865 [doi]

Noise Aware and Combined Noise Models for Speech Denoising in Unknown Noise ConditionsPavlos Papadopoulos, Colin Vaz, Shrikanth S. Narayanan. 2866-2869 [doi]

Causal Speech Enhancement Combining Data-Driven Learning and Suppression Rule EstimationSeyedmahdad Mirsamadi, Ivan Tashev. 2870-2874 [doi]

A Phase-Based Time-Frequency Masking for Multi-Channel Speech Enhancement in Domestic EnvironmentsAlessio Brutti, Antigoni Tsiami, Athanasios Katsamanis, Petros Maragos. 2875-2879 [doi]

Generalizing Steady State Suppression for Enhanced Intelligibility Under ReverberationPetko N. Petkov, Yannis Stylianou. 2880-2884 [doi]

Speech Intelligibility Prediction Based on the Envelope Power Spectrum Model with the Dynamic Compressive Gammachirp Auditory FilterbankKatsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani. 2885-2889 [doi]

Prediction and Generation of Backchannel Form for Attentive Listening SystemsTatsuya Kawahara, Takashi Yamaguchi, Koji Inoue, Katsuya Takanashi, Nigel G. Ward. 2890-2894 [doi]

Measuring Turn-Taking Offsets in Human-Human DialoguesRebecca Lunsford, Peter A. Heeman, Emma Rennie. 2895-2899 [doi]

Using Past Speaker Behavior to Better Predict Turn TransitionsTomer Meshorer, Peter A. Heeman. 2900-2904 [doi]

Quantitative Analysis of Backchannels Uttered by an Interviewer During Neuropsychological TestsGérard Bailly, Frédéric Elisei, Alexandra Juphard, Olivier Moreaud. 2905-2909 [doi]

Predicting User Satisfaction from Turn-Taking in Spoken ConversationsShammur Absar Chowdhury, Evgeny A. Stepanov, Giuseppe Riccardi. 2910-2914 [doi]

Towards Building an Attentive Artificial Listener: On the Perception of Attentiveness in Feedback UtterancesCatharine Oertel, Joakim Gustafson, Alan W. Black. 2915-2919 [doi]

Language Recognition via Sparse CodingYoungjune L. Gwon, William M. Campbell, Douglas E. Sturim, H. T. Kung. 2920-2924 [doi]

A Feature Normalisation Technique for PLLR Based Language Identification SystemsSarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah. 2925-2929 [doi]

An Investigation of Deep Neural Network Architectures for Language Recognition in Indian LanguagesMounika K. V., Sivanand Achanta, Lakshmi H. R., Suryakanth V. Gangashetty, Anil Kumar Vuppala. 2930-2933 [doi]

Automatic Dialect Detection in Arabic Broadcast SpeechAhmed M. Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James R. Glass, Peter Bell 0001, Steve Renals. 2934-2938 [doi]

Combining Weak Tokenisers for Phonotactic Language Recognition in a Resource-Constrained SettingRaymond W. M. Ng, Bhusan Chettri, Thomas Hain. 2939-2943 [doi]

End-to-End Language Identification Using Attention-Based Recurrent Neural NetworksWang Geng, Wenfu Wang, Yuanyuan Zhao, Xinyuan Cai, Bo Xu. 2944-2948 [doi]

Enhancing Multilingual Recognition of Emotion in Speech by Language IdentificationHesam Sagha, Pavel Matejka, Maryna Gavryukova, Filip Povolný, Erik Marchi, Björn W. Schuller. 2949-2953 [doi]

Deep Neural Network Bottleneck Features for Acoustic Event RecognitionSeongkyu Mun, Suwon Shon, Wooil Kim, Hanseok Ko. 2954-2957 [doi]

Combining Energy and Cross-Entropy Analysis for Nuclear Segments DetectionAntonio Origlia, Francesco Cutugno. 2958-2962 [doi]

Anchored Speech DetectionRoland Maas, Sree Hari Krishnan Parthasarathi, Brian King, Ruitong Huang, Björn Hoffmeister. 2963-2967 [doi]

Towards Smart-Cars That Can Listen: Abnormal Acoustic Event Detection on the RoadMahesh Kumar Nandwana, Taufiq Hasan. 2968-2971 [doi]

Hierarchical Classification of Speaker and Background Noise and Estimation of SNR Using Sparse RepresentationK. V. Vijay Girish, A. G. Ramakrishnan, T. V. Ananthapadmanabha. 2972-2976 [doi]

Robust Sound Event Detection in Continuous Audio EnvironmentsHaomin Zhang, Ian Vince McLoughlin, Yan Song. 2977-2981 [doi]

Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event RecognitionNaoya Takahashi, Michael Gygli, Beat Pfister, Luc Van Gool. 2982-2986 [doi]

Artificial Neural Network-Based Feature Combination for Spatial Voice Activity DetectionStefan Meier, Walter Kellermann. 2987-2991 [doi]

HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity DetectorsTomi Kinnunen, Alexey Sholokhov, Elie el Khoury, Dennis Alexander Lehmann Thomsen, Md. Sahidullah, Zheng-Hua Tan. 2992-2996 [doi]

Manual versus Automated: The Challenging Routine of Infant Vocalisation Segmentation in Home Videos to Study Neuro(mal)developmentFlorian B. Pokorny, Robert Peharz, Wolfgang Roth, Matthias Zöhrer, Franz Pernkopf, Peter B. Marschik, Björn W. Schuller. 2997-3001 [doi]

Minimizing Annotation Effort for Adaptation of Speech-Activity Detection SystemsLuciana Ferrer, Martin Graciarena. 3002-3006 [doi]

Progress and Prospects for Spoken Language Technology: What Ordinary People ThinkRoger K. Moore, Hui Li, Shih-Hao Liao. 3007-3011 [doi]

Progress and Prospects for Spoken Language Technology: Results from Four Sexennial SurveysRoger K. Moore, Ricard Marxer. 3012-3016 [doi]

On Employing a Highly Mismatched Crowd for Speech TranscriptionPurushotam G. Radadia, Rahul Kumar, Kanika Kalra, Shirish Subhash Karande, Sachin Lodha. 3017-3021 [doi]

Sage: The New BBN Speech Processing PlatformRoger Hsiao, Ralf Meermeier, Tim Ng, Zhongqiang Huang, Maxwell Jordan, Enoch Kan, Tanel Alumäe, Jan Silovský, William Hartmann, Francis Keith, Omer Lang, Man-Hung Siu, Owen Kimball. 3022-3026 [doi]

DNN-Based Feature Enhancement Using Joint Training Framework for Robust Multichannel Speech RecognitionKang Hyun Lee, Tae Gyoon Kang, Woo Hyun Kang, Nam Soo Kim. 3027-3031 [doi]

Deep Neural Network Frontend for Continuous EMG-Based Speech RecognitionMichael Wand, Jürgen Schmidhuber. 3032-3036 [doi]

Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource LanguagesBasil Abraham, S. Umesh, Neethu Mariam Joy. 3037-3041 [doi]

Multi-Language Neural Network Language ModelsAnton Ragni, Edgar Dakin, Xie Chen, Mark J. F. Gales, Kate M. Knill. 3042-3046 [doi]

Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation RestorationOttokar Tilk, Tanel Alumäe. 3047-3051 [doi]

TheanoLM - An Extensible Toolkit for Neural Network Language ModelingSeppo Enarvi, Mikko Kurimo. 3052-3056 [doi]

Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition SystemsPierre Lanchantin, Mark J. F. Gales, Penny Karanasou, X. Liu, Y. Qian, L. Wang, Philip C. Woodland, C. Zhang. 3057-3061 [doi]

Manipulating Word Lattices to Incorporate Human CorrectionsYashesh Gaur, Florian Metze, Jeffrey P. Bigham. 3062-3065 [doi]

Context-Aware Restaurant Recommendation for Natural Language Queries: A Formative User Study in the Automotive DomainPhilipp Fischer, Cornelius Styp von Rekowski, Andreas Nürnberger. 3066-3070 [doi]

Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval ApplicationStephanie Pancoast, Murat Akbacak. 3071-3075 [doi]

Automatic Speech Transcription for Low-Resource Languages - The Case of Yoloxóchitl Mixtec (Mexico)Vikramjit Mitra, Andreas Kathol, Jonathan D. Amith, Rey Castillo García. 3076-3080 [doi]

Real-Time Presentation Tracking Using Semantic Keyword SpottingReza Asadi, Harriet J. Fell, Timothy W. Bickmore, Ha Trinh. 3081-3085 [doi]

Deriving Phonetic Transcriptions and Discovering Word Segmentations for Speech-to-Speech Translation in Low-Resource SettingsAndrew Wilkinson, Tiancheng Zhao, Alan W. Black. 3086-3090 [doi]

Unsupervised Joint Estimation of Grapheme-to-Phoneme Conversion Systems and Acoustic Model Adaptation for Non-Native Speech RecognitionSatoshi Tsujioka, Sakriani Sakti, Koichiro Yoshino, Graham Neubig, Satoshi Nakamura. 3091-3095 [doi]

Learning Personalized Pronunciations for Contact Name RecognitionAntoine Bruguier, Fuchun Peng, Françoise Beaufays. 3096-3100 [doi]

Generation and Pruning of Pronunciation Variants to Improve ASR AccuracyZhenhao Ge, Aravind Ganapathiraju, Ananth N. Iyer, Scott A. Randal, Felix I. Wyss. 3101-3105 [doi]

Optimizing Speech Recognition Evaluation Using Stratified SamplingJanne Pylkkönen, Thomas Drugman, Max Bisani. 3106-3110 [doi]

Ketchup, Interdisciplinarity, and the Spread of Innovation in Speech and Language ProcessingDan Jurafsky. 3111 [doi]

Context Aware Mispronunciation Detection for Mandarin Pronunciation TrainingRong Tong, Nancy F. Chen, Bin Ma, Haizhou Li. 3112-3116 [doi]

DNN Online with iVectors Acoustic Modeling and Doc2Vec Distributed Representations for Improving Automated Speech ScoringJidong Tao, Lei Chen 0004, Chong Min Lee. 3117-3121 [doi]

Self-Adaptive DNN for Improving Spoken Language Proficiency AssessmentYao Qian, Xinhao Wang, Keelan Evanini, David Suendermann-Oeft. 3122-3126 [doi]

Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision TreesWei Li, Kehuang Li, Sabato Marco Siniscalchi, Nancy F. Chen, Chin-Hui Lee. 3127-3131 [doi]

Phoneme Set Design Considering Integrated Acoustic and Linguistic Features of Second Language SpeechXiaoyun Wang, Tsuneo Kato, Seiichi Yamamoto. 3132-3136 [doi]

HMM-Based Non-Native Accent Assessment Using Posterior FeaturesRamya Rasipuram, Milos Cernak, Mathew Magimai-Doss. 3137-3141 [doi]

Automatic Assessment and Error Detection of Shadowing Speech: Case of English Spoken by Japanese LearnersShuju Shi, Yosuke Kashiwagi, Shohei Toyama, Junwei Yue, Yutaka Yamauchi, Daisuke Saito, Nobuaki Minematsu. 3142-3146 [doi]

Multiplicity of the Acoustic Correlates of the Fortis-Lenis Contrast: Plosives in Aberystwyth EnglishMísa Hejná. 3147-3151 [doi]

Automatic Measurement of Voice Onset Time and Prevoicing Using Recurrent Neural NetworksYossi Adi, Joseph Keshet, Olga Dmitrieva, Matthew Goldrick. 3152-3155 [doi]

L1-L2 Interference: The Case of Final Devoicing of French Voiced Fricatives in Final Position by German LearnersSucheta Ghosh, Camille Fauth, Aghilas Sini, Yves Laprie. 3156-3160 [doi]

Perceptual Salience of Voice Source Parameters in Signaling Focal ProminenceIrena Yanushevskaya, Andy Murphy, Christer Gobl, Ailbhe Ní Chasaide. 3161-3165 [doi]

Classification of Voice Modality Using Electroglottogram WaveformsMichal Borsky, Daryush D. Mehta, Julius P. Gudjohnsen, Jón Guðnason. 3166-3170 [doi]

Voice-Quality Difference Between the Vowels in Filled Pauses and Ordinary Lexical ItemsKikuo Maekawa, Hiroki Mori. 3171-3175 [doi]

Generation of Emotion Control Vector Using MDS-Based Space Transformation for Expressive Speech SynthesisYan-You Chen, Chung-Hsien Wu, Yu-Fong Huang. 3176-3180 [doi]

Direct Expressive Voice Training Based on Semantic SelectionIgor Jauk, Antonio Bonafonte. 3181-3185 [doi]

Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech SynthesisManuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi. 3186-3190 [doi]

Pause Prediction from Text for Speech Synthesis with User-Definable Pause Insertion Likelihood ThresholdNorbert Braunschweiler, Ranniery Maia. 3191-3195 [doi]

A Hybrid System for Continuous Word-Level Emphasis Modeling Based on HMM State Clustering and Adaptive TrainingQuoc Truong Do, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura. 3196-3200 [doi]

Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion ApproachYibin Zheng, Ya Li, Zhengqi Wen, Xingguang Ding, Jianhua Tao. 3201-3205 [doi]

Results of The 2015 NIST Language Recognition EvaluationHui Zhao, Désiré Bansé, George R. Doddington, Craig R. Greenberg, Jaime Hernandez-Cordero, John M. Howard, Lisa P. Mason, Alvin F. Martin, Douglas A. Reynolds, Elliot Singer, Audrey Tong. 3206-3210 [doi]

The 2015 NIST Language Recognition Evaluation: The Shared View of I2R, Fantastic4 and SingaMSKong-Aik Lee, Haizhou Li, Li Deng, Ville Hautamäki, Wei Rao, Xiong Xiao, Anthony Larcher, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Jianshu Chen, Ivan Kukanov, Amir Hossein Poorjam, Trung Ngo Trong, Chenglin Xu, Haihua Xu, Bin Ma, Eng Siong Chng, Sylvain Meignier. 3211-3215 [doi]

Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language IdentificationXugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai. 3216-3220 [doi]

Non-Iterative Parameter Estimation for Total Variability Model Using Randomized Singular Value DecompositionRuchir Travadi, Shrikanth S. Narayanan. 3221-3225 [doi]

Stacked Long-Term TDNN for Spoken Language RecognitionDaniel Garcia-Romero, Alan McCree. 3226-3230 [doi]

A Divide-and-Conquer Approach for Language Identification Based on Recurrent Neural NetworksGregory Gelly, Jean-Luc Gauvain, Viet Bac Le, Abdelkhalek Messaoudi. 3231-3235 [doi]

Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMsChiori Hori, Takaaki Hori, Shinji Watanabe, John R. Hershey. 3236-3240 [doi]

A Step Beyond Local Observations with a Dialog Aware Bidirectional GRU Network for Spoken Language UnderstandingVedran Vukotic, Christian Raymond, Guillaume Gravier. 3241-3244 [doi]

End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language UnderstandingYun-Nung Chen, Dilek Hakkani-Tür, Gökhan Tür, Jianfeng Gao, Li Deng. 3245-3249 [doi]

Sequential Convolutional Neural Networks for Slot Filling in Spoken Language UnderstandingNgoc Thang Vu. 3250-3254 [doi]

A New Pre-Training Method for Training Deep Learning Models with Application to Spoken Language UnderstandingAsli Çelikyilmaz, Ruhi Sarikaya, Dilek Hakkani-Tür, Xiaohu Liu, Nikhil Ramesh, Gökhan Tür. 3255-3259 [doi]

Joint Syntactic and Semantic Analysis with a Multitask Deep Learning Framework for Spoken Language UnderstandingJérémie Tafforeau, Frédéric Béchet, Thierry Artiere, Benoît Favre. 3260-3264 [doi]

Exploiting Hidden-Layer Responses of Deep Neural Networks for Language RecognitionRuizhi Li, Sri Harish Reddy Mallidi, Lukás Burget, Oldrich Plchot, Najim Dehak. 3265-3269 [doi]

Out of Set Language Modelling in Hierarchical Language IdentificationSaad Irtza, Vidhyasaharan Sethu, Sarith Fernando, Eliathamby Ambikairajah, Haizhou Li. 3270-3274 [doi]

Language Identification Based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNsRyo Masumura, Taichi Asami, Hirokazu Masataki, Yushi Aono, Sumitaka Sakauchi. 3275-3279 [doi]

Gating Recurrent Enhanced Memory Neural Networks on Language IdentificationWang Geng, Yuanyuan Zhao, Wenfu Wang, Xinyuan Cai, Bo Xu. 3280-3284 [doi]

Sequence Summarizing Neural Networks for Spoken Language RecognitionJan Pesán, Lukás Burget, Jan Cernocký. 3285-3288 [doi]

The Role of Spectral Resolution in Foreign-Accented Speech PerceptionMichelle R. Kapolowicz, Vahid Montazeri, Peter F. Assmann. 3289-3293 [doi]

THU-EE System Description for NIST LRE 2015Liang He, Yao Tian, Yi Liu, Jiaming Xu, Weiwei Liu, Cai Meng, Jia Liu. 3294-3298 [doi]

Variation in Spoken North Sami LanguageKristiina Jokinen, Trung Ngo Trong, Ville Hautamäki. 3299-3303 [doi]

Improved Music Genre Classification with Convolutional Neural NetworksWeibin Zhang, Wenkang Lei, Xiangmin Xu, Xiaofeng Xing. 3304-3308 [doi]

Enhanced Harmonic Content and Vocal Note Based Predominant Melody Extraction from Vocal Polyphonic Music SignalsGurunath Reddy M., K. Sreenivasa Rao. 3309-3313 [doi]

Long Short-Term Memory for Speaker Generalization in Supervised Speech SeparationJitong Chen, DeLiang Wang. 3314-3318 [doi]

Phonotactic Language Identification for SingingAnna M. Kruspe. 3319-3323 [doi]

Comparing the Influence of Spectro-Temporal Integration in Computational Speech SegregationThomas Bentsen, Tobias May, Abigail A. Kressner, Torsten Dau. 3324-3328 [doi]

Blind Speech Separation with GCC-NMFSean U. N. Wood, Jean Rouat. 3329-3333 [doi]

Effects of Cochlear Hearing Loss on the Benefits of Ideal Binary MaskingVahid Montazeri, Shaikat Hossain, Peter F. Assmann. 3334-3338 [doi]

Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural NetworksEmad M. Grais, Gerard Roma, Andrew J. R. Simpson, Mark D. Plumbley. 3339-3343 [doi]

Monaural Source Separation Using a Random Forest ClassifierCosimo Riday, Saurabh Bhargava, Richard H. R. Hahnloser, Shih-Chii Liu. 3344-3348 [doi]

Adaptive Group Sparsity for Non-Negative Matrix Factorization with Application to Unsupervised Source SeparationXu Li, Ziteng Wang, Xiaofei Wang, Qiang Fu, Yonghong Yan 0002. 3349-3353 [doi]

A Robust Dual-Microphone Speech Source Localization Algorithm for Reverberant EnvironmentsYanmeng Guo, Xiaofei Wang, Chao Wu, Qiang Fu, Ning Ma, Guy J. Brown. 3354-3358 [doi]

Speech Localisation in a Multitalker Mixture by Humans and MachinesNing Ma, Guy J. Brown. 3359-3363 [doi]

Reverberation-Robust One-Bit TDOA Based Moving Source Localization for Automatic Camera SteeringSundar Harshavardhan, Gokul Deepak Manavalan, T. V. Sreenivas, Chandra Sekhar Seelamantula. 3364-3368 [doi]

Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud StorageKeiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino. 3369-3373 [doi]

Phase-Aware Signal Processing for Automatic Speech RecognitionJohannes Fahringer, Tobias Schrank, Johannes Stahl, Pejman Mowlaee, Franz Pernkopf. 3374-3378 [doi]

Unsupervised Deep Auditory Model Using Stack of Convolutional RBMs for Speech RecognitionHardik B. Sailor, Hemant A. Patil. 3379-3383 [doi]

Interpretation of Low Dimensional Neural Network Bottleneck Features in Terms of Human Perception and ProductionPhilip Weber, Linxue Bai, Martin J. Russell, Peter Jancovic, Stephen M. Houghton. 3384-3388 [doi]

Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech RecognitionShiliang Zhang, Hui Jiang, Shifu Xiong, Si Wei, Li-Rong Dai. 3389-3393 [doi]

Future Context Attention for Unidirectional LSTM Based Acoustic ModelJian Tang, Shiliang Zhang, Si Wei, Li-Rong Dai. 3394-3398 [doi]

Hybrid Accelerated Optimization for Speech RecognitionJen-Tzung Chien, Pei-Wen Huang, Tan Lee. 3399-3403 [doi]

On Online Attention-Based Speech Recognition and Joint Mandarin Character-Pinyin TrainingWilliam Chan, Ian Lane. 3404-3408 [doi]

GMM-Free Flat Start Sequence-Discriminative DNN TrainingGábor Gosztolya, Tamás Grósz, László Tóth. 3409-3413 [doi]

Open-Domain Audio-Visual Speech Recognition: A Deep Learning ApproachYajie Miao, Florian Metze. 3414-3418 [doi]

Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic ModelingYuanyuan Zhao, Shuang Xu, Bo Xu. 3419-3423 [doi]

Towards Online-Recognition with Deep Bidirectional LSTM Acoustic ModelsAlbert Zeyer, Ralf Schlüter, Hermann Ney. 3424-3428 [doi]

Advances in Very Deep Convolutional Neural Networks for LVCSRTom Sercu, Vaibhava Goel. 3429-3433 [doi]

Acoustic Modelling from the Signal Domain Using CNNsPegah Ghahremani, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur. 3434-3438 [doi]

Distilling Knowledge from Ensembles of Neural Networks for Speech RecognitionYevgen Chebotar, Austin Waters. 3439-3443 [doi]

Triphone State-Tying via Deep Canonical Correlation AnalysisWeiran Wang, Hao Tang, Karen Livescu. 3444-3448 [doi]

Low-Rank Representation of Nearest Neighbor Posterior Probabilities to Enhance DNN Based Acoustic ModelingGil Luyet, Pranay Dighe, Afsaneh Asaei, Hervé Bourlard. 3449-3453 [doi]

Improving Large Vocabulary Accented Mandarin Speech Recognition with Attribute-Based I-VectorsHao Zheng, Shanshan Zhang, Liwei Qiao, Jianping Li, Wenju Liu. 3454-3458 [doi]

Pitch-Adaptive Front-End Features for Robust Children's ASRS. Shahnawazuddin, Abhishek Dey, Rohit Sinha 0003. 3459-3463 [doi]

ASR Confidence Estimation with Speaker-Adapted Recurrent Neural NetworksMiguel Ángel del Agua, Santiago Piqueras, Adrià Giménez, Alberto Sanchís, Jorge Civera, Alfons Juan. 3464-3468 [doi]

Automatic Correction of ASR Outputs by Using Machine TranslationLuis Fernando D'Haro, Rafael E. Banchs. 3469-3473 [doi]

A Framework for Practical Multistream ASRSri Harish Reddy Mallidi, Hynek Hermansky. 3474-3478 [doi]

DNNs for Unsupervised Extraction of Pseudo FMLLR Features Without Explicit Adaptation DataNeethu Mariam Joy, Murali Karthick Baskar, S. Umesh, Basil Abraham. 3479-3483 [doi]

Multi-Attribute Factorized Hidden Layer Adaptation for DNN Acoustic ModelsLahiru Samarakoon, Khe Chai Sim. 3484-3488 [doi]

Speaker Normalization Through Feature Shifting of Linearly Transformed i-VectorJahyun Goo, Younggwan Kim, Hyungjun Lim, Hoirin Kim. 3489-3493 [doi]

Compositional Neural Network Language Models for Agglutinative LanguagesEbru Arisoy, Murat Saraclar. 3494-3498 [doi]

NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech RecognitionBabak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier. 3499-3503 [doi]

Recurrent Neural Network Language Model with Incremental Updated Context Information Generated Using Bag-of-Words RepresentationMd. Akmal Haidar, Mikko Kurimo. 3504-3508 [doi]

Sequential Recurrent Neural Networks for Language ModelingYoussef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow. 3509-3513 [doi]

Word-Phrase-Entity Recurrent Neural Networks for Language ModelingMichael Levit, Sarangarajan Parthasarathy, Shuangyu Chang. 3514-3518 [doi]

LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech RecognitionKazuki Irie, Zoltán Tüske, Tamer Alkhouli, Ralf Schlüter, Hermann Ney. 3519-3523 [doi]

Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and DinkaAmit Das, Preethi Jyothi, Mark Hasegawa-Johnson. 3524-3528 [doi]

Speed Perturbation and Vowel Duration Modeling for ASR in Hausa and Wolof LanguagesElodie Gauthier, Laurent Besacier, Sylvie Voisin. 3529-3533 [doi]

Improving the Lwazi ASR BaselineCharl Johannes van Heerden, Neil Kleynhans, Marelie H. Davel. 3534-3538 [doi]

Preliminary Experiments on Unsupervised Word Discovery in MboshiPierre Godard, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen, Laurent Besacier, Hélène Bonneau-Maynard, Guy-Noël Kouarata, Kevin Löser, Annie Rialland, François Yvon. 3539-3543 [doi]

Unsupervised Phoneme Segmentation of Previously Unseen LanguagesMarco Vetter, Markus Müller, Fatima Hamlaoui, Graham Neubig, Satoshi Nakamura, Sebastian Stüker, Alex Waibel. 3544-3548 [doi]

CNN-Based Phone Segmentation Experiments in a Less-Represented LanguageCéline Manenti, Thomas Pellegrini, Julien Pinquier. 3549-3553 [doi]

Part-of-Speech Tagging and Chunking in Text-to-Speech Synthesis for South African LanguagesGeorg I. Schlünz, Nkosikhona Dlamini, Rynhardt P. Kruger. 3554-3558 [doi]

The Effect of Postlexical Deletion on Automatic Speech Recognition in Fast Spontaneously Spoken ZuluEwald van der Westhuizen, Thomas Niesler. 3559-3563 [doi]

A New Model of Speech Motor Control Based on Task Dynamics and State FeedbackVikram Ramanarayanan, Benjamin Parrell, Louis Goldstein, Srikantan Nagarajan, John Houde. 3564-3568 [doi]

Using a Biomechanical Model and Articulatory Data for the Numerical Production of VowelsSaeed Dabbaghchian, Marc Arnela, Olov Engwall, Oriol Guasch, Ian Stavness, Pierre Badin. 3569-3573 [doi]

A New Model for Acoustic Wave Propagation and Scattering in the Vocal TractJianguo Wei, Wendan Guan, Darcy Q. Hou, Dingyi Pan, Wenhuan Lu, Jianwu Dang. 3574-3578 [doi]

Uncontrolled Manifolds in Vowel Production: Assessment with a Biomechanical Model of the TongueAndrew Szabados, Pascal Perrier. 3579-3583 [doi]

Experimental Validation of Sound Generated from Flow in Simplified Vocal Tract Model of Sibilant /s/Tsukasa Yoshinaga, Kazunori Nozaki, Shigeo Wada. 3584-3587 [doi]

Bayesian Modeling in Speech Motor Control: A Principled Structure for the Integration of Various ConstraintsJean-François Patri, Pascal Perrier, Julien Diard. 3588-3592 [doi]

Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural NetworksZixing Zhang, Fabien Ringeval, Jing Han, Jun Deng, Erik Marchi, Björn W. Schuller. 3593-3597 [doi]

Defining Emotionally Salient Regions Using Qualitative Agreement MethodSrinivas Parthasarathy, Carlos Busso. 3598-3602 [doi]

Representation Learning for Speech Emotion RecognitionSayan Ghosh, Eugene Laksana, Louis-Philippe Morency, Stefan Scherer. 3603-3607 [doi]

Multilingual Speech Emotion Recognition System Based on a Three-Layer ModelXingfeng Li, Masato Akagi. 3608-3612 [doi]

Analysis of Multi-Lingual Emotion Recognition Using Auditory Attention FeaturesOzlem Kalinli. 3613-3617 [doi]

On the Correlation and Transferability of Features Between Automatic Speech Recognition and Speech Emotion RecognitionHaytham M. Fayek, Margaret Lech, Lawrence Cavedon. 3618-3622 [doi]

On the Influence of Text Content on Pass-Phrase Strength for Short-Duration Text-Dependent Automatic Speaker AuthenticationGiacomo Valenti, Adrien Daniel, Nicholas W. D. Evans. 3623-3627 [doi]

Articulation Rate Filtering of CQCC Features for Automatic Speaker VerificationMassimiliano Todisco, Héctor Delgado, Nicholas W. D. Evans. 3628-3632 [doi]

The IBM Speaker Recognition System: Recent Advances and Error AnalysisSeyed Omid Sadjadi, Jason W. Pelecanos, Sriram Ganapathy. 3633-3637 [doi]

Probabilistic Approach Using Joint Clean and Noisy i-Vectors Modeling for Speaker RecognitionWaad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre. 3638-3642 [doi]

Generalized Discriminant Analysis (GDA) for Improved i-Vector Based Speaker RecognitionFahimeh Bahmaninezhad, John H. L. Hansen. 3643-3647 [doi]

Noise and Metadata Sensitive Bottleneck Features for Improving Speaker Recognition with Non-Native Speech InputYao Qian, Jidong Tao, David Suendermann-Oeft, Keelan Evanini, Alexei V. Ivanov, Vikram Ramanarayanan. 3648-3652 [doi]

Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural NetworksHuy Phan, Lars Hertel, Marco Maaß, Alfred Mertins. 3653-3657 [doi]

Audio-Based Distributional Representations of Meaning Using a Fusion of Feature EncodingsGiannis Karamanolakis, Elias Iosif, Athanasia Zlatintsi, Aggelos Pikrakis, Alexandros Potamianos. 3658-3662 [doi]

Robust DNN-Based VAD Augmented with Phone Entropy Based Rejection of Background SpeechYuya Fujita, Ken-ichi Iso. 3663-3667 [doi]

Feature Learning with Raw-Waveform CLDNNs for Voice Activity DetectionRubén Zazo, Tara N. Sainath, Gabor Simko, Carolina Parada. 3668-3672 [doi]

The SRI System for the NIST OpenSAD 2015 Speech Activity Detection EvaluationMartin Graciarena, Luciana Ferrer, Vikramjit Mitra. 3673-3677 [doi]

Model Adaptation and Active Learning in the BBN Speech Activity Detection System for the DARPA RATS ProgramDamianos Karakos, Scott Novotney, Le Zhang, Richard M. Schwartz. 3678-3682 [doi]

Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded SpeechVikramjit Mitra, Julien van Hout, Wen Wang, Chris Bartels, Horacio Franco, Dimitra Vergyri, Abeer Alwan, Adam Janin, John H. L. Hansen, Richard M. Stern, Abhijeet Sangwan, Nelson Morgan. 3683-3687 [doi]

Recurrent Neural Network-Based Phoneme Sequence Estimation Using Multiple ASR Systems' Outputs for Spoken Term DetectionNaoki Sawada, Hiromitsu Nishizaki. 3688-3692 [doi]

Enhancing Data-Driven Phone Confusions Using Restricted RecognitionMark Kane, Julie Carson-Berndsen. 3693-3697 [doi]

Rapid Update of Multilingual Deep Neural Network for Low-Resource Keyword SearchChongjia Ni, Lei Wang, Cheung Chi Leung, Feng Rao, Li Lu, Bin Ma, Haizhou Li. 3698-3702 [doi]

Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation AnalysisCheung Chi Leung, Lei Wang, Haihua Xu, Jingyong Hou, Van Tung Pham, Hang Lv, Lei Xie, Xiong Xiao, Chongjia Ni, Bin Ma, Eng Siong Chng, Haizhou Li. 3703-3707 [doi]

Novel Subband Autoencoder Features for Non-Intrusive Quality Assessment of Noise Suppressed SpeechMeet H. Soni, Hemant A. Patil. 3708-3712 [doi]

SNR-Based Progressive Learning of Deep Neural Network for Speech EnhancementTian Gao, Jun Du, Li-Rong Dai, Chin-Hui Lee. 3713-3717 [doi]

A Novel Risk-Estimation-Theoretic Framework for Speech Enhancement in Nonstationary and Non-Gaussian Noise ConditionsJishnu Sadasivan, Chandra Sekhar Seelamantula. 3718-3722 [doi]

Two-Stage Temporal Processing for Single-Channel Speech EnhancementSuman Samui, Indrajit Chakrabarti, Soumya Kanti Ghosh. 3723-3727 [doi]

A Class-Specific Speech Enhancement for Phoneme Recognition: A Dictionary Learning ApproachNazreen P. M., A. G. Ramakrishnan, Prasanta Kumar Ghosh. 3728-3732 [doi]

Robust Example Search Using Bottleneck Features for Example-Based Speech EnhancementAtsunori Ogawa, Shogo Seki, Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Kazuya Takeda. 3733-3737 [doi]

Speech Enhancement in Multiple-Noise Conditions Using Deep Neural NetworksAnurag Kumar, Dinei A. F. Florencio. 3738-3742 [doi]

Perception Optimized Deep Denoising AutoEncoders for Speech EnhancementPrashanth Gurunath Shivakumar, Panayiotis G. Georgiou. 3743-3747 [doi]

HMM-Based Speech Enhancement Using Sub-Word Models and Noise AdaptationAkihiro Kato, Ben P. Milner. 3748-3752 [doi]

Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy SpeechLi Li, Hirokazu Kameoka, Takuya Higuchi, Hiroshi Saruwatari. 3753-3757 [doi]

A priori SNR Estimation Using a Generalized Decision Directed ApproachAleksej Chinaev, Reinhold Haeb-Umbach. 3758-3762 [doi]

A DNN-HMM Approach to Non-Negative Matrix Factorization Based Speech EnhancementZiteng Wang, Xu Li, Xiaofei Wang, Qiang Fu, Yonghong Yan 0002. 3763-3767 [doi]

SNR-Aware Convolutional Neural Network Modeling for Speech EnhancementSzu-Wei Fu, Yu Tsao, Xugang Lu. 3768-3772 [doi]

An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech EnhancementKehuang Li, Bo Wu, Chin-Hui Lee. 3773-3777 [doi]

A Novel Research to Artificial Bandwidth Extension Based on Deep BLSTM Recurrent Neural Networks and Exemplar-Based Sparse RepresentationBin Liu, Jianhua Tao. 3778-3782 [doi]

Coping with Unseen Data Conditions: Investigating Neural Net Architectures, Robust Features, and Information Fusion for Robust Speech RecognitionVikramjit Mitra, Horacio Franco. 3783-3787 [doi]

On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic ModelsNatalia A. Tomashenko, Yuri Y. Khokhlov, Yannick Estève. 3788-3792 [doi]

Analytical Assessment of Dual-Stream Merging for Noise-Robust ASRLouis ten Bosch, Bert Cranen, Yang Sun. 3793-3797 [doi]

Use of Generalised Nonlinearity in Vector Taylor Series Noise Compensation for Robust Speech RecognitionErfan Loweimi, Jon Barker, Thomas Hain. 3798-3802 [doi]

Joint Optimization of Denoising Autoencoder and DNN Acoustic Model Based on Multi-Target Learning for Noisy Speech RecognitionMasato Mimura, Shinsuke Sakai, Tatsuya Kawahara. 3803-3807 [doi]

Optimization of Speech Enhancement Front-End with Speech Recognition-Level CriterionTakuya Higuchi, Takuya Yoshioka, Tomohiro Nakatani. 3808-3812 [doi]

Factorized Linear Input Network for Acoustic Model Adaptation in Noisy ConditionsDung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani. 3813-3817 [doi]

Data Augmentation Using Multi-Input Multi-Output Source Separation for Deep Neural Network Based Acoustic ModelingYusuke Fujita, Ryoichi Takashima, Takeshi Homma, Masahito Togami. 3818-3822 [doi]

Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition. Animesh Prasad, Khe Chai Sim. 3823-3827 [doi]

An Investigation on the Use of i-Vectors for Robust ASR. Dimitrios Dimitriadis, Samuel Thomas, Sriram Ganapathy. 3828-3832 [doi]

The Sheffield Wargame Corpus - Day Two and Day Three. Yulan Liu, Charles Fox, Madina Hasan, Thomas Hain. 3833-3837 [doi]

Recurrent Models for Auditory Attention in Multi-Microphone Distant Speech Recognition. Suyoun Kim, Ian R. Lane. 3838-3842 [doi]

Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks. Wonkyum Lee, Kyu J. Han, Ian Lane. 3843-3847 [doi]

Semi-Supervised Training in Deep Learning Acoustic Model. Yan Huang, Yongqiang Wang, Yifan Gong. 3848-3852 [doi]

Multilingual Data Selection for Low Resource Speech Recognition. Samuel Thomas, Kartik Audhkhasi, Jia Cui, Brian Kingsbury, Bhuvana Ramabhadran. 3853-3857 [doi]

An Investigation on Training Deep Neural Networks Using Probabilistic Transcriptions. Amit Das, Mark Hasegawa-Johnson. 3858-3862 [doi]

Analysis of Mismatched Transcriptions Generated by Humans and Machines for Under-Resourced Languages. Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson. 3863-3867 [doi]

ASR for South Slavic Languages Developed in Almost Automated Way. Jan Nouza, Radek Safarík, Petr Cerva. 3868-3872 [doi]

Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery. Marzieh Razavi, Mathew Magimai-Doss. 3873-3877 [doi]

Language Adaptive DNNs for Improved Low Resource Speech Recognition. Markus Müller, Sebastian Stüker, Alex Waibel. 3878-3882 [doi]

Improved Multilingual Training of Stacked Neural Network Acoustic Models for Low Resource Languages. Tanel Alumäe, Stavros Tsakalidis, Richard M. Schwartz. 3883-3887 [doi]

MIVOQ-PTTS - A Revolutionary New Way of Thinking TTS. Piero Cosi, Giulio Paci, Giacomo Sommavilla, Fabio Tesser. 3888-3889 [doi]
