INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014

Haizhou Li, Helen M. Meng, Bin Ma, Eng Siong Chng, Lei Xie, editors, INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014. ISCA, 2014. [doi]

Conference: interspeech2014

Sound patterns in languageWilliam S.-Y. Wang. [doi]

Learning about speechAnne Cutler. [doi]

Language diversity: speech processing in a multi-lingual contextLori Lamel. [doi]

Decision learning in data science: where John Nash meets social mediaK. J. Ray Liu. [doi]

Achievements and challenges of deep learning - from speech analysis and recognition to language and multimodal processingLi Deng. [doi]

An introduction to computational networks and the computational network toolkit (invited talk)Dong Yu, Adam Eversole, Michael L. Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoffrey Zweig, Christopher J. Rossbach, Jon Currey. [doi]

Language ID-based training of multilingual stacked bottleneck featuresYu Zhang, Ekapol Chuangsuwanich, James R. Glass. 1-5 [doi]

Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSRVan Hai Do, Xiong Xiao, Chng Eng Siong, Haizhou Li. 6-10 [doi]

Improving ASR performance on non-native speech using multilingual and crosslingual informationNgoc Thang Vu, Yuanfan Wang, Marten Klose, Zlatka Mihaylova, Tanja Schultz. 11-15 [doi]

Language independent and unsupervised acoustic models for speech recognition and keyword spottingKate Knill, Mark J. F. Gales, Anton Ragni, Shakti P. Rath. 16-20 [doi]

Cross-lingual adaptation with multi-task adaptive networksPeter Bell 0001, Joris Driesen, Steve Renals. 21-25 [doi]

On recognition of non-native speech using probabilistic lexical modelMarzieh Razavi, Mathew Magimai-Doss. 26-30 [doi]

Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulationKou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura. 31-35 [doi]

A target approximation intonation model for Yorùbá TTSDaniel R. van Niekerk, Etienne Barnard. 36-40 [doi]

Learning continuous-valued word representations for phrase break predictionAnandaswarup Vadapalli, Kishore Prahallad. 41-45 [doi]

Improving Mandarin prosodic boundary prediction with rich syntactic featuresHao Che, Jianhua Tao, Ya Li. 46-50 [doi]

Investigating automatic & human filled pause insertion for speech synthesisRasmus Dall, Marcus Tomalin, Mirjam Wester, William J. Byrne, Simon King. 51-55 [doi]

The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speechRasmus Dall, Mirjam Wester, Martin Corley. 56-60 [doi]

Introducing i-vectors for joint anti-spoofing and speaker verificationElie el Khoury, Tomi Kinnunen, Aleksandr Sizov, Zhizheng Wu, Sébastien Marcel. 61-65 [doi]

Random projections for large-scale speaker searchRyan Leary, Walter Andrews. 66-70 [doi]

Analysis of i-vector framework for speaker identification in TV-showsCorinne Fredouille, Delphine Charlet. 71-75 [doi]

Boosting bonsai trees for efficient features combination: application to speaker role identificationAntoine Laurent, Nathalie Camelin, Christian Raymond. 76-80 [doi]

Identifying contributors in the BBC world service archiveYves Raimond, Thomas Nixon. 81-85 [doi]

Effect of long-term ageing on i-vector speaker verificationFinnian Kelly, Rahim Saeidi, Naomi Harte, David A. van Leeuwen. 86-90 [doi]

Acoustic correlates of phonological statusMaarten Versteegh, Amanda Seidl, Alejandrina Cristia. 91-95 [doi]

Parameterization of the glottal source with the phase plane plotManu Airaksinen, Paavo Alku. 96-100 [doi]

Transcribing tone - a likelihood-based quantitative evaluation of Chao's tone lettersPhil Rose. 101-105 [doi]

Intonational phonology and prosodic hierarchy in MalayDiyana Hamzah, James Sneed German. 106-110 [doi]

Comparing parameterizations of pitch register and its discontinuities at prosodic boundaries for HungarianUwe D. Reichel, Katalin Mády. 111-115 [doi]

An evaluation of machine learning methods for prominence detection in FrenchGeorge Christodoulides, Mathieu Avanzi. 116-119 [doi]

Learning situated knowledge bases through dialogAasish Pappu, Alexander I. Rudnicky. 120-124 [doi]

Crowdsourcing for situated dialog systems in a moving carTeruhisa Misu. 125-129 [doi]

Evaluating coherence in open domain conversational systemsRyuichiro Higashinaka, Toyomi Meguro, Kenji Imamura, Hiroaki Sugiyama, Toshiro Makino, Yoshihiro Matsuo. 130-134 [doi]

Adapting dependency parsing to spontaneous speech for open domain spoken language understandingFrédéric Bechet, Alexis Nasr, Benoît Favre. 135-139 [doi]

Incremental on-line adaptation of POMDP-based dialogue managers to extended domainsMilica Gasic, DongHo Kim, Pirros Tsiakoulis, Catherine Breslin, Matthew Henderson, Martin Szummer, Blaise Thomson, Steve J. Young. 140-144 [doi]

Hypotheses ranking for robust domain classification and tracking in dialogue systemsJean-Philippe Robichaud, Paul A. Crook, Puyang Xu, Omar Zia Khan, Ruhi Sarikaya. 145-149 [doi]

Motor control primitives arising from a learned dynamical systems model of speech articulationVikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan. 150-154 [doi]

Nonword repetition of Taiwanese disyllabic tonal sequences in adults with language attritionChia-Hsin Yeh, Chiung-Yao Wang, Jung-Yueh Tu. 155-158 [doi]

A unified account of prominence effects in an optimization-based model of speech timingAndreas Windmann, Juraj Simko, Petra Wagner. 159-163 [doi]

Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraintsJangwon Kim, Sungbok Lee, Shrikanth S. Narayanan. 164-168 [doi]

Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: benefit to speech recognitionPrasad Sudhakar, Prasanta Kumar Ghosh. 169-173 [doi]

Contribution of tongue lateral to consonant productionJun Wang, William Katz, Thomas F. Campbell. 174-178 [doi]

A preliminary study on acoustic correlates of tone2+tone2 disyllabic word stress in MandarinMin Liu, Shuju Shi, Jin-Song Zhang. 179-183 [doi]

Vowel length impact on locus equation parameters: an investigation on Jordanian ArabicMohammad Abuoudeh, Olivier Crouzet. 184-188 [doi]

Corpus-testing a fricative discriminator; or, just how invariant is this invariant?Philip J. Roberts, Henning Reetz, Aditi Lahiri. 189-192 [doi]

Modeling coarticulation in continuous speechBrian O. Bush, Alexander Kain. 193-197 [doi]

On classification between normal and pathological voices using the MEEI-KayPENTAX database: issues and consequencesKhalid Daoudi, Blaise Bertrac. 198-202 [doi]

Synchronic variation in the articulation and the acoustics of the Polish three-way place distinction in sibilants and its implications for diachronic changeVéronique Bukmaier, Jonathan Harrington, Ulrich Reubold, Felicitas Kleber. 203-207 [doi]

Predicting client's inclination towards target behavior change in motivational interviewing and investigating the role of laughterRahul Gupta, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan. 208-212 [doi]

Modeling therapist empathy through prosody in drug addiction counselingBo Xiao, Daniel Bone, Maarten Van Segbroeck, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan. 213-217 [doi]

An investigation of vocal arousal dynamics in child-psychologist interactions using synchrony measures and a conversation-based modelDaniel Bone, Chi-Chun Lee, Alexandros Potamianos, Shrikanth S. Narayanan. 218-222 [doi]

Speech emotion recognition using deep neural network and extreme learning machineKun Han, Dong Yu, Ivan Tashev. 223-227 [doi]

An annotation scheme for sighs in spontaneous dialogueKhiet P. Truong, Gerben J. Westerhof, Franciska de Jong, Dirk Heylen. 228-232 [doi]

Speaker idiosyncratic variability of intensity across syllablesLei He, Volker Dellwo. 233-237 [doi]

Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corporaSoroosh Mariooryad, Reza Lotfian, Carlos Busso. 238-242 [doi]

Identification of age-group from children's speech by computers and humansSaeid Safavi, Martin J. Russell, Peter Jancovic. 243-247 [doi]

Theme identification in human-human conversations with features from specific speaker type hidden spacesMohamed Morchid, Richard Dufour, Mohamed Bouallegue, Georges Linarès, Renato de Mori. 248-252 [doi]

Learning phrase patterns for text classification using a knowledge graph and unlabeled dataAlex Marin, Roman Holenstein, Ruhi Sarikaya, Mari Ostendorf. 253-257 [doi]

Targeted feature dropout for robust slot filling in natural language understandingPuyang Xu, Ruhi Sarikaya. 258-262 [doi]

Spoken question answering using tree-structured conditional random fields and two-layer random walkSz-Rung Shiang, Hung-yi Lee, Lin-Shan Lee. 263-267 [doi]

Shrinkage based features for slot tagging with conditional random fieldsRuhi Sarikaya, Asli Çelikyilmaz, Anoop Deoras, Minwoo Jeong. 268-272 [doi]

Cluster based Chinese abbreviation modelingYangyang Shi, Yi-Cheng Pan, Mei-Yuh Hwang. 273-277 [doi]

Parsing named entity as syntactic structureXiantao Zhang, Dongchen Li, Xihong Wu. 278-282 [doi]

Detecting out-of-domain utterances addressed to a virtual personal assistantGökhan Tür, Anoop Deoras, Dilek Hakkani-Tür. 283-287 [doi]

Fusion of knowledge-based and data-driven approaches to grammar inductionSpiros Georgiladakis, Christina Unger, Elias Iosif, Sebastian Walter, Philipp Cimiano, Euripides G. M. Petrakis, Alexandros Potamianos. 288-292 [doi]

Improving named entity recognition with prosodic featuresDenys Katerenchuk, Andrew Rosenberg. 293-297 [doi]

Neural network models for lexical addressee detectionSuman V. Ravuri, Andreas Stolcke. 298-302 [doi]

Manipulating stance and involvement using collaborative tasks: an exploratory comparisonValerie Freeman, Julian Chan, Gina-Anne Levow, Richard A. Wright, Mari Ostendorf, Victoria Zayats. 303-307 [doi]

Incremental dialog processing in a task-oriented dialogFabrizio Ghigi, Maxine Eskenazi, M. Inés Torres, Sungjin Lee. 308-312 [doi]

Detecting incorrectly-segmented utterances for posteriori restoration of turn-taking and ASR resultsNaoki Hotta, Kazunori Komatani, Satoshi Sato, Mikio Nakano. 313-317 [doi]

Segmentation and disfluency removal for conversational speech translationHany Hassan, Lee Schwartz, Dilek Hakkani-Tür, Gökhan Tür. 318-322 [doi]

Cost-level integration of statistical and rule-based dialog managersShinji Watanabe, John R. Hershey, Tim K. Marks, Youichi Fujii, Yusuke Koji. 323-327 [doi]

Inverse reinforcement learning for micro-turn managementDongHo Kim, Catherine Breslin, Pirros Tsiakoulis, Milica Gasic, Matthew Henderson, Steve J. Young. 328-332 [doi]

Analysing the prosodic characteristics of speech-chunks preceding silences in task-based interactionsJohn Kane, Irena Yanushevskaya, Céline De Looze, Brian Vaughan, Ailbhe Ní Chasaide. 333-337 [doi]

Long short-term memory recurrent neural network architectures for large scale acoustic modelingHasim Sak, Andrew Senior, Françoise Beaufays. 338-342 [doi]

Unfolded recurrent neural networks for speech recognitionGeorge Saon, Hagen Soltau, Ahmad Emami, Michael Picheny. 343-347 [doi]

Manifold regularized deep neural networksVikrant Singh Tomar, Richard C. Rose. 348-352 [doi]

Modeling long temporal contexts for robust DNN-based speech recognitionBo Li, Khe Chai Sim. 353-357 [doi]

A long, deep and wide artificial neural net for robust speech recognition in unknown noiseFeipeng Li, Phani S. Nidadavolu, Hynek Hermansky. 358-362 [doi]

Investigation of deep neural networks for robust recognition of nonlinearly distorted speechLadislav Seps, Jirí Málek, Petr Cerva, Jan Nouza. 363-367 [doi]

Summary and initial results of the 2013-2014 speaker recognition i-vector machine learning challengeDésiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Craig S. Greenberg, Alvin F. Martin, Alan McCree, Mark A. Przybocki, Douglas A. Reynolds. 368-372 [doi]

Constrained speaker linkingDavid A. van Leeuwen, Niko Brümmer. 373-377 [doi]

RBM-PLDA subsystem for the NIST i-vector challengeSergey Novoselov, Timur Pekhovsky, Konstantin Simonchik, Andrey Shulipa. 378-382 [doi]

Limited labels for unlimited data: active learning for speaker recognitionStephen H. Shum, Najim Dehak, James R. Glass. 383-387 [doi]

Bayesian calibration for forensic evidence reportingNiko Brümmer, Albert Swart. 388-392 [doi]

Replicate mismatch between test/background and development databases: the impact on the performance of likelihood ratio-based forensic voice comparisonShunichi Ishihara. 393-397 [doi]

Automatic estimation of the lip radiation effect in glottal inverse filteringManu Airaksinen, Tom Bäckström, Paavo Alku. 398-402 [doi]

Simulation of 3D larynges with asymmetric distribution of viscoelastic properties in their vocal foldsMarcelo de Oliveira Rosa. 403-407 [doi]

Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methodsHironori Takemoto, Parham Mokhtari, Tatsuya Kitamura. 408-412 [doi]

A study of invariant properties and variation patterns in the converter/distributor model for emotional speechJangwon Kim, Donna Erickson, Sungbok Lee, Shrikanth S. Narayanan. 413-417 [doi]

A hybrid approach to 3D tongue modeling from vocal tract MRI using unsupervised image segmentation and mesh deformationAlexander Hewer, Ingmar Steiner, Stefanie Wuhrer. 418-421 [doi]

Estimation of vocal-tract shape from speech spectrum and speech resynthesis based on a generative modelTokihiko Kaburagi. 422-426 [doi]

The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical loadBjörn Schuller, Stefan Steidl, Anton Batliner, Julien Epps, Florian Eyben, Fabien Ringeval, Erik Marchi, Yue Zhang. 427-431 [doi]

Filtering and subspace selection for spectral features in detecting speech under physical stressJouni Pohjalainen, Paavo Alku. 432-436 [doi]

Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokensMing Li. 437-441 [doi]

Canonical correlation analysis and local Fisher discriminant analysis based multi-view acoustic feature reduction for physical load predictionHeysem Kaya, Tugçe Özkaptan, Albert Ali Salah, Sadik Fikret Gürgen. 442-446 [doi]

Ensemble of machine learning algorithms for cognitive and physical speaker load detectionHow Jing, Ting-Yao Hu, Hung-Shin Lee, Wei-Chen Chen, Chi-Chun Lee, Yu Tsao, Hsin-Min Wang. 447-451 [doi]

Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networksGábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth. 452-456 [doi]

Revisiting the right-ear advantage for speech: implications for speech displaysNandini Iyer, Eric Thompson, Brian D. Simpson, Griffin D. Romigh. 457-461 [doi]

Comparing reaction time sequences from human participants and computational modelsLouis ten Bosch, Mirjam Ernestus, Lou Boves. 462-466 [doi]

Detecting the number of competing speakers - human selective hearing versus spectrogram distance based estimatorValentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu. 467-470 [doi]

The influence of sensory memory and attention on the context effect in talker normalizationGuo Li, Gang Peng. 471-475 [doi]

Automatic speech recognition with primarily temporal envelope informationPayton Lin, Fei Chen, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao. 476-480 [doi]

An adaptive envelope compression strategy for speech processing in cochlear implantsYing-Hui Lai, Fei Chen, Yu Tsao. 481-484 [doi]

Articulatory dynamics and coordination in classifying cognitive change with preclinical mTBIBrian S. Helfer, Thomas F. Quatieri, James R. Williamson, Laurel Keyes, Benjamin Evans, W. Nicholas Greene, Trina Vian, Joseph Lacirignola, Trey Shenk, Thomas Talavage, Jeff Palmer, Kristin Heaton. 485-489 [doi]

A hearing impairment simulation method using audiogram-based approximation of auditory characteristicsNozomi Jinbo, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura. 490-494 [doi]

Investigation of the relative perceptual importance of temporal envelope and temporal fine structure between tonal and non-tonal languagesDongmei Wang, James M. Kates, John H. L. Hansen. 495-498 [doi]

Vowel spectral contributions to English and Mandarin sentence intelligibilityDaniel Fogerty, Fei Chen. 499-503 [doi]

Significance of aperiodicity in the pitch perception of expressive voicesVinay Kumar Mittal, B. Yegnanarayana. 504-508 [doi]

DIAPIX-FL: a symmetric corpus of problem-solving dialogues in first and second languagesMirjam Wester, Maria Luisa Garcia Lecumberri, Martin Cooke. 509-513 [doi]

Cross-linguistic investigations of oral and silent readingChristophe Coupé, Yoon Mi Oh, François Pellegrino, Egidio Marsico. 514-518 [doi]

Non-native word recognition in noise: the role of word-initial and word-final informationJuul Coumans, Roeland Van Hout, Odette Scharenborg. 519-523 [doi]

The effects of high and low variability phonetic training on the perception and production of English vowels /e/-/æ/ by Cantonese ESL learners with high and low L2 proficiency levelsJanice Wing Sze Wong. 524-528 [doi]

Dutch vowel production by Spanish learners: duration and spectral featuresPepi Burgos, Mátyás Jani, Catia Cucchiarini, Roeland Van Hout, Helmer Strik. 529-533 [doi]

English consonant confusions by Greek listeners in quiet and noise and the role of phonological short-term memoryAngelos Lengeris, Katerina Nicolaidis. 534-538 [doi]

Corpus-based L2 phonological data and semi-automatic perceptual analysis: the case of nasal vowels produced by beginner Japanese learners of FrenchSylvain Detey, Isabelle Racine, Julien Eychenne, Yuji Kawaguchi. 539-543 [doi]

Perception of prosodic prominence and boundaries by L1 and L2 speakers of EnglishGábor Pintér, Shinobu Mizuguchi, Koichi Tateishi. 544-547 [doi]

Prosody perception, reading accuracy, nonliteral language comprehension, and music and tonal pitch discrimination in school aged childrenRose Thomas Kalathottukaren, Suzanne C. Purdy, Elaine Ballard. 548-552 [doi]

Phoneme category retuning in a non-native languagePolina Drozdova, Roeland Van Hout, Odette Scharenborg. 553-557 [doi]

Speech emotion recognition with cross-lingual databasesBo-Chang Chiou, Chia-Ping Chen. 558-561 [doi]

Speaker diarization using eye-gaze information in multi-party conversationsKoji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Tatsuya Kawahara. 562-566 [doi]

Unsupervised speaker diarization using Riemannian manifold clusteringChe-Wei Huang, Bo Xiao, Panayiotis G. Georgiou, Shrikanth S. Narayanan. 567-571 [doi]

Towards a complete binary key system for the speaker diarization taskHéctor Delgado, Corinne Fredouille, Javier Serrano. 572-576 [doi]

An iterative speaker re-diarization scheme for improving speaker-based entity extraction in multimedia archivesHouman Ghaemmaghami, David Dean, Sridha Sridharan. 577-581 [doi]

Speaker diarization using gesture and speechBinyam Gebrekidan Gebre, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts, Tom Heskes. 582-586 [doi]

Is incremental cross-show speaker diarization efficient for processing large volumes of data?Grégor Dupuy, Sylvain Meignier, Yannick Estève. 587-591 [doi]

Detecting and labeling speakers on overlapping speech using vector Taylor seriesPranay Dighe, Marc Ferras, Hervé Bourlard. 592-596 [doi]

Phoneme background model for information bottleneck based speaker diarizationSree Harsha Yella, Petr Motlícek, Hervé Bourlard. 597-601 [doi]

Diarizing large corpora using multi-modal speaker linkingMarc Ferras, Stefano Masneri, Oliver Schreer, Hervé Bourlard. 602-606 [doi]

Multimodal understanding for person recognition in video broadcastsFrédéric Béchet, Meriem Bendris, Delphine Charlet, Géraldine Damnati, Benoît Favre, Mickael Rouvier, Rémi Auguste, Benjamin Bigot, Richard Dufour, Corinne Fredouille, Georges Linarès, Jean Martinet, Grégory Senay, Pierre Tirilly. 607-611 [doi]

Comparing time-frequency representations for directional derivative featuresJames Gibson, Maarten Van Segbroeck, Shrikanth S. Narayanan. 612-615 [doi]

Robust speech recognition with speech enhanced deep neural networksJun Du, Qing Wang, Tian Gao, Yong Xu, Li-Rong Dai, Chin-Hui Lee. 616-620 [doi]

An investigation of likelihood normalization for robust ASREmmanuel Vincent, Aggelos Gkiokas, Dominik Schnitzer, Arthur Flexer. 621-625 [doi]

Identifying the human-machine differences in complex binaural scenes: what can be learned from our auditory systemConstantin Spille, Bernd T. Meyer. 626-630 [doi]

Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modellingJürgen T. Geiger, Zixing Zhang, Felix Weninger, Björn Schuller, Gerhard Rigoll. 631-635 [doi]

Joint adaptation and adaptive training of TVWR for robust automatic speech recognitionShilin Liu, Khe Chai Sim. 636-640 [doi]

Efficient GPU-based training of recurrent neural network language models using spliced sentence bunchXie Chen, Yongqiang Wang, Xunying Liu, Mark J. F. Gales, Philip C. Woodland. 641-645 [doi]

Word pair approximation for more efficient decoding with high-order language modelsDavid Nolden, Ralf Schlüter, Hermann Ney. 646-650 [doi]

Comparing approaches to convert recurrent neural networks into backoff language models for efficient decodingHeike Adel, Katrin Kirchhoff, Ngoc Thang Vu, Dominic Telaar, Tanja Schultz. 651-655 [doi]

Removing redundancy from latticesDavid Nolden, Hagen Soltau, Daniel Povey, Pegah Ghahremani, Lidia Mangu, Hermann Ney. 656-660 [doi]

Lattice decoding and rescoring with long-span neural network language modelsMartin Sundermeyer, Zoltán Tüske, Ralf Schlüter, Hermann Ney. 661-665 [doi]

Word-phrase-entity language models: getting more mileage out of n-gramsMichael Levit, Sarangarajan Parthasarathy, Shuangyu Chang, Andreas Stolcke, Benoît Dumoulin. 666-670 [doi]

A novel boosting algorithm for improved i-vector based speaker verification in noisy environmentsSourjya Sarkar, K. Sreenivasa Rao. 671-675 [doi]

Using deep belief networks for vector-based speaker recognitionWilliam M. Campbell. 676-680 [doi]

A deep neural network speaker verification system targeting microphone speechYun Lei, Luciana Ferrer, Mitchell McLaren, Nicolas Scheffer. 681-685 [doi]

Application of convolutional neural networks to speaker recognition in noisy conditionsMitchell McLaren, Yun Lei, Nicolas Scheffer, Luciana Ferrer. 686-690 [doi]

SVM based speaker recognition: harnessing trials with multiple enrollment sessionsJason W. Pelecanos, Weizhong Zhu, Sibel Yaman. 691-695 [doi]

I-vector speaker verification based on phonetic information under transmission channel effectsLaura Fernandez Gallardo, Michael Wagner 0004, Sebastian Möller. 696-700 [doi]

A real-time MRI study of articulatory setting in second language speechAndrés Benítez, Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan. 701-705 [doi]

Retroflex and bunched English /r/ with physical models of the human vocal tractTakayuki Arai. 706-710 [doi]

Parameterization of articulatory pattern in speakers with ALSPanying Rong, Yana Yunusova, James D. Berry, Lorne Zinman, Jordan R. Green. 711-715 [doi]

Missing samples estimation in electromagnetic articulography data using equality constrained Kalman smootherP. Sujith, Prasanta Kumar Ghosh. 716-720 [doi]

Palate-referenced articulatory features for acoustic-to-articulator inversionAn Ji, Michael T. Johnson, Jeffrey Berry. 721-725 [doi]

A study on the improvement of measurement accuracy of the three-dimensional electromagnetic articulographyHidetsugu Uchida, Kohei Wakamiya, Tokihiko Kaburagi. 726-730 [doi]

High-level speech event analysis for cognitive load classificationClaude Montacié, Marie-José Caraty. 731-735 [doi]

On the use of Bhattacharyya based GMM distance and neural net features for identification of cognitive load levelsTin Lay Nwe, Trung Hieu Nguyen, Bin Ma. 736-740 [doi]

Prediction of cognitive load from speech with the VOQAL voice quality toolbox for the INTERSPEECH 2014 computational paralinguistics challengeMark Huckvale. 741-745 [doi]

The UNSW submission to INTERSPEECH 2014 ComParE cognitive load challengeJia Min Karen Kua, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah. 746-750 [doi]

Classification of cognitive load from speech using an i-vector frameworkMaarten Van Segbroeck, Ruchir Travadi, Colin Vaz, Jangwon Kim, Matthew P. Black, Alexandros Potamianos, Shrikanth S. Narayanan. 751-755 [doi]

Using conditional random fields to predict focus word pair in spontaneous spoken EnglishXiao Zang, Zhiyong Wu, Helen M. Meng, Jia Jia, Lianhong Cai. 756-760 [doi]

Applications of maximum entropy rankers to problems in spoken language processingRichard Sproat, Keith B. Hall. 761-764 [doi]

Text-to-speech with cross-lingual neural network-based grapheme-to-phoneme modelsXavi Gonzalvo, Monika Podsiadlo. 765-769 [doi]

Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesisDaiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi. 770-774 [doi]

Cross-lingual voice conversion-based polyglot speech synthesizer for Indian languagesB. Ramani, M. P. Actlin Jeeva, P. Vijayalakshmi, T. Nagarajan. 775-779 [doi]

An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesisQiong Hu, Yannis Stylianou, Ranniery Maia, Korin Richmond, Junichi Yamagishi, Javier Latorre. 780-784 [doi]

Chaotic mixed excitation source for speech synthesisHemant A. Patil, Tanvina B. Patel. 785-789 [doi]

Refined inter-segment joining in multi-form speech synthesisAlexander Sorin, Slava Shechtman, Vincent Pollet. 790-794 [doi]

A hierarchical Viterbi algorithm for Mandarin hybrid speech synthesis systemRan Zhang, Zhengqi Wen, Jianhua Tao, Ya Li, Bing Liu, Xiaoyan Lou. 795-799 [doi]

Improving language-universal feature extraction with deep maxout and convolutional neural networksYajie Miao, Florian Metze. 800-804 [doi]

Exploiting vocal-source features to improve ASR accuracy for low-resource languagesRaul Fernandez, Jia Cui, Andrew Rosenberg, Bhuvana Ramabhadran, Xiaodong Cui. 805-809 [doi]

Data augmentation for low resource languagesAnton Ragni, Kate M. Knill, Shakti P. Rath, Mark J. F. Gales. 810-814 [doi]

About combining forward and backward-based decoders for selecting data for unsupervised training of acoustic modelsDenis Jouvet, Dominique Fohr. 815-819 [doi]

Combination of multilingual and semi-supervised training for under-resourced languagesFrantisek Grézl, Martin Karafiát. 820-824 [doi]

Investigating the learning effect of multilingual bottle-neck features for ASRNgoc Thang Vu, Jochen Weiner, Tanja Schultz. 825-829 [doi]

Distributed learning of multilingual DNN feature extractors using GPUsYajie Miao, Hao Zhang, Florian Metze. 830-834 [doi]

Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languagesShakti P. Rath, Kate M. Knill, Anton Ragni, Mark J. F. Gales. 835-839 [doi]

Recent improvements in neural network acoustic modeling for LVCSR in low resource languagesJia Cui, Bhuvana Ramabhadran, Xiaodong Cui, Andrew Rosenberg, Brian Kingsbury, Abhinav Sethy. 840-844 [doi]

Towards better performance with heterogeneous training data in acoustic modeling using deep neural networksYan Huang, Malcolm Slaney, Michael L. Seltzer, Yifan Gong. 845-849 [doi]

A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov modelsTakuya Higuchi, Hirofumi Takeda, Tomohiko Nakamura, Hirokazu Kameoka. 850-854 [doi]

Enhancing audio source separability using spectro-temporal regularization with NMFColin Vaz, Dimitrios Dimitriadis, Shrikanth S. Narayanan. 855-859 [doi]

Blind speech source localization, counting and separation for 2-channel convolutive mixtures in a reverberant environmentSayeh Mirzaei, Hugo Van Hamme, Yaser Norouzi. 860-864 [doi]

Discriminative NMF and its application to single-channel source separationFelix Weninger, Jonathan Le Roux, John R. Hershey, Shinji Watanabe. 865-869 [doi]

Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape informationHideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura, Toshio Irino. 870-874 [doi]

A graph-based Gaussian component clustering approach to unsupervised acoustic modelingHaipeng Wang, Tan Lee, Cheung Chi Leung, Bin Ma, Haizhou Li. 875-879 [doi]

A speech system for estimating daily word countsAli Ziaei, Abhijeet Sangwan, John H. L. Hansen. 880-884 [doi]

Ensemble modeling of denoising autoencoder for speech spectrum restorationXugang Lu, Yu Tsao, Shigeki Matsuda, Chiori Hori. 885-889 [doi]

Acoustic modeling with deep neural networks using raw time signal for LVCSRZoltán Tüske, Pavel Golik, Ralf Schlüter, Hermann Ney. 890-894 [doi]

Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditionsVikramjit Mitra, Wen Wang, Horacio Franco, Yun Lei, Chris Bartels, Martin Graciarena. 895-899 [doi]

Deep scattering spectra with deep neural networks for LVCSR tasksTara N. Sainath, Vijayaditya Peddinti, Brian Kingsbury, Petr Fousek, Bhuvana Ramabhadran, David Nahamoo. 900-904 [doi]

Robust CNN-based speech recognition with Gabor filter kernelsShuo-Yiin Chang, Nelson Morgan. 905-909 [doi]

Probabilistic linear discriminant analysis with bottleneck features for speech recognitionLiang Lu, Steve Renals. 910-914 [doi]

Evaluating speech features with the minimal-pair ABX task (II): resistance to noiseThomas Schatz, Vijayaditya Peddinti, Xuan-Nga Cao, Francis R. Bach, Hynek Hermansky, Emmanuel Dupoux. 915-919 [doi]

Lateral formants in three central Australian languagesMarija Tabain, Andrew Butcher, Gavan Breen, Richard Beare. 920-924 [doi]

Detecting articulatory compensation in acoustic data through linear regression modelingAlina Khasanova, Jennifer Cole, Mark Hasegawa-Johnson. 925-929 [doi]

The relationship between the second subglottal resonance and vowel class, standing height, trunk length, and F0 variation for Mandarin speakersJinxi Guo, Angli Liu, Harish Arsikere, Abeer Alwan, Steven M. Lulich. 930-934 [doi]

Comparison of speech quality with and without sensors in electromagnetic articulograph AG 501 recordingNisha Meenakshi, Chiranjeevi Yarra, B. K. Yamini, Prasanta Kumar Ghosh. 935-939 [doi]

Impact of age in the production of European Portuguese vowelsLuciana Albuquerque, Catarina Oliveira, António J. S. Teixeira, Pedro Sá Couto, João Freitas, Miguel Sales Dias. 940-944 [doi]

'Houston, we have a solution': a case study of the analysis of astronaut speech during NASA Apollo 11 for long-term speaker modelingChengzhu Yu, John H. L. Hansen, Douglas W. Oard. 945-948 [doi]

Choosing useful word alternates for automatic speech recognition correction interfacesDavid Harwath, Alexander Gruenstein, Ian McGraw. 949-953 [doi]

An initial investigation of long-term adaptation for meeting transcriptionX. Chen, Mark J. F. Gales, Kate M. Knill, Catherine Breslin, Langzhou Chen, K. K. Chin, Vincent Wan. 954-958 [doi]

Progress in the BBN keyword search system for the DARPA RATS programTim Ng, Roger Hsiao, Le Zhang, Damianos Karakos, Sri Harish Reddy Mallidi, Martin Karafiát, Karel Veselý, Igor Szöke, Bing Zhang, Long Nguyen, Richard M. Schwartz. 959-963 [doi]

Speech-to-text technology to transcribe and disclose 100,000+ hours of bilingual documents from historical Czech and Czechoslovak radio archiveJan Nouza, Petr Cerva, Jindrich Zdánský, Karel Blavka, Marek Bohac, Jan Silovský, Josef Chaloupka, Michaela Kucharová, Ladislav Seps, Jirí Málek, Michal Rott. 964-968 [doi]

Automatic assessment of children's reading with the FLaVoR decoding using a phone confusion modelEmre Yilmaz, Joris Pelemans, Hugo Van Hamme. 969-972 [doi]

RWTH LVCSR systems for Quaero and EU-Bridge: German, Polish, Spanish and PortugueseM. Ali Basha Shaik, Zoltán Tüske, Muhammad Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney. 973-977 [doi]

Single channel source separation with general stochastic networksMatthias Zöhrer, Franz Pernkopf. 978-982 [doi]

Large-margin conditional random fields for single-microphone speech separationYu Ting Yeung, Tan Lee, Cheung Chi Leung. 983-987 [doi]

On the use of the Watson mixture model for clustering-based under-determined blind source separationIngrid Jafari, Roberto Togneri, Sven Nordholm. 988-992 [doi]

Binary mask estimation based on frequency modulationsChung-Chien Hsu, Jen-Tzung Chien, Tai-Shih Chi. 993-997 [doi]

Bayesian factorization and selection for speech and music separationPo-Kai Yang, Chung-Chien Hsu, Jen-Tzung Chien. 998-1002 [doi]

Self-adaption in single-channel source separationMichael Wohlmayr, Ludwig Mohr, Franz Pernkopf. 1003-1007 [doi]

Multichannel automatic recognition of voice command in a multi-room smart home: an experiment involving seniors and users with visual impairmentMichel Vacher, Benjamin Lecouteux, François Portet. 1008-1012 [doi]

An evaluation of unsupervised acoustic model training for a dysarthric speech interfaceOliver Walter, Vladimir Despotovic, Reinhold Haeb-Umbach, Jort F. Gemmeke, Bart Ons, Hugo Van Hamme. 1013-1017 [doi]

Analysis of phonetic similarity in a silent speech interface based on permanent magnetic articulographyJose A. Gonzalez, Lam A. Cheah, Jie Bai, Stephen R. Ell, James M. Gilbert, Roger K. Moore, Phil D. Green. 1018-1022 [doi]

Audio-visual signal processing in a multimodal assisted living environmentAlexey Karpov, Lale Akarun, Hülya Yalçin, Alexander L. Ronzhin, Baris Evrim Demiröz, Aysun Çoban, Milos Zelezný. 1023-1027 [doi]

On the selection of the impulse responses for distant-speech recognition based on contaminated speech trainingMirco Ravanelli, Maurizio Omologo. 1028-1032 [doi]

Adaptive speech recognition and dialogue management for users with speech disordersI. Casanueva, Heidi Christensen, Thomas Hain, Phil D. Green. 1033-1037 [doi]

Prediction of cognitive performance in an animal fluency task based on rate and articulatory markersBea Yu, Thomas F. Quatieri, James R. Williamson, James C. Mundt. 1038-1042 [doi]

Analysis of laughter events in real science classes by using multiple environment sensor dataCarlos Toshinori Ishi, Hiroaki Hatano, Norihiro Hagita. 1043-1047 [doi]

Parallel deep neural network training for LVCSR tasks using Blue Gene/QTara N. Sainath, I.-Hsin Chung, Bhuvana Ramabhadran, Michael Picheny, John A. Gunnels, Brian Kingsbury, George Saon, Vernon Austel, Upendra Chaudhari. 1048-1052 [doi]

Word embeddings for speech recognitionSamy Bengio, Georg Heigold. 1053-1057 [doi]

1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNsFrank Seide, Hao Fu, Jasha Droppo, Gang Li, Dong Yu. 1058-1062 [doi]

Boundary contraction training for acoustic models based on discrete deep neural networksRyu Takeda, Naoyuki Kanda, Nobuo Nukaga. 1063-1067 [doi]

Restructuring output layers of deep neural networks using minimum risk parameter clusteringYotaro Kubo, Jun Suzuki, Takaaki Hori, Atsushi Nakamura. 1068-1072 [doi]

Distributed asynchronous optimization of convolutional neural networksWilliam Chan, Ian Lane. 1073-1077 [doi]

Convolutional deep maxout networks for phone recognitionLászló Tóth. 1078-1082 [doi]

Joint sequence training of phone and grapheme acoustic model based on multi-task learning deep neural networksDongpeng Chen, Brian Mak, Sunil Sivadas. 1083-1087 [doi]

Improving semi-supervised deep neural network for keyword search in low resource languagesRoger Hsiao, Tim Ng, Le Zhang, Shivesh Ranjan, Stavros Tsakalidis, Long Nguyen, Richard M. Schwartz. 1088-1091 [doi]

Pruning deep neural networks by optimal brain damageChao Liu, Zhiyong Zhang, Dong Wang. 1092-1095 [doi]

Improving the performance of far-field speaker verification using multi-condition training: the case of GMM-UBM and i-vector systemsAnderson R. Avila, Milton Orlando Sarria Paja, Francisco J. Fraga, Douglas D. O'Shaughnessy, Tiago H. Falk. 1096-1100 [doi]

Clustering-based i-vector formulation for speaker recognitionHung-Shin Lee, Yu Tsao, Hsin-Min Wang, Shyh-Kang Jeng. 1101-1105 [doi]

Speaker recognition via fusion of subglottal features and MFCCsHarish Arsikere, Hitesh Anand Gupta, Abeer Alwan. 1106-1110 [doi]

The NIST SRE summed channel speaker recognition systemHanwu Sun, Bin Ma. 1111-1114 [doi]

Advantages of wideband over narrowband channels for speaker verification employing MFCCs and LFCCsLaura Fernández Gallardo, Michael Wagner 0004, Sebastian Möller. 1115-1119 [doi]

Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem featuresMing Li, Wenbo Liu. 1120-1124 [doi]

Feature switching in the i-vector framework for speaker verificationT. Asha, M. S. Saranya, D. S. Karthik Pandia, Srikanth R. Madikeri, Hema A. Murthy. 1125-1129 [doi]

PLDA modeling in the fishervoice subspace for speaker verificationJinghua Zhong, Weiwu Jiang, Wei Rao, Man-Wai Mak, Helen M. Meng. 1130-1134 [doi]

Performance factor analysis for the 2012 NIST speaker recognition evaluationAlvin F. Martin, Craig S. Greenberg, Vincent M. Stanford, John M. Howard, George R. Doddington, John J. Godfrey. 1135-1138 [doi]

Simultaneous gender classification and voice activity detection using deep neural networksHiroshi Fujimura. 1139-1143 [doi]

Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptronsAhmed Hussen Abdelaziz, Dorothea Kolossa. 1144-1148 [doi]

Lipreading using convolutional neural networkKuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, Tetsuya Ogata. 1149-1153 [doi]

Lipreading approach for isolated digits recognition under whisper and neutral speechFei Tao, Carlos Busso. 1154-1158 [doi]

Multimodal exemplar-based voice conversion using lip features in noisy environmentsKenta Masaka, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki. 1159-1163 [doi]

Towards a practical silent speech recognition systemYunbin Deng, James T. Heaton, Geoffrey S. Meltzner. 1164-1168 [doi]

Enhancing multimodal silent speech interfaces with feature selectionJoão Freitas, Artur J. Ferreira, Mário A. T. Figueiredo, António Teixeira, Miguel Sales Dias. 1169-1173 [doi]

Opti-speech: a real-time, 3D visual feedback system for speech trainingWilliam Katz, Thomas F. Campbell, Jun Wang, Eric Farrar, J. Coleman Eubanks, Arvind Balasubramanian, Balakrishnan Prabhakaran, Rob Rennaker. 1174-1178 [doi]

Across-speaker articulatory normalization for speaker-independent silent speech recognitionJun Wang, Ashok Samal, Jordan R. Green. 1179-1183 [doi]

Conversion from facial myoelectric signals to speech: a unit selection approachMarlene Zahner, Matthias Janke, Michael Wand, Tanja Schultz. 1184-1188 [doi]

Towards real-life application of EMG-based speech recognition by using unsupervised adaptationMichael Wand, Tanja Schultz. 1189-1193 [doi]

Simple gesture-based error correction interface for smartphone speech recognitionYuan Liang, Koji Iwano, Koichi Shinoda. 1194-1198 [doi]

Normalization of ASR confidence classifier scores via confidence mappingKshitiz Kumar, Chaojun Liu, Yifan Gong. 1199-1203 [doi]

Neural network phone duration model for speech recognitionTanel Alumäe. 1204-1208 [doi]

Sequence discriminative distributed training of long short-term memory recurrent neural networksHasim Sak, Oriol Vinyals, Georg Heigold, Andrew Senior, Erik McDermott, Rajat Monga, Mark Z. Mao. 1209-1213 [doi]

Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognitionZhen Huang, Jinyu Li, Chao Weng, Chin-Hui Lee. 1214-1218 [doi]

A comparison of training approaches for discriminative segmental modelsHao Tang, Kevin Gimpel, Karen Livescu. 1219-1223 [doi]

Asynchronous stochastic optimization for sequence training of deep neural networks: towards big dataErik McDermott, Georg Heigold, Pedro J. Moreno, Andrew Senior, Michiel Bacchiani. 1224-1228 [doi]

Detection of children's paralinguistic events in interaction with caregiversHrishikesh Rao, Jonathan C. Kim, Mark A. Clements, Agata Rozga, Daniel S. Messinger. 1229-1233 [doi]

Age and rhythmic variations: a study on ItalianMassimo Pettorino, Elisa Pellegrino. 1234-1237 [doi]

Probabilistic acoustic volume analysis for speech affected by depressionNicholas Cummins, Vidhyasaharan Sethu, Julien Epps, Jarek Krajewski. 1238-1242 [doi]

Exploring modulation spectrum features for speech-based depression level classificationElif Bozkurt, Orith Toledo-Ronen, Alexander Sorin, Ron Hoory. 1243-1247 [doi]

Automatic modelling of depressed speech: relevant features and relevance of genderFlorian Hönig, Anton Batliner, Elmar Nöth, Sebastian Schnieder, Jarek Krajewski. 1248-1252 [doi]

Excitation source features for discrimination of anger and happy emotionsP. Gangamohan, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. Yegnanarayana. 1253-1257 [doi]

Encoding linear models as weighted finite-state transducersKe Wu, Cyril Allauzen, Keith Hall, Michael Riley, Brian Roark. 1258-1262 [doi]

Structured soft margin confidence weighted learning for grapheme-to-phoneme conversionKeigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura. 1263-1267 [doi]

Unsupervised language filtering using the latent Dirichlet allocationWei Zhang, Robert A. J. Clark, Yongyuan Wang. 1268-1272 [doi]

Generating multiple-accent pronunciations for TTS using joint sequence model interpolationBalaKrishna Kolluru, Vincent Wan, Javier Latorre, Kayoko Yanagisawa, Mark J. F. Gales. 1273-1277 [doi]

Using a hybrid approach to build a pronunciation dictionary for Brazilian PortugueseGustavo Mendonça, Sandra M. Aluísio. 1278-1282 [doi]

A flexible front-end for HTSMatthew P. Aylett, Rasmus Dall, Arnab Ghoshal, Gustav Eje Henter, Thomas Merritt. 1283-1287 [doi]

Cross-language perception of Japanese singleton and geminate consonants: preliminary data from non-native learners of Japanese and native speakers of Italian and Australian EnglishKimiko Tsukada, Felicity Cox, John Hajek. 1288-1292 [doi]

Difficulty in discriminating non-native vowels: are Dutch vowels easier for Australian English than Spanish listeners?Samra Alispahic, Paola Escudero, Karen E. Mulak. 1293-1296 [doi]

Acoustic properties of shared vowels in bilingual Mandarin-English childrenJing Yang, Robert Allen Fox. 1297-1301 [doi]

Generating segmental foreign accentMaria Luisa Garcia Lecumberri, Roberto Barra-Chicote, Rubén Pérez Ramón, Junichi Yamagishi, Martin Cooke. 1302-1306 [doi]

Differences of pitch profiles in Germanic and Slavic languagesBistra Andreeva, Grazyna Demenko, Bernd Möbius, Frank Zimmerer, Jeanin Jügler, Magdalena Oleskowicz-Popiel. 1307-1311 [doi]

The obligatory contour principle in African and European varieties of FrenchMathieu Avanzi, Guri Bordal, Gélase Nimbona. 1312-1316 [doi]

Content matching for short duration speaker recognitionNicolas Scheffer, Yun Lei. 1317-1321 [doi]

Extended RSR2015 for text-dependent speaker verification over VHF channelAnthony Larcher, Kong-Aik Lee, Pablo Luis Sordo Martinez, Trung Hieu Nguyen, Bin Ma, Haizhou Li. 1322-1326 [doi]

Tandem deep features for text-dependent speaker verificationTianfan Fu, Yanmin Qian, Yuan Liu, Kai Yu. 1327-1331 [doi]

In-domain versus out-of-domain training for text-dependent JFAPatrick Kenny, Themos Stafylakis, Md. Jahangir Alam, Pierre Ouellet, Marcel Kockmann. 1332-1336 [doi]

Domain adaptation for text dependent speaker verificationHagai Aronowitz, Asaf Rendel. 1337-1341 [doi]

Factor analysis with sampling methods for text dependent speaker recognitionAntonio Miguel, Jesús A. Villalba, Alfonso Ortega, Eduardo Lleida, Carlos Vaquero. 1342-1346 [doi]

Dictionary-based pitch tracking with dynamic programmingEwout van den Berg, Bhuvana Ramabhadran. 1347-1351 [doi]

Acoustic features for robust classification of Mandarin tonesHongbing Hu, Stephen A. Zahorian, Peter Guzewich, Jiang Wu. 1352-1356 [doi]

Preservation of lexical tones in singing in a tone languageAnastasia Karlsson, Håkan Lundström, Jan-Olof Svantesson. 1357-1360 [doi]

Emotional speech classification using adaptive sinusoidal modellingTheodora Yakoumaki, George P. Kafentzis, Yannis Stylianou. 1361-1365 [doi]

Formant enhancement based speech watermarking for tampering detectionShengbei Wang, Masashi Unoki, Nam Soo Kim. 1366-1370 [doi]

Modelling primitive streaming of simple tone sequences through factorisation of modulation pattern tensorsTom Barker, Hugo Van Hamme, Tuomas Virtanen. 1371-1375 [doi]

Detection of vowel onset points in voiced aspirated sounds of Indian languagesBiswajit Dev Sarma, S. R. M. Prasanna. 1376-1380 [doi]

Accuracy evaluation of esophageal voice analysis based on automatic topology generated-voicing source HMMAkira Sasou. 1381-1385 [doi]

Audio watermarking based on multiple echoes hiding for FM radioXuejun Zhang, Xiang Xie. 1386-1390 [doi]

Development of bilingual ASR system for MediaParl corpusPetr Motlícek, David Imseng, Milos Cernak, Namhoon Kim. 1391-1394 [doi]

Investigation of cross-lingual bottleneck features in hybrid ASR systemsJie Li, Rong Zheng, Bo Xu. 1395-1399 [doi]

Language identification of individual words with joint sequence modelsOluwapelumi Giwa, Marelie H. Davel. 1400-1404 [doi]

Audio-to-text alignment for speech recognition with very limited resourcesXavier Anguera, Jordi Luque, Ciro Gracia. 1405-1409 [doi]

A minimal-resource transliteration framework for VietnameseHoang Gia Ngo, Nancy F. Chen, Sunil Sivadas, Bin Ma, Haizhou Li. 1410-1414 [doi]

Combining recurrent neural networks and factored language models during decoding of code-switching speechHeike Adel, Dominic Telaar, Ngoc Thang Vu, Katrin Kirchhoff, Tanja Schultz. 1415-1419 [doi]

Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languagesZoltán Tüske, Pavel Golik, David Nolden, Ralf Schlüter, Hermann Ney. 1420-1424 [doi]

Mixture of latent words language models for domain adaptationRyo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi. 1425-1429 [doi]

Improving spoken document retrieval by unsupervised language model adaptation using utterance-based web searchRobert Herms, Marc Ritter, Thomas Wilhelm-Stein, Maximilian Eibl. 1430-1433 [doi]

The nested Indian buffet process for flexible topic modelingJen-Tzung Chien, Ying-Lan Chang. 1434-1437 [doi]

Automated closed captioning for Russian live broadcastingKirill Levin, Irina Ponomareva, A. Bulusheva, German Chernykh, Ivan Medennikov, Nickolay Merkin, A. Prudnikov, Natalia A. Tomashenko. 1438-1442 [doi]

Pronunciation modeling of foreign words for Mandarin ASR by considering the effect of language transferLei Wang 0020, Rong Tong. 1443-1447 [doi]

Pronunciation learning for named-entities through crowd-sourcingAttapol T. Rutherford, Fuchun Peng, Françoise Beaufays. 1448-1452 [doi]

Pronunciation variation in read and conversational Austrian GermanBarbara Schuppler, Martine Adda-Decker, Juan Andres Morales-Cordovilla. 1453-1457 [doi]

Discriminative pronunciation modeling for dialectal speech recognitionMaider Lehr, Kyle Gorman, Izhak Shafran. 1458-1462 [doi]

The goodness of pronunciation algorithm applied to disordered speechThomas Pellegrini, Lionel Fontan, Julie Mauclair, Jérôme Farinas, Marina Robert. 1463-1467 [doi]

Using deep neural networks to improve proficiency assessment for children English language learnersAngeliki Metallinou, Jian Cheng. 1468-1472 [doi]

Alignment of spoken utterances with slide content for easier learning with recorded lectures using structured support vector machine (SVM)Han Lu, Sheng-syun Shen, Sz-Rung Shiang, Hung-yi Lee, Lin-Shan Lee. 1473-1477 [doi]

A preliminary study on ASR-based detection of Chinese mispronunciation by Japanese learnersRicheng Duan, Jinsong Zhang, Wen Cao, Yanlu Xie. 1478-1481 [doi]

3D tongue motion visualization based on ultrasound image sequencesKele Xu, Yin Yang, A. Jaumard-Hakoun, Martine Adda-Decker, Angélique Amelot, Samer Al Kork, Lise Crevier-Buchman, P. Chawah, Gérard Dreyfus, Thibaut Fux, Claire Pillot-Loiseau, Pierre Roussel, M. Stone, Bruce Denby. 1482-1483 [doi]

Listen with your skin: aerotak speech perception enhancement systemDonald Derrick, Tom De Rybel, Greg A. O'Beirne, Jennifer Hay. 1484-1485 [doi]

Speech assistant systemLászló Czap. 1486-1487 [doi]

Spoken dialogue system for restaurant recommendation and reservationRafael E. Banchs, Seokhwan Kim. 1488-1489 [doi]

Interlingual map task corpus collectionHayakawa Akira, Nick Campbell, Saturnino Luz. 1490-1491 [doi]

A client mobile application for Chinese-Spanish statistical machine translationJordi Centelles, Marta R. Costa-Jussà, Rafael E. Banchs. 1492-1493 [doi]

LuciaWebGL: a new WebGL-based talking headAlberto Benin, Piero Cosi, Giuseppe Riccardo Leone, Giulio Paci. 1494-1495 [doi]

Crowdee: mobile crowdsourcing micro-task platform for celebrating the diversity of languagesBabak Naderi, Tim Polzehl, Andre Beyer, Tibor Pilz, Sebastian Möller. 1496-1497 [doi]

On the use of the 'Pure Data' programming language for teaching and public outreach in speech processingRoger K. Moore. 1498-1499 [doi]

Syncwords: a platform for semi-automated closed captioning and subtitlesAleksandr Dubinsky. 1500-1501 [doi]

4allRobert A. J. Clark. 1502-1503 [doi]

Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speechGustav Eje Henter, Thomas Merritt, Matt Shannon, Catherine Mayo, Simon King. 1504-1508 [doi]

Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesisThomas Merritt, Tuomo Raitio, Simon King. 1509-1513 [doi]

Voice expression conversion with factorised HMM-TTS modelsJavier Latorre, Vincent Wan, Kayoko Yanagisawa. 1514-1518 [doi]

Noise-robust TTS speaker adaptation with statistics smoothingKayoko Yanagisawa, Langzhou Chen, Mark J. F. Gales. 1519-1523 [doi]

Speech synthesis in various communicative situations: impact of pronunciation variationsSandrine Brognaux, Benjamin Picart, Thomas Drugman. 1524-1528 [doi]

Formant-controlled speech synthesis using hidden trajectory modelMing-Qi Cai, Zhen-Hua Ling, Li-Rong Dai. 1529-1533 [doi]

Boosted deep neural networks and multi-resolution cochleagram features for voice activity detectionXiao-lei Zhang, DeLiang Wang. 1534-1538 [doi]

Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detectionAbhay Prasad, Prasanta Kumar Ghosh, Shrikanth S. Narayanan. 1539-1543 [doi]

Speech activity detection for NASA Apollo space missions: challenges and solutionsAli Ziaei, Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen, Douglas W. Oard. 1544-1548 [doi]

Towards improving statistical model based voice activity detectionMing Tu, Xiang Xie, Yishan Jiao. 1549-1552 [doi]

The use of low-frequency ultrasound for voice activity detectionIan Vince McLoughlin. 1553-1557 [doi]

Improving the speech activity detection for the DARPA RATS phase-3 evaluationJeff Ma. 1558-1562 [doi]

Modeling pronunciation, rhythm, and intonation for automatic assessment of speech quality in aphasia rehabilitationDuc Le, Emily Mower Provost. 1563-1567 [doi]

Ranking severity of speech errors by their phonological impact in contextSofia Strömbergsson, Christina Tånnander, Jens Edlund. 1568-1572 [doi]

Automatic detection of Parkinson's disease from words uttered in three different languagesJuan R. Orozco-Arroyave, Florian Hönig, Julián D. Arias-Londoño, Jesus Francisco Vargas Bonilla, Sabine Skodda, Jan Rusz, Elmar Nöth. 1573-1577 [doi]

Automating an objective measure of pediatric speech intelligibilityJason Lilley, Susan Nittrouer, H. Timothy Bunnell. 1578-1582 [doi]

A comparison of GMM-HMM and DNN-HMM based pronunciation verification techniques for use in the assessment of childhood apraxia of speechMostafa Ali Shahin, Beena Ahmed, Jacqueline McKechnie, Kirrie J. Ballard, Ricardo Gutierrez-Osuna. 1583-1587 [doi]

Acoustic and kinematic characteristics of vowel production through a virtual vocal tract in dysarthriaJeff Berry, Andrew Kolb, Cassandra North, Michael T. Johnson. 1588-1592 [doi]

The EMG-UKA corpus for electromyographic speech processingMichael Wand, Matthias Janke, Tanja Schultz. 1593-1597 [doi]

A whispered Mandarin corpus for speech technology applicationsPei Xuan Lee, Darren Wee, Hilary Si Yin Toh, Boon Pang Lim, Nancy F. Chen, Bin Ma. 1598-1602 [doi]

Euronews: a multilingual benchmark for ASR and LIDRoberto Gretter. 1603-1607 [doi]

ATHENA: a Greek multi-sensory database for home automation controlAntigoni Tsiami, Isidoros Rodomagoulakis, Panagiotis Giannoulis, Athanasios Katsamanis, Gerasimos Potamianos, Petros Maragos. 1608-1612 [doi]

The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphonesMarco Matassoni, Ramón Fernandez Astudillo, Athanasios Katsamanis, Mirco Ravanelli. 1613-1617 [doi]

Verbal description of LEGO blocksDiogo Henriques, Isabel Trancoso, Daniel Mendes, Alfredo Ferreira. 1618-1622 [doi]

Phase importance in speech processing applicationsPejman Mowlaee, Rahim Saeidi, Yannis Stylianou. 1623-1627 [doi]

Phase-based harmonic/percussive separationEstefanía Cano, Mark D. Plumbley, Christian Dittmar. 1628-1632 [doi]

Phase distortion statistics as a representation of the glottal source: application to the classification of voice qualitiesGilles Degottex, Nicolas Obin. 1633-1637 [doi]

A measure of phase randomness for the harmonic model in speech synthesisGilles Degottex, Daniel Erro. 1638-1642 [doi]

Enhancement of speech intelligibility in near-end noise conditions with phase modificationEmma Jokinen, Marko Takanen, Hannu Pulakka, Paavo Alku. 1643-1647 [doi]

A hybrid approach to segmentation of speech using group delay processing and HMM based embedded reestimationS. Aswin Shanmugam, Hema Murthy. 1648-1652 [doi]

The importance of phase on voice quality assessmentMaria Koutsogiannaki, Olympia Simantiraki, Gilles Degottex, Yannis Stylianou. 1653-1657 [doi]

Feature extraction from analytic phase of speech signals for speaker verificationKarthika Vijayan, Vinay Kumar, K. Sri Rama Murty. 1658-1662 [doi]

A cross-vocoder study of speaker independent synthetic speech detection using phase informationJon Sánchez, Ibon Saratxaga, Inma Hernáez, Eva Navas, Daniel Erro. 1663-1667 [doi]

Investigating the effect of F0 and vocal intensity on harmonic magnitudes: data from high-speed laryngeal videoendoscopyGang Chen 0009, Soo-Jin Park, Jody Kreiman, Abeer Alwan. 1668-1672 [doi]

Adapting prosodic chunking algorithm and synthesis system to specific style: the case of dictationElisabeth Delais-Roussarie, Damien Lolive, Hiyon Yoo, Nelly Barbot, Olivier Rosec. 1673-1677 [doi]

The articulation of lexical and post-lexical palatalization in KoreanJae-Hyun Sung. 1678-1682 [doi]

Articulation and neutralization: a preliminary study of lenition in Scottish GaelicDiana Archangeli, Samuel Johnston, Jae-Hyun Sung, Muriel Fisher, Michael Hammond, Andrew Carnie. 1683-1687 [doi]

Nasality in speech and its contribution to speaker individualityKanae Amino, Hisanori Makinae, Tatsuya Kitamura. 1688-1692 [doi]

Is speech rhythm an intrinsic property of language?Jason Brown, Eden Matene. 1693-1697 [doi]

Where /ar/ the /r/s in standard Austrian German?Anke Jackschina, Barbara Schuppler, Rudolf Muhr. 1698-1702 [doi]

Diphthongized vowels in the Yi County Hui Chinese dialectFang Hu, Minghui Zhang. 1703-1707 [doi]

Rhythmic variability between some Asian languages: results from an automatic analysis of temporal characteristicsVolker Dellwo, Peggy Mok, Mathias Jenny. 1708-1711 [doi]

Listener estimation of speaker age based on whispered speechAngelika Braun, Daniela Decker. 1712-1716 [doi]

The Lombard effect with Thai lexical tones: an acoustic analysis of articulatory modifications in noiseBenjawan Kasisopa, Virginie Attina, Denis Burnham. 1717-1721 [doi]

Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detectionPeng Yang, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li. 1722-1726 [doi]

Recent improvements in SRI's keyword detection system for noisy audioJulien van Hout, Vikramjit Mitra, Yun Lei, Dimitra Vergyri, Martin Graciarena, Arindam Mandal, Horacio Franco. 1727-1731 [doi]

Utilizing state-level distance vector representation for improved spoken term detection by text and spoken queriesMitsuaki Makino, Naoki Yamamoto, Atsuhiko Kai. 1732-1736 [doi]

Unsupervised spoken word retrieval using Gaussian-Bernoulli restricted Boltzmann machinesRaghavendra Reddy Pappagari, Shekhar Nayak, K. Sri Rama Murty. 1737-1741 [doi]

Unsupervised query-by-example spoken term detection using bag of acoustic words and non-segmental dynamic time warpingBasil George, Abhijeet Saxena, Gautam Varma Mantena, Kishore Prahallad, B. Yegnanarayana. 1742-1746 [doi]

An empirical study of multilingual and low-resource spoken term detection using deep neural networksJie Li, Xiaorui Wang, Bo Xu. 1747-1751 [doi]

Diagnostic techniques for spoken keyword discoveryPeter F. Schulam, Murat Akbacak. 1752-1756 [doi]

Robust retrieval models for false positive errors in spoken documentsSho Kawasaki, Tomoyosi Akiba. 1757-1761 [doi]

Semantic retrieval of personal photos using matrix factorization and two-layer random walk fusing sparse speech annotations with visual featuresYuan-ming Liou, Yi-Sheng Fu, Hung-yi Lee, Lin-Shan Lee. 1762-1766 [doi]

Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterionGuillaume Gravier, Nathan Souviraà-Labastie, Sebastien Campion, Frédéric Bimbot. 1767-1771 [doi]

Semantically based search in a social speech taskFernando García, Emilio Sanchis, Ferran Pla. 1772-1776 [doi]

Study of changes in glottal vibration characteristics during laughterVinay Kumar Mittal, B. Yegnanarayana. 1777-1781 [doi]

On predicting the unpleasantness level of a sound eventStavros Ntalampiras, Ilyas Potamitis. 1782-1785 [doi]

Predicting when to laugh with structured classificationBilal Piot, Olivier Pietquin, Matthieu Geist. 1786-1790 [doi]

Conversational structures affecting auditory likeabilityBenjamin Weiss, Katrin Schoenenberg. 1791-1795 [doi]

Towards the adaptation of prosodic models for expressive text-to-speech synthesisMathieu Avanzi, George Christodoulides, Damien Lolive, Elisabeth Delais-Roussarie, Nelly Barbot. 1796-1800 [doi]

Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpusSho Matsumiya, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura. 1801-1805 [doi]

Learning L2 prosody is more difficult than you realize - F0 characteristics and chunking size of L1 English, TW L2 English and TW L1 MandarinChiu-yu Tseng, Chao-yu Su. 1806-1810 [doi]

Investigating prosodic relations between initiating and responding laughsKhiet P. Truong, Jürgen Trouvain. 1811-1815 [doi]

Application of image processing methods to filled pauses detection from spontaneous speechDmytro Prylipko, Olga Egorow, Ingo Siegert, Andreas Wendemuth. 1816-1820 [doi]

Perception of sentence stress in English infant directed speechSofoklis Kakouros, Okko Räsänen. 1821-1825 [doi]

Automatic recognition of attitudes in video blogs - prosodic and visual feature analysisNoor Alhusna Madzlan, Jing Guang Han, Francesca Bonin, Nick Campbell. 1826-1830 [doi]

"was that your mother on the phone?": classifying interpersonal relationships between dialog participants with lexical and acoustic propertiesDenys Katerenchuk, David-Guy Brizan, Andrew Rosenberg. 1831-1835 [doi]

Combining source and system information for limited data speaker verificationRohan Kumar Das, S. Abhiram, S. R. M. Prasanna, A. G. Ramakrishnan. 1836-1840 [doi]

New insight into the use of phone log-likelihood ratios as features for language recognitionMireia Díez, Amparo Varona, Mikel Peñagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel. 1841-1845 [doi]

Robust language identification using convolutional neural network featuresSriram Ganapathy, Kyu Jeong Han, Samuel Thomas, Mohamed Kamal Omar, Maarten Van Segbroeck, Shrikanth S. Narayanan. 1846-1850 [doi]

Acoustic feature transformation using UBM-based LDA for speaker recognitionChengzhu Yu, Gang Liu, John H. L. Hansen. 1851-1854 [doi]

SNR-dependent mixture of PLDA for noise robust speaker verificationMan-Wai Mak. 1855-1859 [doi]

Nearest neighbor discriminant analysis for robust speaker recognitionSeyed Omid Sadjadi, Jason W. Pelecanos, Weizhong Zhu. 1860-1864 [doi]

Enhanced language modeling for extractive speech summarization with sentence relatedness informationShih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu. 1865-1869 [doi]

I-vector based representation of highly imperfect automatic transcriptionsMohamed Morchid, Mohamed Bouallegue, Richard Dufour, Georges Linarès, Driss Matrouf, Renato de Mori. 1870-1874 [doi]

Incorporating lexical and prosodic information at different levels for meeting summarizationCatherine Lai, Steve Renals. 1875-1879 [doi]

Subspace Gaussian mixture models for dialogues classificationMohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linarès, Renato de Mori. 1880-1884 [doi]

Factor analysis based semantic variability compensation for automatic conversation representationMohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linarès, Renato de Mori. 1885-1889 [doi]

Speech cohesion for topic segmentation of spoken contentsAbdessalam Bouchekif, Géraldine Damnati, Delphine Charlet. 1890-1894 [doi]

A comparative analytic study on the Gaussian mixture and context dependent deep neural network hidden Markov modelsYan Huang, Dong Yu, Chaojun Liu, Yifan Gong. 1895-1899 [doi]

Asynchronous, online, GMM-free training of a context dependent acoustic model for speech recognitionMichiel Bacchiani, Andrew Senior, Georg Heigold. 1900-1904 [doi]

Autoregressive product of multi-frame predictions can improve the accuracy of hybrid modelsNavdeep Jaitly, Vincent Vanhoucke, Geoffrey E. Hinton. 1905-1909 [doi]

Learning small-size DNN with output-distribution-based criteriaJinyu Li, Rui Zhao, Jui-Ting Huang, Yifan Gong. 1910-1914 [doi]

Ensemble deep learning for speech recognitionLi Deng, John C. Platt. 1915-1919 [doi]

Learning conditional random field with hierarchical representations for dialogue act recognitionYucan Zhou, Qinghua Hu, Jie Liu, Yuan Jia. 1920-1923 [doi]

Can adolescents with autism perceive emotional prosody?Cristiane Hsu, Yi Xu. 1924-1928 [doi]

Age, hearing loss and the perception of affective utterances in conversational speechJuliane Schmidt, Esther Janse, Odette Scharenborg. 1929-1933 [doi]

Analysis of emotional effect on speech-body gesture interplayZhaojun Yang, Shrikanth S. Narayanan. 1934-1938 [doi]

When voices get emotional: a study of emotion-enhanced memory and impairment during emotional prosody exposureCyrielle Chappuis, Didier Grandjean. 1939-1943 [doi]

Perception of pitch tails at potential turn boundaries in SwedishMargaret Zellers. 1944-1948 [doi]

Towards a perceptual model of speech rhythm: integrating the influence of f0 on perceived durationRobert Fuchs. 1949-1953 [doi]

DNN-based stochastic postfilter for HMM-based speech synthesisLing-Hui Chen, Tuomo Raitio, Cassia Valentini-Botinhao, Junichi Yamagishi, Zhen-Hua Ling. 1954-1958 [doi]

Statistical parametric speech synthesis using weighted multi-distribution deep belief networkShiyin Kang, Helen M. Meng. 1959-1963 [doi]

TTS synthesis with bidirectional LSTM based recurrent neural networksYuchen Fan, Yao Qian, Feng-Long Xie, Frank K. Soong. 1964-1968 [doi]

Deep neural network based trainable voice source model for synthesis of speech with varying vocal effortTuomo Raitio, Antti Suni, Lauri Juvela, Martti Vainio, Paavo Alku. 1969-1973 [doi]

/h/ lenition in Brunei MandarinShufang Xu. 1974-1977 [doi]

Mapping emotions into acoustic space: the role of voice qualityTing Wang, Hongwei Ding, Jianjing Kuang, Qiuwu Ma. 1978-1982 [doi]

Principal components of auditory spectro-temporal receptive fieldsNagaraj Mahajan, Nima Mesgarani, Hynek Hermansky. 1983-1987 [doi]

Segmentation in singer turns with the Bayesian information criterionMarwa Thlithi, Thomas Pellegrini, Julien Pinquier, Régine André-Obrecht. 1988-1992 [doi]

Mappings between vocal tract area functions, vocal tract resonances and speech formants for multiple speakersCatherine I. Watson. 1993-1997 [doi]

A next step towards measuring perceived quality of speech through physiologySebastian Arndt, Markus Wenzel, Jan-Niklas Antons, Friedemann Köster, Sebastian Möller, Gabriel Curio. 1998-2001 [doi]

Effect of spectral degradation to the intelligibility of vowel sentencesFei Chen, Sharon W. K. Wong, Lena L. N. Wong. 2002-2005 [doi]

Consonant context effects on vowel sensorimotor adaptationJeffrey Berry, John Jaeger IV, Melissa Wiedenhoeft, Brittany Bernal, Michael T. Johnson. 2006-2010 [doi]

Assessing objective characterizations of phonetic convergenceGérard Bailly, Amélie Martin. 2011-2015 [doi]

Generalizing time-frequency importance functions across noises, talkers, and phonemesMichael I. Mandel, Sarah E. Yoho, Eric W. Healy. 2016-2020 [doi]

Does elderly speech recognition in noise benefit from spectral and visual cues?Yatin Mahajan, Jeesun Kim, Chris Davis. 2021-2025 [doi]

On the conversant-specificity of stochastic turn-taking modelsKornel Laskowski. 2026-2030 [doi]

Single-ended estimation of speech intelligibility using the ITU P.563 feature setToshihiro Sakano, Yosuke Kobayashi, Kazuhiro Kondo. 2031-2035 [doi]

Spectral tilt modelling with GMMs for intelligibility enhancement of narrowband telephone speechEmma Jokinen, Ulpu Remes, Marko Takanen, Kalle J. Palomäki, Mikko Kurimo, Paavo Alku. 2036-2040 [doi]

Analyzing perceptual dimensions of conversational speech qualityFriedemann Köster, Sebastian Möller. 2041-2045 [doi]

Interplay of informational content and energetic masking in speech perception in noiseVincent Aubanel, Chris Davis, Jeesun Kim. 2046-2049 [doi]

On spectral and time domain energy reallocation for speech-in-noise intelligibility enhancementTudor-Catalin Zorila, Yannis Stylianou. 2050-2054 [doi]

Objective quality evaluation of noise-suppressed speech: effects of temporal envelope and fine-structure cuesFei Chen, Yi Hu. 2055-2058 [doi]

Noisy speech enhancement based on long term harmonic model to improve speech intelligibility for hearing impaired listenersDongmei Wang, Philipos C. Loizou, John H. L. Hansen. 2059-2062 [doi]

Using linguistic predictability and the Lombard effect to increase the intelligibility of synthetic speech in noiseCassia Valentini-Botinhao, Mirjam Wester. 2063-2067 [doi]

Speech pre-enhancement using a discriminative microscopic intelligibility modelMaryam Al Dabel, Jon Barker. 2068-2072 [doi]

Least squares signal declipping for robust speech recognitionMark J. Harvilla, Richard M. Stern. 2073-2077 [doi]

Semi-supervised training for bottle-neck feature based DNN-HMM hybrid systemsHaihua Xu, Hang Su, Chng Eng Siong, Haizhou Li. 2078-2082 [doi]

A big data approach to acoustic model training corpus selectionOlga Kapralova, John Alex, Eugene Weinstein, Pedro J. Moreno, Olivier Siohan. 2083-2087 [doi]

Recent advances in ASR applied to an Arabic transcription system for Al-JazeeraPatrick Cardinal, Ahmed M. Ali, Najim Dehak, Yu Zhang, Tuka Al Hanai, Yifan Zhang, James R. Glass, Stephan Vogel. 2088-2092 [doi]

rwthlm - the RWTH Aachen University neural network language modeling toolkitMartin Sundermeyer, Ralf Schlüter, Hermann Ney. 2093-2097 [doi]

Language modeling with sum-product networksWei-Chen Cheng, Stanley Kok, Hoai Vu Pham, Hai Leong Chieu, Kian Ming Adam Chai. 2098-2102 [doi]

Improving deep neural network acoustic modeling for audio corpus indexing under the IARPA Babel programXiaodong Cui, Brian Kingsbury, Jia Cui, Bhuvana Ramabhadran, Andrew Rosenberg, Mohammad Sadegh Rasooli, Owen Rambow, Nizar Habash, Vaibhava Goel. 2103-2107 [doi]

Cross-language transfer of semantic annotation via targeted crowdsourcingShammur Absar Chowdhury, Arindam Ghosh, Evgeny A. Stepanov, Ali Orkan Bayer, Giuseppe Riccardi, Ioannis Klasinas. 2108-2112 [doi]

Probabilistic enrichment of knowledge graph entities for relation detection in conversational understandingDilek Hakkani-Tür, Asli Çelikyilmaz, Larry P. Heck, Gökhan Tür, Geoffrey Zweig. 2113-2117 [doi]

Automatic speech recognition and translation of a Swiss German dialect: WalliserdeutschPhilip N. Garner, David Imseng, Thomas Meyer. 2118-2122 [doi]

Building resources for Algerian Arabic dialectsS. Harrat, Karima Meftouh, Mourad Abbas, Kamel Smaili. 2123-2127 [doi]

An educational platform to capture, visualize and analyze rare singingP. Chawah, Samer Al Kork, Thibaut Fux, Martine Adda-Decker, Angélique Amelot, Nicolas Audibert, Bruce Denby, Gérard Dreyfus, A. Jaumard-Hakoun, Claire Pillot-Loiseau, Pierre Roussel, M. Stone, Kele Xu, Lise Crevier-Buchman. 2128-2129 [doi]

Single-channel speech enhancement based on non-negative matrix factorization and online noise adaptationKwang Myung Jeon, Chan Jun Chun, Woo Kyeong Seong, Hong Kook Kim, Myung Kyu Choi. 2130-2131 [doi]

Intelligibility of high-pitched vowel sounds in the singing and speaking of a female Cantonese opera singerDieter Maurer, Peggy Mok, Daniel Friedrichs, Volker Dellwo. 2132-2133 [doi]

Iterative refinement of amplitude and phase in single-channel speech enhancementPejman Mowlaee, Mario Kaoru Watanabe, Rahim Saeidi. 2134-2135 [doi]

elite-HTS: a NLP tool for French HMM-based speech synthesisSophie Roekhaut, Sandrine Brognaux, Richard Beaufort, Thierry Dutoit. 2136-2137 [doi]

SARA - Singapore's automated responsive assistant for the touristic domainAndreea I. Niculescu, Rafael E. Banchs, Ridong Jiang, Seokhwan Kim, Kheng Hui Yeo, Arthur Niswar. 2138-2139 [doi]

The speech recognition virtual kitchen: launch partyAndrew R. Plummer, Eric Riebling, Anuj Kumar, Florian Metze, Eric Fosler-Lussier, Rebecca Bates. 2140-2141 [doi]

System for automated speech and language analysis (SALSA)Kyle Marek-Spartz, Benjamin Knoll, Robert Bill, S. Thomas Christie, Serguei V. S. Pakhomov. 2142-2143 [doi]

Pronunciation practice support system for children who have difficulty correctly pronouncing wordsIkuyo Masuda-Katsuse. 2144-2145 [doi]

Automated production of true-cased punctuated subtitles for weather and news broadcastsJoris Driesen, Alexandra Birch, Simon Grimsey, Saeid Safarfashandi, Juliet Gauthier, Matt Simpson, Steve Renals. 2146-2147 [doi]

I2R Speech2Singing perfects everyone's singingMinghui Dong, Siu Wa Lee, Haizhou Li, Paul Y. Chan, Xuejian Peng, Jochen Walter Ehnes, Dong-Yan Huang. 2148-2149 [doi]

Spoken language recognition based on senone posteriorsLuciana Ferrer, Yun Lei, Mitchell McLaren, Nicolas Scheffer. 2150-2154 [doi]

Automatic language identification using long short-term memory recurrent neural networksJavier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Hasim Sak, Joaquin Gonzalez-Rodriguez, Pedro J. Moreno. 2155-2159 [doi]

Robust language recognition via adaptive language factor extractionBrecht Desplanques, Kris Demuynck, Jean-Pierre Martens. 2160-2164 [doi]

Dialect levelling in Finnish: a universal speech attribute approachHamid Behravan, Ville Hautamäki, Sabato Marco Siniscalchi, Elie el Khoury, Tommi Kurki, Tomi Kinnunen, Chin-Hui Lee. 2165-2169 [doi]

Improving native accent identification using deep neural networksMingming Chen, Zhanlei Yang, Hao Zheng, Wenju Liu. 2170-2174 [doi]

Foreign accent recognition based on temporal information contained in lowpass-filtered speechMarie-José Kolly, Adrian Leemann, Volker Dellwo. 2175-2179 [doi]

Adaptation of deep neural network acoustic models using factorised i-vectorsPenny Karanasou, Yongqiang Wang, Mark J. F. Gales, Philip C. Woodland. 2180-2184 [doi]

Regularized feature-space discriminative adaptation for robust ASRTakashi Fukuda, Osamu Ichikawa, Masafumi Nishimura, Steven J. Rennie, Vaibhava Goel. 2185-2188 [doi]

Towards speaker adaptive training of deep neural network acoustic modelsYajie Miao, Hao Zhang, Florian Metze. 2189-2193 [doi]

Component structuring and trajectory modeling for speech recognitionArseniy Gorin, Denis Jouvet. 2194-2198 [doi]

Speaker dependent bottleneck layer training for speaker adaptation in automatic speech recognitionRama Doddipatla, Madina Hasan, Thomas Hain. 2199-2203 [doi]

Improving wideband acoustic models using mixed-bandwidth training data via DNN adaptationZhao You, Bo Xu. 2204-2208 [doi]

A sparse reconstruction method for speech source localization using partial dictionaries over a spherical microphone arrayKushagra Singhal, Rajesh M. Hegde. 2209-2213 [doi]

A robust TDOA estimation method for in-car-noise environmentsWeiwei Cui, Jaeyeon Cho, Seungyeol Lee. 2214-2217 [doi]

Robust low-resource sound localization in correlated noiseLorin Netsch, Jacek Stachurski. 2218-2222 [doi]

Direction-of-arrival estimation of multiple speakers using a planar arrayDongwen Ying, Ruohua Zhou, Junfeng Li, Jielin Pan, Yonghong Yan 0002. 2223-2227 [doi]

Weighted spatial bispectrum correlation matrix for DOA estimation in the presence of interferencesWei Xue, Shan Liang, Wenju Liu. 2228-2232 [doi]

Multi-sources separation for sound source localizationMariem Bouafif, Zied Lachiri. 2233-2237 [doi]

Relating automatic vowel space estimates to talker intelligibilityYi Luan, Richard A. Wright, Mari Ostendorf, Gina-Anne Levow. 2238-2242 [doi]

Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensationHideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino. 2243-2247 [doi]

Sparse time-frequency representation of speech by the Vandermonde transformChristian Fischer Pedersen, Tom Bäckström. 2248-2252 [doi]

Analysis and identification of human scream: implications for speaker recognitionMahesh Kumar Nandwana, John H. L. Hansen. 2253-2257 [doi]

F0 estimation in noisy speech based on long-term harmonic feature analysis combined with neural network classificationDongmei Wang, Philipos C. Loizou, John H. L. Hansen. 2258-2262 [doi]

The influence of pitch and noise on the discriminability of filterbank featuresMalcolm Slaney, Michael L. Seltzer. 2263-2267 [doi]

Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networksRaul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, Ron Hoory. 2268-2272 [doi]

Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision treeXiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, Li-Rong Dai. 2273-2277 [doi]

High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversionToru Nakashika, Tetsuya Takiguchi, Yasuo Ariki. 2278-2282 [doi]

Sequence error (SE) minimization training of neural network for voice conversionFeng-Long Xie, Yao Qian, Yuchen Fan, Frank K. Soong, Haifeng Li. 2283-2287 [doi]

Robust articulatory speech synthesis using deep neural networks for BCI applicationsFlorent Bocquelet, Thomas Hueber, Laurent Girin, Pierre Badin, Blaise Yvert. 2288-2292 [doi]

Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regressionDiandra Fabre, Thomas Hueber, Pierre Badin. 2293-2297 [doi]

Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture modelsPatrick Lumban Tobing, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura, Ayu Purwarianti. 2298-2302 [doi]

Speech-driven head motion synthesis using neural networksChuang Ding, Pengcheng Zhu, Lei Xie, Dongmei Jiang, Zhong-Hua Fu. 2303-2307 [doi]

Text-independent voice conversion using speaker model alignment method from non-parallel speechPeng Song, Yun Jin, Wenming Zheng, Li Zhao. 2308-2312 [doi]

Voice conversion using generative trained deep neural networks with multiple frame spectral envelopesLing-Hui Chen, Zhen-Hua Ling, Li-Rong Dai. 2313-2317 [doi]

Hierarchical modeling of F0 contours for voice conversionGerard Sanchez, Hanna Silén, Jani Nurminen, Moncef Gabbouj. 2318-2321 [doi]

0 contoursKento Kadowaki, Tatsuma Ishihara, Nobukatsu Hojo, Hirokazu Kameoka. 2322-2326 [doi]

An iterative approach to decision tree training for context dependent speech synthesisXiayu Chen, Yang Zhang, Mark Hasegawa-Johnson. 2327-2331 [doi]

Prosodic phrasing modeling for Vietnamese TTS using syntactic informationThi Thu Trang Nguyen, Albert Rilliard, Do Dat Tran, Christophe d'Alessandro. 2332-2336 [doi]

Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labelingTomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi. 2337-2341 [doi]

Reconstruction of mistracked articulatory trajectoriesQiang Fang, Jianguo Wei, Fang Hu. 2342-2345 [doi]

Phone classification by a hierarchy of invariant representation layersChiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso A. Poggio. 2346-2350 [doi]

A semi-Markov model for speech segmentation with an utterance-break priorMark Sinclair, Peter Bell, Alexandra Birch, Fergus McInnes. 2351-2355 [doi]

Speech detection in transient noisesG. Aneeja, B. Yegnanarayana. 2356-2360 [doi]

Evaluation of dictionary for sparse coding in speech processingYongjun He, Guanglu Sun, Guibin Zheng, Jiqing Han. 2361-2364 [doi]

Joint filtering and factorization for recovering latent structure from noisy speech dataColin Vaz, Vikram Ramanarayanan, Shrikanth S. Narayanan. 2365-2369 [doi]

A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesisAscensión Gallardo-Antolín, J. M. Montero, Simon King. 2370-2374 [doi]

Read and spontaneous speech classification based on variance of GMM supervectorsTaichi Asami, Ryo Masumura, Hirokazu Masataki, Sumitaka Sakauchi. 2375-2379 [doi]

Co-channel speech detection via spectral analysis of frequency modulated sub-bandsNavid Shokouhi, Seyed Omid Sadjadi, John H. L. Hansen. 2380-2384 [doi]

Word-level invariant representations from acoustic waveformsStephen Voinea, Chiyuan Zhang, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio. 2385-2389 [doi]

On closed form calculation of line spectral frequencies (LSF)Paul Dalsgaard, Ove Andersen. 2390-2394 [doi]

Robust features for content-based audio copy detectionChahid Ouali, Pierre Dumouchel, Vishwa Gupta. 2395-2399 [doi]

Binaural deep neural network classification for reverberant speech segregationYi Jiang, DeLiang Wang, Runsheng Liu. 2400-2404 [doi]

Investigating NMF speech enhancement for neural network based acoustic modelsJürgen T. Geiger, Jort F. Gemmeke, Björn Schuller, Gerhard Rigoll. 2405-2409 [doi]

Automatic speech feature classification for children with cochlear implantsJason Lilley, James J. Mahshie, H. Timothy Bunnell. 2410-2414 [doi]

Sequential maximum mutual information linear discriminant analysis for speech recognitionYuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey. 2415-2419 [doi]

Model and feature based compensation for whispered speech recognitionShabnam Ghaffarzadegan, Hynek Boril, John H. L. Hansen. 2420-2424 [doi]

Post-masking: a hybrid approach to array processing for speech recognitionAmir R. Moghimi, Bhiksha Raj, Richard M. Stern. 2425-2429 [doi]

ASR feature extraction with morphologically-filtered power-normalized cochleogramsF. de-la-Calle-Silos, Francisco José Valverde Albacete, Ascensión Gallardo-Antolín, Carmen Peláez-Moreno. 2430-2434 [doi]

Should deep neural nets have ears? the role of auditory features in deep learning approachesAngel Mario Castro Martinez, Niko Moritz, Bernd T. Meyer. 2435-2439 [doi]

Extending Limabeam with discrimination and coarse gradientsCharles Fox, Thomas Hain. 2440-2444 [doi]

Generation of F0 contour using deep Boltzmann machine and twin Gaussian process hybrid model for Bengali languageSankar Mukherjee, Shyamal Kumar Das Mandal. 2445-2449 [doi]

Room localization for distant speech recognitionJuan Andres Morales-Cordovilla, Hannes Pessentheiner, Martin Hagmüller, Gernot Kubin. 2450-2453 [doi]

Posterior-based sparse representation for automatic speech recognitionSara Bahaadini, Afsaneh Asaei, David Imseng, Hervé Bourlard. 2454-2458 [doi]

Query-by-example spoken term detection on multilingual unconstrained speechXavier Anguera, Luis Javier Rodríguez-Fuentes, Igor Szöke, Andi Buzo, Florian Metze, Mikel Peñagarikano. 2459-2463 [doi]

A comparison of multiple methods for rescoring keyword search lists for low resource languagesVictor Soto, Lidia Mangu, Andrew Rosenberg, Julia Hirschberg. 2464-2468 [doi]

Subword and phonetic search for detecting out-of-vocabulary keywordsDamianos Karakos, Richard M. Schwartz. 2469-2473 [doi]

An in-depth comparison of keyword specific thresholding and sum-to-one score normalizationYun Wang, Florian Metze. 2474-2478 [doi]

Graph-based re-ranking using acoustic feature similarity between search results for spoken term detection on low-resource languagesHung-yi Lee, Yu Zhang, Ekapol Chuangsuwanich, James R. Glass. 2479-2483 [doi]

Developing STT and KWS systems using limited language resourcesViet Bac Le, Lori Lamel, Abdelkhalek Messaoudi, William Hartmann, Jean-Luc Gauvain, Cécile Woehrling, Julien Despres, Anindya Roy. 2484-2488 [doi]

GMM-based bandwidth extension using sub-band basis spectrum modelYamato Ohtani, Masatsune Tamura, Masahiro Morita, Masami Akamine. 2489-2493 [doi]

A mel-cepstral analysis technique restoring high frequency components from low-sampling-rate speechKazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda. 2494-2498 [doi]

A comparative study of spectral transformation techniques for singing voice synthesisS. W. Lee, Zhizheng Wu, Minghui Dong, Xiaohai Tian, Haizhou Li. 2499-2503 [doi]

Application of matrix variate Gaussian mixture model to statistical voice conversionDaisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose. 2504-2508 [doi]

Joint nonnegative matrix factorization for exemplar-based voice conversionZhizheng Wu, Chng Eng Siong, Haizhou Li. 2509-2513 [doi]

Statistical singing voice conversion with direct waveform modification based on the spectrum differentialKazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura. 2514-2518 [doi]

Detecting proximity from personal audio recordingsDaniel P. W. Ellis, Hiroyuki Satoh, Zhuo Chen. 2519-2523 [doi]

Acoustic event detection and localization with regression forestsHuy Phan, Marco Maaß, Radoslaw Mazur, Alfred Mertins. 2524-2528 [doi]

Multi-source posteriors for speech activity detection on public talksMarc Ferras, Hervé Bourlard. 2529-2532 [doi]

Analysis of spectrogram image methods for sound event classificationJonathan Dennis, Tran Huy Dat, Chng Eng Siong. 2533-2537 [doi]

Speech-based automatic and robust detection of very early dementiaAharon Satt, Ron Hoory, Alexandra König, Pauline Aalten, Philippe H. Robert. 2538-2542 [doi]

On the acoustic environment of a neonatal intensive care unit: initial description, and detection of equipment alarmsGanna Raboshchuk, Climent Nadeu, Omid Ghahabi, Sergi Solvez, Blanca Muñoz Mahamud, Ana Riverola de Veciana, Santiago Navarro Hervas. 2543-2547 [doi]

Non-native perception of regionally accented speech in a multitalker contextRobert Allen Fox, Ewa Jacewicz, Florence Hardjono. 2548-2552 [doi]

A crosslinguistic and acquisitional perspective on intonational rises in FrenchGiuseppina Turco, Elisabeth Delais-Roussarie. 2553-2557 [doi]

Error patterns of Mandarin disyllabic tones by Japanese learnersJung-Yueh Tu, Yuwen Hsiung, Min-Da Wu, Yao-Ting Sung. 2558-2562 [doi]

Infant-directed speech enhances temporal rhythmic structure in the envelopeVictoria Leong, Marina Kalashnikova, Denis Burnham, Usha Goswami. 2563-2567 [doi]

Influences of tone sandhi on word recognition in preschool childrenDilu Wewalaarachchi, Leher Singh. 2568-2571 [doi]

Lexical representation of consonant, vowels and tones in early childhoodHwee Hwee Goh, Charlene Hu, Kheng Hui Yeo, Leher Singh. 2572-2574 [doi]

Audiovisual temporal sensitivity in typical and dyslexic adult readersAna A. Francisco, Alexandra Jesse, Margriet A. Groen, James M. McQueen. 2575-2579 [doi]

Aero-tactile integration in fricatives: converting audio to air flow information for speech perception enhancementDonald Derrick, Greg A. O'Beirne, Tom De Rybel, Jennifer Hay. 2580-2584 [doi]

Relative importance of AM and FM cues for speech comprehension: effects of speaking rate and their implications for neurophysiological processing of speechGuangting Mai. 2585-2589 [doi]

The effect of regional and non-native accents on word recognition processes: a comparison of EEG responses in quiet to speech recognition in noiseLouise Stringer, Paul Iverson. 2590-2594 [doi]

Towards a neural measure of perceptual distance - classification of electroencephalographic responses to synthetic vowelsManson C.-M. Fong, James W. Minett, Thierry Blu, William S.-Y. Wang. 2595-2599 [doi]

Collecting a corpus of Dutch noise-induced 'slips of the ear'Odette Scharenborg, Eric Sanders, Bert Cranen. 2600-2604 [doi]

Lexical modeling for Arabic ASR: a systematic approachTuka Al Hanai, James R. Glass. 2605-2609 [doi]

Hybrid language models for speech transcriptionLuiza Orosanu, Denis Jouvet. 2610-2614 [doi]

Neural network language models for low resource languagesAnkur Gandhe, Florian Metze, Ian Lane. 2615-2619 [doi]

Feed forward pre-training for recurrent neural network language modelsSiva Reddy Gangireddy, Fergus McInnes, Steve Renals. 2620-2624 [doi]

Grounding language models in spatiotemporal contextBrandon C. Roy, Soroush Vosoughi, Deb Roy. 2625-2629 [doi]

Direct word graph rescoring using A* search and RNNLMShahab Jalalvand, Daniele Falavigna. 2630-2634 [doi]

One billion word benchmark for measuring progress in statistical language modelingCiprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson. 2635-2639 [doi]

Integrating sequence information in the audio-visual detection of word prominence in a human-machine interaction scenarioAndrea Schnall, Martin Heckmann. 2640-2644 [doi]

Backoff inspired features for maximum entropy language modelsFadi Biadsy, Keith Hall, Pedro J. Moreno, Brian Roark. 2645-2649 [doi]

BioKIT - real-time decoder for biosignal processingDominic Telaar, Michael Wand, Dirk Gehrig, Felix Putze, Christoph Amma, Dominic Heger, Ngoc Thang Vu, Mark Erhardt, Tim Schlippe, Matthias Janke, Christian Herff, Tanja Schultz. 2650-2654 [doi]

Speech recognition without a lexicon - bridging the gap between graphemic and phonetic systemsDavid F. Harwath, James R. Glass. 2655-2659 [doi]

A new auxiliary-vector algorithm with conjugate orthogonality for speech enhancementShengkui Zhao, Douglas L. Jones. 2660-2664 [doi]

Acoustic characteristics of critical message utterances in noise applied to speech intelligibility enhancementNeehar Jathar, Preeti Rao. 2665-2669 [doi]

Dynamic noise aware training for speech enhancement based on deep neural networksYong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee. 2670-2674 [doi]

Microphone array post-filtering using supervised machine learning for speech enhancementPasi Pertilä, Joonas Nikunen. 2675-2679 [doi]

Novel speech duration modifier for packet based communication systemSenthil Kumar Mani, Jitendra Kumar Dhiman, K. Sri Rama Murty. 2680-2684 [doi]

Experiments on deep learning for speech denoisingDing Liu, Paris Smaragdis, Minje Kim. 2685-2689 [doi]

Single-channel dynamic exemplar-based speech enhancementNasser Mohammadiha, Simon Doclo. 2690-2694 [doi]

Using hidden Markov models for speech enhancementAkihiro Kato, Ben Milner. 2695-2699 [doi]

Blind source extraction based on a direction-dependent a-priori SNRLukas Pfeifenberger, Franz Pernkopf. 2700-2704 [doi]

Least squares phase estimation of mixed signalsCarlos Eduardo Cancino Chacón, Pejman Mowlaee. 2705-2709 [doi]

Speech enhancement from additive noise and channel distortion - a corpus-based approachJi Ming, Danny Crookes. 2710-2714 [doi]

Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppressionHyung-Min Park, Matthew Maciejewski, Chanwoo Kim, Richard M. Stern. 2715-2718 [doi]

Variable-component deep neural network for robust speech recognitionRui Zhao, Jinyu Li, Yifan Gong. 2719-2723 [doi]

Effective modulation spectrum factorization for robust speech recognitionYu-Chen Kao, Yi-ting Wang, Berlin Chen. 2724-2728 [doi]

Hybrid MLP/structured-SVM tandem systems for large vocabulary and robust ASRSuman V. Ravuri. 2729-2733 [doi]

Robust speech recognition using temporal masking and thresholding algorithmChanwoo Kim, Kean K. Chin, Michiel Bacchiani, Richard M. Stern. 2734-2738 [doi]

Deep neural network bottleneck features for generalized variable parameter HMMsXurong Xie, Rongfeng Su, Xunying Liu, Lan Wang. 2739-2743 [doi]

A novel dynamic parameters calculation approach for model compensationSuliang Bu, Yanmin Qian, Kai Yu. 2744-2748 [doi]

Speech recognition based on Itakura-Saito divergence and dynamics/sparseness constraints from mixed sound of speech and music by non-negative matrix factorizationNaoaki Hashimoto, Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa. 2749-2753 [doi]

Noise robust speech recognition based on noise-adapted HMMs using speech feature compensationYong-joo Chung. 2754-2758 [doi]

Noise spectrum estimation using Gaussian mixture model-based speech presence probability for robust speech recognitionM. J. Alam, Patrick Kenny, Pierre Dumouchel, Douglas D. O'Shaughnessy. 2759-2763 [doi]

Comparing decoding strategies for subword-based keyword spotting in low-resourced languagesWilliam Hartmann, Viet Bac Le, Abdel Messaoudi, Lori Lamel, Jean-Luc Gauvain. 2764-2768 [doi]

Strategies for rescoring keyword search results using word-burst and acoustic featuresMin Ma, Justin Richards, Victor Soto, Julia Hirschberg, Andrew Rosenberg. 2769-2773 [doi]

Word-based probabilistic phonetic retrieval for low-resource spoken term detectionDi Xu, Florian Metze. 2774-2778 [doi]

A keyword-boosted sMBR criterion to enhance keyword search performance in deep neural network based acoustic modelingI.-Fan Chen, Nancy F. Chen, Chin-Hui Lee. 2779-2783 [doi]

Combination of FST and CN search in spoken term detectionJustin Chiu, Yun Wang, Jan Trmal, Daniel Povey, Guoguo Chen, Alexander I. Rudnicky. 2784-2788 [doi]

Low-resource open vocabulary keyword search using point process modelsChunxi Liu, Aren Jansen, Guoguo Chen, Keith Kintzley, Jan Trmal, Sanjeev Khudanpur. 2789-2793 [doi]

Decorrelated innovative codebooks for ACELP using factorization of autocorrelation matrixTom Bäckström, Christian R. Helmrich. 2794-2798 [doi]

Stress and accent transmission in HMM-based syllable-context very low bit rate speech codingMilos Cernak, Alexandros Lazaridis, Philip N. Garner, Petr Motlicek. 2799-2803 [doi]

Subjective voice quality evaluation of artificial bandwidth extension: comparing different audio bandwidths and speech codecsHannu Pulakka, Anssi Rämö, Ville Myllylä, Henri Toukomaa, Paavo Alku. 2804-2808 [doi]

Stereo acoustic echo suppression using widely linear filtering in the frequency domainZhong-Hua Fu, Lei Xie. 2809-2813 [doi]

Enhanced muting method in packet loss concealment of ITU-T G.722 using sigmoid function with on-line optimized parametersBong-Ki Lee, Inyoung Hwang, Jihwan Park, Joon-Hyuk Chang. 2814-2818 [doi]

A robust step-size control algorithm for frequency domain acoustic echo cancellationChao Wu, Kaiyu Jiang, Yanmeng Guo, Qiang Fu, Yonghong Yan 0002. 2819-2823 [doi]

Multi-channel speech enhancement using sparse coding on local time-frequency structuresZhiyuan Zhou, Zhaogui Ding, Weifeng Li, Zhiyong Wu, Longbiao Wang, Qingmin Liao. 2824-2827 [doi]

Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applicationsSeyedmahdad Mirsamadi, John H. L. Hansen. 2828-2832 [doi]

Speech enhancement by low-rank and convolutive dictionary spectrogram decompositionZhuo Chen, Brian McFee, Daniel P. W. Ellis. 2833-2837 [doi]

Multiple-order non-negative matrix factorization for speech enhancementXabier Jaureguiberry, Emmanuel Vincent, Gaël Richard. 2838-2842 [doi]

NMF-based speech enhancement incorporating deep neural networkTae Gyoon Kang, Kisoo Kwon, Jong Won Shin, Nam Soo Kim. 2843-2846 [doi]

A data-driven approach to speech enhancement using Gaussian processSukanya Sonowal, Kisoo Kwon, Nam Soo Kim, Jong Won Shin. 2847-2851 [doi]

Error correction of automatic speech recognition based on normalized web distanceE. Byambakhishig, K. Tanaka, Ryo Aihara, Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki. 2852-2856 [doi]

Unsupervised training methods for discriminative language modelingErinç Dikici, Murat Saraçlar. 2857-2861 [doi]

Building a vocabulary self-learning speech recognition systemLong Qin, Alexander I. Rudnicky. 2862-2866 [doi]

Methods for efficient semi-automatic pronunciation dictionary bootstrappingTim Schlippe, Matthias Merz, Tanja Schultz. 2867-2871 [doi]

Rapidly building domain-specific entity-centric language models using semantic web knowledge sourcesMurat Akbacak, Dilek Hakkani-Tür, Gökhan Tür. 2872-2876 [doi]

Context-dependent pronunciation error pattern discovery with limited annotationsAnn Lee, James R. Glass. 2877-2881 [doi]

Detecting speaker roles and topic changes in multiparty conversations using latent topic modelsAshtosh Sapru, Hervé Bourlard. 2882-2886 [doi]

A deep neural network approach for sentence boundary detection in broadcast newsChenglin Xu, Lei Xie, Guangpu Huang, Xiong Xiao, Engsiong Chng, Haizhou Li. 2887-2891 [doi]

Variable span disfluency detection in ASR transcriptsRahul Gupta, Sankaranarayanan Ananthakrishnan, Zhaojun Yang, Shrikanth S. Narayanan. 2892-2896 [doi]

A CRF-based approach to automatic disfluency detection in a French call-centre corpusCamille Dutrey, Chloé Clavel, Sophie Rosset, Ioana Vasilescu, Martine Adda-Decker. 2897-2901 [doi]

Multi-pass sentence-end detection of lecture speechMadina Hasan, Rama Doddipatla, Thomas Hain. 2902-2906 [doi]

Multi-domain disfluency and repair detectionVictoria Zayats, Mari Ostendorf, Hannaneh Hajishirzi. 2907-2911 [doi]

Enabling controllability for continuous expression spaceLangzhou Chen, Norbert Braunschweiler. 2912-2916 [doi]

Analysis of spectral enhancement using global variance in HMM-based speech synthesisTakashi Nose, Akinori Ito. 2917-2921 [doi]

Intelligibility analysis of fast synthesized speechCassia Valentini-Botinhao, Markus Toman, Michael Pucher, Dietmar Schabus, Junichi Yamagishi. 2922-2926 [doi]

Speech synthesis reactive to dynamic noise environmental conditionsSusana Palmaz López-Peláez, Robert A. J. Clark. 2927-2931 [doi]

Partial representations improve the prosody of incremental speech synthesisTimo Baumann. 2932-2936 [doi]

Dialogue context sensitive speech synthesis using factorized decision treesPirros Tsiakoulis, Catherine Breslin, Milica Gasic, Matthew Henderson, DongHo Kim, Steve J. Young. 2937-2941 [doi]

Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesisXin Wang, Zhen-Hua Ling, Li-Rong Dai. 2942-2946 [doi]

On the role of missing data imputation and NMF feature enhancement in building synthetic voices using reverberant speechDhananjaya N. Gowda, Heikki Kallasjoki, Reima Karhila, Cristian Contan, Kalle J. Palomäki, Mircea Giurgiu, Mikko Kurimo. 2947-2951 [doi]

Objective evaluation of HMM-based speech synthesis system using Kullback-Leibler divergenceC.-T. Do, M. Evrard, A. Leman, Christophe d'Alessandro, Albert Rilliard, J.-L. Crebouw. 2952-2956 [doi]

Speech intonation for TTS: study on evaluation methodologyJavier Latorre, Kayoko Yanagisawa, Vincent Wan, BalaKrishna Kolluru, Mark J. F. Gales. 2957-2961 [doi]

Speaker age estimation for elderly speech recognition in European PortugueseThomas Pellegrini, Vahid Hedayati, Isabel Trancoso, Annika Hämäläinen, Miguel Sales Dias. 2962-2966 [doi]

Unsupervised model selection for recognition of regional accented speechMaryam Najafian, Andrea DeMarco, Stephen J. Cox, Martin Russell. 2967-2971 [doi]

Speaker adaptation based on sparse and low-rank eigenphone matrix estimationWen-Lin Zhang, Dan Qu, Wei-Qiang Zhang, Bi-Cheng Li. 2972-2976 [doi]

Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptationYan Huang, Dong Yu, Chaojun Liu, Yifan Gong. 2977-2981 [doi]

A low complexity model adaptation approach involving sparse coding over multiple dictionariesS. Shahnawazuddin, Rohit Sinha. 2982-2986 [doi]

Effect of frequency weighting on MLP-based speaker canonicalizationYuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta. 2987-2991 [doi]

Feature space maximum a posteriori linear regression for adaptation of deep neural networksZhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I.-Fan Chen, Chao Weng, Chin-Hui Lee. 2992-2996 [doi]

Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processingNatalia A. Tomashenko, Yuri Y. Khokhlov. 2997-3001 [doi]

BUT 2014 Babel system: analysis of adaptation in NN based systemsMartin Karafiát, Frantisek Grézl, Karel Veselý, Mirko Hannemann, Igor Szöke, Jan Cernocký. 3002-3006 [doi]

Speaker adaptation of DNN-based ASR with i-vectors: does it actually adapt models to speakers?Mickael Rouvier, Benoît Favre. 3007-3011 [doi]

Task-aware deep bottleneck features for spoken language identificationBing Jiang, Yan Song, Si Wei, Ian Vince McLoughlin, Li-Rong Dai. 3012-3016 [doi]

Virtual example for phonotactic language recognitionRong Tong, Bin Ma, Haizhou Li. 3017-3021 [doi]

Phonotactic language recognition based on time-gap-weighted lattice kernelsWeiwei Liu, Wei-Qiang Zhang, Jia Liu. 3022-3026 [doi]

UBM fused total variability modeling for language identificationMaarten Van Segbroeck, Ruchir Travadi, Shrikanth S. Narayanan. 3027-3031 [doi]

On the complementarity of short-time Fourier analysis windows of different lengths for improved language recognitionMireia Díez, Mikel Peñagarikano, Germán Bordel, Amparo Varona, Luis Javier Rodríguez-Fuentes. 3032-3036 [doi]

Modified-prior i-vector estimation for language identification of short duration utterancesRuchir Travadi, Maarten Van Segbroeck, Shrikanth S. Narayanan. 3037-3041 [doi]

Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizersLuis Fernando D'Haro, Ricardo de Córdoba, Christian Salamea Palacios, Javier Ferreiros. 3042-3046 [doi]

PLLR features in language recognition system for RATSOldrich Plchot, Mireia Díez, Mehdi Soufifar, Lukás Burget. 3047-3051 [doi]

Language identification of code switching sentences and multilingual sentences of under-resourced languages by using multi structural word informationYin-Lai Yeong, Tien Ping Tan. 3052-3055 [doi]
