INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015

INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015. ISCA, 2015.

Conference: interspeech2015

The Parkinson's condition sub-challenge: the data. Juan R. Orozco-Arroyave.

Voices of power, passion, and personality. Klaus Scherer.

The INTERSPEECH 2015 computational paralinguistics challenge: a summary of results. Stefan Steidl.

The degree of nativeness sub-challenge: the data. Florian Hönig.

Robust speech processing using observation uncertainty and uncertainty propagation: session and paper overview. Ramón Fernandez Astudillo, Shinji Watanabe, Ahmed Hussen Abdelaziz, Dorothea Kolossa.

Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): introductory talk by the organizers. Zhizheng Wu, Tomi Kinnunen.

The technology powering personal digital assistants. Ruhi Sarikaya.

The emergence of compositional structure in language evolution and development. Mary E. Beckman.

Biosignal-based spoken communication: panel and discussion. Matthias Janke, Michael Wand.

The eating condition sub-challenge: the data. Anton Batliner.

Biosignal-based spoken communication: welcome and introduction. Matthias Janke, Michael Wand.

Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): open discussion and future plans. Junichi Yamagishi, Nicholas W. D. Evans.

The HBP-atlas - concept, perspectives, and application for language and speech research. Katrin Amunts.

Advanced crowdsourcing for speech and beyond: introduction by the organizers. Tim Polzehl, Gina-Anne Levow.

Learning the speech front-end with raw waveform CLDNNs. Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin W. Wilson, Oriol Vinyals. 1-5

Architectures for deep neural network based acoustic models defined over windowed speech waveforms. Mayank Bhargava, Richard Rose. 6-10

Analysis of CNN-based speech recognition system using raw speech as input. Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert. 11-15

Bilinear map of filter-bank outputs for DNN-based speech recognition. Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta. 16-20

Speech recognition with temporal neural networks. Payton Lin, Dau-Cheng Lyu, Yun-Fan Chang, Yu Tsao. 21-25

Convolutional neural networks for acoustic modeling of raw time signal in LVCSR. Pavel Golik, Zoltán Tüske, Ralf Schlüter, Hermann Ney. 26-30

Stable and unstable intervals as a basic segmentation procedure of the speech signal. Ulrike Glavitsch, Lei He, Volker Dellwo. 31-35

Polysyllabic shortening and word-final lengthening in English. Andreas Windmann, Juraj Simko, Petra Wagner. 36-40

The acoustics of word stress in English as a function of stress level and speaking style. Anders Eriksson, Mattias Heldner. 41-45

Pitch accent distribution in German infant-directed speech. Katharina Zahner, Muna Pohl, Bettina Braun. 46-50

Acoustic correlates of perceived syllable prominence in German. Hansjörg Mixdorff, Christian Cossio-Mercado, Angelika Hönemann, Jorge A. Gurlekian, Diego Evin, Humberto M. Torres. 51-55

Cross-modality matching of linguistic and emotional prosody. Simone Simonetti, Jeesun Kim, Chris Davis. 56-59

A fast algorithm for improved intelligibility of speech-in-noise based on frequency and time domain energy reallocation. Tudor-Catalin Zorila, Yannis Stylianou. 60-64

Intelligibility enhancement of casual speech for reverberant environments inspired by clear speech properties. Maria Koutsogiannaki, Petko N. Petkov, Yannis Stylianou. 65-69

Intelligibility enhancement of vocal announcements for public address systems: a design for all through a presbycusis pre-compensation filter. Amira Ben Jemaa, N. Mechergui, G. Courtois, A. Mudry, S. Djaziri Larbi, M. Turki, H. Lissek, Meriem Jaïdane. 70-74

Model-based integration of reverberation for noise-adaptive near-end listening enhancement. Henning F. Schepker, David Hülsmeier, Jan Rennies, Simon Doclo. 75-79

Online Lombard adaptation in incremental speech synthesis. Sebastian Rottschäfer, Hendrik Buschmeier, Herwin van Welbergen, Stefan Kopp. 80-84

Comparison of Gaussian process regression and Gaussian mixture models in spectral tilt modelling for intelligibility enhancement of telephone speech. Emma Jokinen, Ulpu Remes, Paavo Alku. 85-89

A discriminative reliability-aware classification model with applications to intelligibility classification in pathological speech. Naveen Kumar, Shrikanth S. Narayanan. 90-94

Voiced/unvoiced transitions in speech as a potential bio-marker to detect Parkinson's disease. Juan R. Orozco-Arroyave, Florian Hönig, Julián D. Arias-Londoño, Jesus Francisco Vargas Bonilla, Sabine Skodda, Jan Rusz, Elmar Nöth. 95-99

Low-frequency components analysis in running speech for the automatic detection of Parkinson's disease. T. Villa-Cañas, Julián D. Arias-Londoño, Juan R. Orozco-Arroyave, Jesus Francisco Vargas Bonilla, Elmar Nöth. 100-104

Automatic detection of Parkinson's disease from continuous speech recorded in non-controlled noise conditions. J. C. Vásquez-Correa, T. Arias-Vergara, Juan R. Orozco-Arroyave, Jesus Francisco Vargas Bonilla, Julián D. Arias-Londoño, Elmar Nöth. 105-109

Relevance vector machine for depression prediction. Nicholas Cummins, Vidhyasaharan Sethu, Julien Epps, Jarek Krajewski. 110-114

Typicality and emotion in the voice of children with autism spectrum condition: evidence across three languages. Erik Marchi, Björn W. Schuller, Simon Baron-Cohen, Ofer Golan, Sven Bölte, Prerna Arora, Reinhold Häb-Umbach. 115-119

Deep contextual language understanding in spoken dialogue systems. Chunxi Liu, Puyang Xu, Ruhi Sarikaya. 120-124

RNN-based labeled data generation for spoken language understanding. Yik-Cheung Tam, Yangyang Shi, Hunk Chen, Mei-Yuh Hwang. 125-129

Is it time to switch to word embedding and recurrent neural networks for spoken language understanding? Vedran Vukotic, Christian Raymond, Guillaume Gravier. 130-134

Recurrent neural network and LSTM models for lexical utterance classification. Suman V. Ravuri, Andreas Stolcke. 135-139

Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors. Hung-tsung Lu, Yuan-ming Liou, Hung-yi Lee, Lin-Shan Lee. 140-144

A comparison of normalization techniques applied to latent space representations for speech analytics. Mohamed Morchid, Richard Dufour, Driss Matrouf. 145-149

The effect of soft, modal and loud voice levels on entrainment in noisy conditions. Éva Székely, Mark T. Keane, Julie Carson-Berndsen. 150-154

Does voice anthropomorphism affect lexical alignment in speech-based human-computer dialogue? Benjamin R. Cowan, Holly P. Branigan. 155-159

Exploiting top-down source models to improve binaural localisation of multiple sources in reverberant environments. Ning Ma, Guy J. Brown, José A. González. 160-164

Binaural sound source localisation and tracking using a dynamic spherical head model. Christopher Schymura, Fiete Winter, Dorothea Kolossa, Sascha Spors. 165-169

The role of temporal resolution in modulation-based speech segregation. Tobias May, Thomas Bentsen, Torsten Dau. 170-174

Improving automatic speech recognition in spatially-aware hearing aids. Hendrik Kayser, Constantin Spille, Daniel Marquardt, Bernd T. Meyer. 175-179

Dereverberation for active human-robot communication robust to speaker's face orientation. Randy Gomez, Levko Ivanchuk, Keisuke Nakamura, Takeshi Mizumoto, Kazuhiro Nakadai. 180-184

Multi-task learning for text-dependent speaker verification. Nanxin Chen, Yanmin Qian, Kai Yu. 185-189

JFA for speaker recognition with random digit strings. Themos Stafylakis, Patrick Kenny, Md. Jahangir Alam, Marcel Kockmann. 190-194

Structured prediction for speaker identification in TV series. Elena Knyazeva, Guillaume Wisniewski, Hervé Bredin, François Yvon. 195-199

Speaker recognition by means of acoustic and phonetically informed GMMs. Sandro Cumani, Pietro Laface, Farzana Kulsoom. 200-204

A fast approach to psychoacoustic model compensation for robust speaker recognition in additive noise. Ashish Panda. 205-209

Blind score normalization method for PLDA based speaker recognition. Danila Doroshin, Nikolay Lubimov, Marina Nastasenko, Mikhail Kotov. 210-213

Non-linear PLDA for i-vector speaker verification. Sergey Novoselov, Timur Pekhovsky, Oleg Kudashev, Valentin S. Mendelev, Alexey Prudnikov. 214-218

On the need of template protection for voice authentication. Carlos Vaquero, Patricia Rodríguez. 219-223

Evaluation and calibration of short-term aging effects in speaker verification. Finnian Kelly, John H. L. Hansen. 224-228

Phone-centric local variability vector for text-constrained speaker verification. Liping Chen, Kong-Aik Lee, Bin Ma, Wu Guo, Haizhou Li, Li-Rong Dai. 229-233

Cosine distance features for robust speaker verification. Kuruvachan K. George, C. Santhosh Kumar, K. I. Ramachandran, Ashish Panda. 234-238

Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification. Sayaka Shiota, Fernando Villavicencio, Junichi Yamagishi, Nobutaka Ono, Isao Echizen, Tomoko Matsui. 239-243

Noise robust speaker recognition with convolutive sparse coding. Antti Hurmalainen, Rahim Saeidi, Tuomas Virtanen. 244-248

Combining amplitude and phase-based features for speaker verification with short duration utterances. Md. Jahangir Alam, Patrick Kenny, Themos Stafylakis. 249-253

Phase perception of the glottal excitation of vocoded speech. Tuomo Raitio, Lauri Juvela, Antti Suni, Martti Vainio, Paavo Alku. 254-258

Using acoustics to improve pronunciation for synthesis of low resource languages. Sunayana Sitaram, Serena Jeblee, Alan W. Black. 259-263

Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum. Tadashi Inai, Sunao Hara, Masanobu Abe, Yusuke Ijima, Noboru Miyazaki, Hideyuki Mizuno. 264-268

Pruning redundant synthesis units based on static and delta unit appearance frequency. Heng Lu, Wei Zhang, Xu Shao, Quan Zhou, Wenhui Lei, Hongbin Zhou, Andrew P. Breen. 269-273

Emotional transplant in statistical speech synthesis based on emotion additive model. Yamato Ohtani, Yu Nasu, Masahiro Morita, Masami Akamine. 274-278

Generalized variable parameter HMMs based acoustic-to-articulatory inversion. Xurong Xie, Xunying Liu, Lan Wang, Rongfeng Su. 279-283

Semi-supervised training of a voice conversion mapping function using a joint-autoencoder. Seyed Hamidreza Mohammadi, Alexander Kain. 284-288

On glottal source shape parameter transformation using a novel deterministic and stochastic speech analysis and synthesis system. Stefan Huber, Axel Roebel. 289-293

Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation. Yi-Chin Huang, Chung-Hsien Wu, Ming-Ge Shie. 294-298

Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. Yuji Oshima, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura. 299-303

Evaluation of state mapping based foreign accent conversion. Markus Toman, Michael Pucher. 304-308

Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features. Zhizheng Wu, Simon King. 309-313

Anomaly-based annotation errors detection in TTS corpora. Jindrich Matousek, Daniel Tihelka. 314-318

Analysing automatic descriptions of intonation with ICARUS. Katrin Schweitzer, Markus Gärtner, Arndt Riester, Ina Rösiger, Kerstin Eckart, Jonas Kuhn, Grzegorz Dogil. 319-323

iCALL corpus: Mandarin Chinese spoken by non-native speakers of European descent. Nancy F. Chen, Rong Tong, Darren Wee, Pei Xuan Lee, Bin Ma, Haizhou Li. 324-328

Development of a Cantonese dysarthric speech corpus. Ka-Ho Wong, Yu Ting Yeung, Edwin H. Y. Chan, Patrick C. M. Wong, Gina-Anne Levow, Helen M. Meng. 329-333

Stylex: a corpus of educational videos for research on speaking styles and their impact on engagement and learning. Harish Arsikere, Sonal Patil, Ranjeet Kumar, Kundan Shrivastava, Om Deshmukh. 334-338

A dialog act tagging approach to behavioral coding: a case study of addiction counseling conversations. Dogan Can, David C. Atkins, Shrikanth S. Narayanan. 339-343

Analysing rhythm in ritual discourse in Yucatec Maya using automatic speech alignment. Valentina Vapnarsky, Claude Barras, Cédric Becquey, David Doukhan, Martine Adda-Decker, Lori Lamel. 344-348

Noise-matched training of CRF based sentence end detection models. Madina Hasan, Rama Doddipatla, Thomas Hain. 349-353

The effect of spectral slope on pitch perception. Jianjing Kuang, Mark Liberman. 354-358

Combined cine- and tagged-MRI for tracking landmarks on the tongue surface. Honghao Bao, Wenhuan Lu, Kiyoshi Honda, Jianguo Wei, Qiang Fang, Jianwu Dang. 359-363

Human vocal tract growth: a longitudinal study of the development of various anatomical structures. Guillaume Barbier, Louis-Jean Boë, Guillaume Captier, Rafael Laboissière. 364-368

Analysis of coarticulated speech using estimated articulatory trajectories. Ganesh Sivaraman, Vikramjit Mitra, Mark K. Tiede, Elliot Saltzman, Louis Goldstein, Carol Y. Espy-Wilson. 369-373

Speech planning in 4-year-old children versus adults: acoustic and articulatory analyses. Guillaume Barbier, Pascal Perrier, Lucie Ménard, Yohan Payan, Mark K. Tiede, Joseph S. Perkell. 374-378

Morphological and acoustic analysis of the vocal tract using a multi-speaker volumetric MRI dataset. Tokihiko Kaburagi. 379-383

Experimental assessment of the tongue incompressibility hypothesis during speech production. Zisis Iason Skordilis, Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan. 384-388

Multilingual bottleneck features for language recognition. Radek Fér, Pavel Matejka, Frantisek Grézl, Oldrich Plchot, Jan Cernocký. 389-393

DNN senone MAP multinomial i-vectors for phonotactic language recognition. Alan McCree, Daniel Garcia-Romero. 394-397

Deep bottleneck network based i-vector representation for language identification. Yan Song, Xinhai Hong, Bing Jiang, Ruilian Cui, Ian Vince McLoughlin, Li-Rong Dai. 398-402

An end-to-end approach to language identification in short utterances using convolutional neural networks. Alicia Lozano-Diez, Rubén Zazo-Candil, Javier Gonzalez-Dominguez, Doroteo Torre Toledano, Joaquín González Rodríguez. 403-407

Boosting universal speech attributes classification with deep neural network for foreign accent characterization. Ville Hautamäki, Sabato Marco Siniscalchi, Hamid Behravan, Valerio Mario Salerno, Ivan Kukanov. 408-412

Multilingual tandem bottleneck feature for language identification. Wang Geng, Jie Li, Shanshan Zhang, Xinyuan Cai, Bo Xu. 413-417

On compressibility of neural network phonological features for low bit rate speech coding. Afsaneh Asaei, Milos Cernak, Hervé Bourlard. 418-422

Robust and accurate LSF location with Laguerre method. Michal Lenarczyk. 423-427

Interactivity-aware playout adaptation. Jochen Issing, Nikolaus Färber, Reinhard German. 428-432

Advanced time shrinking using a drop classifier based on codec features. Jochen Issing, Nikolaus Färber, Reinhard German. 433-437

Measuring and monitoring speech quality for voice over IP with POLQA, viSQOL and P.563. Andrew Hines, Eoin Gillen, Naomi Harte. 438-442

Towards the prediction of human speaker identification performance from measured speech quality. Laura Fernández Gallardo, Sebastian Möller. 443-447

Personalization of word-phrase-entity language models. Michael Levit, Andreas Stolcke, R. Subba, Sarangarajan Parthasarathy, Shuangyu Chang, S. Xie, T. Anastasakos, Benoît Dumoulin. 448-452

Discriminative bilinear language modeling for broadcast transcriptions. Akio Kobayashi, Manon Ichiki, Takahiro Oku, Kazuo Onoe, Shoei Sato. 453-457

Recognize foreign low-frequency words with similar pairs. Xi Ma, Xiaoxi Wang, Dong Wang, Zhiyong Zhang. 458-462

Combinations of various language model technologies including data expansion and adaptation in spontaneous speech recognition. Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito. 463-467

Bringing contextual information to Google speech recognition. Petar S. Aleksic, Mohammadreza Ghodsi, Assaf Michaely, Cyril Allauzen, Keith B. Hall, Brian Roark, David Rybach, Pedro J. Moreno. 468-472

Sequence-based class tagging for robust transcription in ASR. Lucy Vasserman, Vlad Schogol, Keith Hall. 473-477

The INTERSPEECH 2015 computational paralinguistics challenge: nativeness, Parkinson's & eating condition. Björn W. Schuller, Stefan Steidl, Anton Batliner, Simone Hantke, Florian Hönig, Juan R. Orozco-Arroyave, Elmar Nöth, Yue Zhang, Felix Weninger. 478-482

Phrase accentuation verification and phonetic variation measurement for the degree of nativeness sub-challenge. Claude Montacié, Marie-José Caraty. 483-487

Combining multiple approaches to predict the degree of nativeness. Eugénio Ribeiro, Jaime Ferreira, Julia Olcoz, Alberto Abad, Helena Moniz, Fernando Batista, Isabel Trancoso. 488-492

Automated evaluation of non-native English pronunciation quality: combining knowledge- and data-driven features at multiple time scales. Matthew P. Black, Daniel Bone, Zisis Iason Skordilis, Rahul Gupta, Wei Xia, Pavlos Papadopoulos, Sandeep Nallan Chakravarthula, Bo Xiao, Maarten Van Segbroeck, Jangwon Kim, Panayiotis G. Georgiou, Shrikanth S. Narayanan. 493-497

Estimating the severity of Parkinson's disease from speech using linear regression and database partitioning. David Sztahó, Gábor Kiss, Klára Vicsi. 498-502

Random forest-based prediction of Parkinson's disease progression using acoustic, ASR and intelligibility features. Alexander Zlotnik, Juan Manuel Montero, Rubén San Segundo, Ascensión Gallardo-Antolín. 503-507

Automatic recognition of unified Parkinson's disease rating from speech with acoustic, i-vector and phonotactic features. Guozhen An, David-Guy Brizan, Min Ma, Michelle Morales, Ali Raza Syed, Andrew Rosenberg. 508-512

Parkinson's condition estimation using speech acoustic and inversely mapped articulatory data. Seongjun Hahm, Jun Wang. 513-517

Segment-dependent dynamics in predicting Parkinson's disease. James R. Williamson, Thomas F. Quatieri, Brian S. Helfer, Joseph Perricone, Satrajit S. Ghosh, Gregory Ciccarelli, Daryush D. Mehta. 518-522

Recognition of voiced sounds with a continuous state HMM. S. M. Houghton, Colin J. Champion, Philip Weber. 523-527

Learning speech rate in speech recognition. Xiangyu Zeng, Shi Yin, Dong Wang. 528-532

Pronunciation and silence probability modeling for ASR. Guoguo Chen, Hainan Xu, Minhua Wu, Daniel Povey, Sanjeev Khudanpur. 533-537

Exploring minimal pronunciation modeling for low resource languages. Marelie H. Davel, Etienne Barnard, Charl Johannes van Heerden, William Hartmann, Damianos Karakos, Richard M. Schwartz, Stavros Tsakalidis. 538-542

Attribute knowledge integration for speech recognition based on multi-task learning neural networks. Hao Zheng, Zhanlei Yang, Liwei Qiao, Jianping Li, Wenju Liu. 543-547

Detecting audio-visual synchrony using deep neural networks. Etienne Marcheret, Gerasimos Potamianos, Josef Vopicka, Vaibhava Goel. 548-552

Cross database training of audio-visual hidden Markov models for phone recognition. Shahram Kalantari, David Dean, Houman Ghaemmaghami, Sridha Sridharan, Clinton Fookes. 553-557

Incorporating visual information for spoken term detection. Shahram Kalantari, David Dean, Sridha Sridharan. 558-562

Integration of deep bottleneck features for audio-visual speech recognition. Hiroshi Ninomiya, Norihide Kitaoka, Satoshi Tamura, Yurie Iribe, Kazuya Takeda. 563-567

Automatic detection of sentence prominence in speech using predictability of word-level acoustic features. Sofoklis Kakouros, Okko Räsänen. 568-572

An empirical model of emphatic word detection. Milos Cernak, Pierre-Edouard Honnet. 573-577

Using tilt for automatic emphasis detection with Bayesian networks. Yishuang Ning, Zhiyong Wu, Xiaoyan Lou, Helen M. Meng, Jia Jia, Lianhong Cai. 578-582

Analysis of a low-dimensional bottleneck neural network representation of speech for modelling speech dynamics. Linxue Bai, Peter Jancovic, Martin J. Russell, Philip Weber. 583-587

Statistical acoustic-to-articulatory mapping unified with speaker normalization based on voice conversion. Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose. 588-592

Analysis of features from analytic representation of speech using MP-ABX measures. Raghavendra Reddy Pappagari, Karthika Vijayan, K. Sri Rama Murty. 593-597

Source-filter separation of speech signal in the phase domain. Erfan Loweimi, Jon Barker, Thomas Hain. 598-602

A maximum likelihood approach to the detection of moments of maximum excitation and its application to high-quality speech parameterization. Ranniery Maia, Yannis Stylianou, Masami Akamine. 603-607

SABR: sparse, anchor-based representation of the speech signal. Christopher Liberatore, Sandesh Aryal, Zelun Wang, Seth Polsley, Ricardo Gutierrez-Osuna. 608-612

Automatic transformation of irregular to regular voice by residual analysis and synthesis. Tamás Gábor Csapó, Géza Németh. 613-617

Optical sensor calibration for electro-optical stomatography. Simon Preuß, Peter Birkholz. 618-622

From text to formants - indirect model for trajectory prediction based on a multi-speaker parallel speech database. Kálmán Abari, Tamás Gábor Csapó, Bálint Pál Tóth, Gábor Olaszy. 623-627

Layered nonnegative matrix factorization for speech separation. Chung-Chien Hsu, Jen-Tzung Chien, Tai-Shih Chi. 628-632

Robust tongue tracking in ultrasound images: a multi-hypothesis approach. Catherine Laporte, Lucie Ménard. 633-637

Objective measures for predicting the intelligibility of spectrally smoothed speech with artificial excitation. Danny Websdale, Thomas Le Cornu, Ben Milner. 638-642

Mispronunciation detection without nonnative training data. Ann Lee, James R. Glass. 643-647

Automatic accentedness evaluation of non-native speech using phonetic and sub-phonetic posterior probabilities. Ramya Rasipuram, Milos Cernak, Alexandre Nachen, Mathew Magimai-Doss. 648-652

Using F0 contours to assess nativeness in a sentence repeat task. Min Ma, Keelan Evanini, Anastassia Loukina, Xinhao Wang, Klaus Zechner. 653-657

Using linguistic indicators of difficulty to identify mild cognitive impairment. Rebecca Lunsford, Peter A. Heeman. 658-662

Automatic intelligibility measures applied to speech signals simulating age-related hearing loss. Lionel Fontan, Jérôme Farinas, Isabelle Ferrané, Julien Pinquier, Xavier Aumont. 663-667

Assessing empathy using static and dynamic behavior models based on therapist's language in addiction counseling. Sandeep Nallan Chakravarthula, Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou. 668-672

SVitchboard II and FiSVer I: high-quality limited-complexity corpora of conversational English speech. Yuzong Liu, Rishabh K. Iyer, Katrin Kirchhoff, Jeff A. Bilmes. 673-677

Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model. Herman Kamper, Aren Jansen, Sharon Goldwater. 678-682

LSTM for punctuation restoration in speech transcripts. Ottokar Tilk, Tanel Alumäe. 683-687

Noise robust exemplar matching for speech enhancement: applications to automatic speech recognition. Emre Yilmaz, Deepak Baby, Hugo Van Hamme. 688-692

A study on robust detection of pronunciation erroneous tendency based on deep neural network. Yingming Gao, Yanlu Xie, Wen Cao, Jinsong Zhang. 693-696

Vowel mispronunciation detection using DNN acoustic models with cross-lingual training. Shrikant Joshi, Nachiket Deo, Preeti Rao. 697-701

Confidence-features and confidence-scores for ASR applications in arbitration and DNN speaker adaptation. Kshitiz Kumar, Ziad Al Bawab, Yong Zhao, Chaojun Liu, Benoît Dumoulin, Yifan Gong. 702-706

Topic modeling for conference analytics. Pengfei Liu, Shoaib Jameel, Wai Lam, Bin Ma, Helen M. Meng. 707-711

Sparse coding based features for speech units classification. Pulkit Sharma, Vinayak Abrol, A. D. Dileep, Anil Kumar Sao. 712-715

Smarter driving with IDA, the intelligent driving assistant for Singapore. Andreea I. Niculescu, Ngoc Thuy Huong Thai, Chongjia Ni, Boon Pang Lim, Kheng Hui Yeo, Rafael E. Banchs. 716-717

Talk it out: adding speech interaction to support informational and transactional applications on public touch-screen kiosks. Kheng Hui Yeo, Rafael E. Banchs. 718-719

Conversational agent and management tools for conference and tourism domain. Luis Fernando D'Haro, Seokhwan Kim, Rafael E. Banchs. 720-721

Latvian speech-to-text transcription service. Askars Salimbajevs, Jevgenijs Strigins. 722-723

System supporting speaker identification in emergency call center. Jakub Galka, Joanna Grzybowska, Magdalena Igras, Pawel Jaciów, Kamil Wajda, Marcin Witkowski, Mariusz Ziólko. 724-725

2 - the QCRI advanced transcription and translation system. Ahmed Abdelali, Ahmed M. Ali, Francisco Guzmán, Felix Stahlberg, Stephan Vogel, Yifan Zhang. 726-727

Implementation of a live dialectal media subtitling system. Michael Stadtschnitzer, Christoph Schmidt. 728-729

A system for automatic broadcast news summarisation, geolocation and translation. Peter Bell, Catherine Lai, Clare Llewellyn, Alexandra Birch, Mark Sinclair. 730-731

Media monitoring system for Latvian radio and TV broadcasts. Arturs Znotins, Kaspars Polis, Roberts Dargis. 732-733

Meeting assistant application. Michel Assayag, Jonathan Huang, Jonathan Mamou, Oren Pereg, Saurav Sahay, Oren Shamir, Georg Stemmer, Moshe Wasserblat. 734-735

Bayesian integration of sound source separation and speech recognition: a new approach to simultaneous speech recognition. Kousuke Itakura, Izaya Nishimuta, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii. 736-740

Channel selection in the short-time modulation domain for distant speech recognition. Ivan Himawan, Petr Motlícek, Sridha Sridharan, David Dean, Dian Tjondronegoro. 741-745

A multi-channel speech enhancement framework for robust NMF-based speech recognition for speech-impaired users. Gert Dekkers, Toon van Waterschoot, Bart Vanrumste, Bert Van Den Broeck, Jort F. Gemmeke, Hugo Van Hamme, Peter Karsmakers. 746-750

Sound source separation algorithm using phase difference and angle distribution modeling near the target. Chanwoo Kim, Kean K. Chin. 751-755

Contaminated speech training methods for robust DNN-HMM distant speech recognition. Mirco Ravanelli, Maurizio Omologo. 756-760

Distance-aware DNNs for robust speech recognition. Yajie Miao, Florian Metze. 761-765

Vocal tremor analysis via AM-FM decomposition of empirical modes of the glottal cycle length time series. Christophe Mertens, Francis Grenez, François Viallet, Alain Ghio, Sabine Skodda, Jean Schoentgen. 766-770

Estimating lower vocal tract features with closed-open phase spectral analyses. Elizabeth Godoy, Nicolas Malyska, Thomas F. Quatieri. 771-775

Inductive implementation of segmental HMMs as CS-HMMs. S. M. Houghton, Colin J. Champion. 776-780

A discriminative analysis within and across voiced and unvoiced consonants in neutral and whispered speech in multiple Indian languages. G. Nisha Meenakshi, Prasanta Kumar Ghosh. 781-785

Aligning meeting recordings via adaptive fingerprinting. T. J. Tsai, Andreas Stolcke. 786-790

On representation learning for artificial bandwidth extension. Matthias Zöhrer, Robert Peharz, Franz Pernkopf. 791-795

Perception and production of vowel contrasts in German learners of English. Helena Levy. 796-800

Goodness of tone (GOT) for non-native Mandarin tone recognition. Rong Tong, Nancy F. Chen, Bin Ma, Haizhou Li. 801-805

The effect of high-variability training on the perception and production of French stops by German native speakers. Jeanin Jügler, Frank Zimmerer, Bernd Möbius, Christoph Draxler. 806-810

Perception of Mandarin tones by native Tibetan speakers. Wenfu Bao, Hui Feng, Jianwu Dang, Zhilei Liu, Yang Yu, Siyu Wang. 811-814

Study of acoustic correlates of English lexical stress produced by native (L1) Bengali speakers compared to native (L1) English speakers. Shambhu Nath Saha, Shyamal Kr. Das Mandal. 815-819

Prosodic phrasing unique to the acquisition of L2 intonation - an analysis of L2 Japanese intonation by L1 Swedish learners. Yasuko Nagano-Madsen. 820-823

Fusion of LVCSR and posteriorgram based keyword search. Leda Sari, Batuhan Gündogdu, Murat Saraçlar. 824-828

Improving speech recognition and keyword search for low resource languages using web data. Gideon Mendels, Erica Cooper, Victor Soto, Julia Hirschberg, Mark J. F. Gales, Kate M. Knill, Anton Ragni, Haipeng Wang. 829-833

Two-step spoken term detection using SVM classifier trained with pre-indexed keywords based on ASR result. Kentaro Domoto, Takehito Utsuro, Naoki Sawada, Hiromitsu Nishizaki. 834-838

Enhancing low resource keyword spotting with automatically retrieved web documents. Le Zhang, Damianos Karakos, William Hartmann, Roger Hsiao, Richard M. Schwartz, Stavros Tsakalidis. 839-843

A comparison between a DNN and a CRF disfluency detection and reconstruction system. Dario Bertero, Linlin Wang, Ho-Yin Chan, Pascale Fung. 844-848

Recurrent neural networks for incremental disfluency detection. Julian Hough, David Schlangen. 849-853

Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning. Qiong Hu, Zhizheng Wu, Korin Richmond, Junichi Yamagishi, Yannis Stylianou, Ranniery Maia. 854-858

An investigation of recurrent neural network architectures for statistical parametric speech synthesis. Sivanand Achanta, Tejas Godambe, Suryakanth V. Gangashetty. 859-863

Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis. Yuchen Fan, Yao Qian, Frank K. Soong, Lei He. 864-868

Towards minimum perceptual error training for DNN-based speech synthesis. Cassia Valentini-Botinhao, Zhizheng Wu, Simon King. 869-873

Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model. Eunwoo Song, Hong-Goo Kang. 874-878

A study of speaker adaptation for DNN-based speech synthesis. Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King. 879-883

Automatic classification of eating conditions from speech using acoustic feature selection and a set of hierarchical support vector machine classifiersAbhay Prasad, Prasanta Kumar Ghosh. 884-888 [doi]

Combining hierarchical classification with frequency weighting for the recognition of eating conditionsJohannes Wagner, Andreas Seiderer, Florian Lingenfelser, Elisabeth André. 889-893 [doi]

Acoustic group feature selection using wrapper method for automatic eating condition recognitionDara Pir, Theodore Brown. 894-898 [doi]

Comparing SVM, softmax, and shallow neural networks for eating condition classificationThomas Pellegrini. 899-903 [doi]

Using representation learning and out-of-domain data for a paralinguistic speech taskBenjamin Milde, Chris Biemann. 904-908 [doi]

Fisher vectors with cascaded normalization for paralinguistic analysisHeysem Kaya, Alexey Karpov, Albert Ali Salah. 909-913 [doi]

Automatic estimation of Parkinson's disease severity from diverse speech tasksJangwon Kim, Md. Nasir, Rahul Gupta, Maarten Van Segbroeck, Daniel Bone, Matthew P. Black, Zisis Iason Skordilis, Zhaojun Yang, Panayiotis G. Georgiou, Shrikanth S. Narayanan. 914-918 [doi]

Assessing the degree of nativeness and Parkinson's condition using Gaussian processes and deep rectifier neural networksTamás Grósz, Róbert Busa-Fekete, Gábor Gosztolya, László Tóth. 919-923 [doi]

Pitch scaling as a perceptual cue for questions in GermanJan Michalsky. 924-928 [doi]

Parameterization of prosodic headednessUwe D. Reichel, Katalin Mády, Stefan Benus. 929-933 [doi]

Detection of Mizo tonesBiswajit Dev Sarma, Priyankoo Sarmah, Wendy Lalhminghlui, S. R. Mahadeva Prasanna. 934-937 [doi]

The intonation of echo wh-questionsSophie Repp, Lena Rosin. 938-942 [doi]

Immediately postverbal questions in UrduFarhat Jabeen, Tina Bögel, Miriam Butt. 943-947 [doi]

Prosodic (non-)realisation of broad, narrow and contrastive focus in Hungarian: a production and a perception studyKatalin Mády. 948-952 [doi]

F0 discontinuity as a marker of prosodic boundary strength in Lombard speechStefan Benus, Uwe D. Reichel, Juraj Simko. 953-957 [doi]

Comparing journalistic and spontaneous speech: prosodic and spectral analysisCédric Gendrot, Martine Adda-Decker, Yaru Wu. 958-962 [doi]

Rhythm influences the tonal realisation of focusNadja Schauffler, Katrin Schweitzer. 963-967 [doi]

Linguistic measures of pitch range in Slavic and Germanic languagesBistra Andreeva, Bernd Möbius, Grazyna Demenko, Frank Zimmerer, Jeanin Jügler. 968-972 [doi]

The effect of stress on vowel space in Daxi Hakka ChineseChunan Qiu, Jie Liang. 973-977 [doi]

Declination, peak height and pitch level in declaratives and questions of South Connaught IrishMaria O'Reilly, Ailbhe Ní Chasaide. 978-982 [doi]

Contextual variation of tones in MizoPriyankoo Sarmah, Leena Dihingia, Wendy Lalhminghlui. 983-986 [doi]

The prosodic marking of rhetorical questions in GermanDaniela Wochner, Jana Schlegel, Nicole Dehé, Bettina Braun. 987-991 [doi]

High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identificationYannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee. 992-996 [doi]

Phonemes frequency based PLLR dimensionality reduction for language recognitionSaad Irtza, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah, Haizhou Li. 997-1001 [doi]

Exploiting i-vector posterior covariances for short-duration language recognitionSandro Cumani, Oldrich Plchot, Radek Fér. 1002-1006 [doi]

Using the beat histogram for speech rhythm description and language identificationAthanasios Lykartsis, Stefan Weinzierl. 1007-1011 [doi]

Speaker recognition for speech under face coverRahim Saeidi, Tuija Niemi, Hanna Karppelin, Jouni Pohjalainen, Tomi Kinnunen, Paavo Alku. 1012-1016 [doi]

Dataset-invariant covariance normalization for out-domain PLDA speaker verificationMd. Hafizur Rahman, Ahilan Kanagasundaram, David Dean, Sridha Sridharan. 1017-1021 [doi]

Sparse coding of total variability matrixLongting Xu, Kong-Aik Lee, Haizhou Li, Zhen Yang. 1022-1026 [doi]

Duration dependent covariance regularization in PLDA modeling for speaker verificationWeicheng Cai, Ming Li, Lin Li, Qingyang Hong. 1027-1031 [doi]

Exploiting supervector structure for speaker recognition trained on a small development setHagai Aronowitz. 1032-1036 [doi]

Modified-prior PLDA and score calibration for duration mismatch compensation in speaker recognition systemQingyang Hong, Lin Li, Ming Li, Ling Huang, Lihong Wan, Jun Zhang. 1037-1041 [doi]

Speaker verification using Gaussian posteriorgrams on fixed phrase short utterancesSarfaraz Jelil, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna. 1042-1046 [doi]

Importance of intelligible phonemes for human speaker recognition in different channel bandwidthsLaura Fernández Gallardo, Sebastian Möller, Michael Wagner. 1047-1051 [doi]

Denoising autoencoder-based speaker feature restoration for utterances of short durationHitoshi Yamamoto, Takafumi Koshinaka. 1052-1056 [doi]

Full multicondition training for robust i-vector based speaker recognitionDayana Ribas, Emmanuel Vincent, José Ramón Calvo de Lara. 1057-1061 [doi]

SARMATA 2.0 automatic Polish language speech recognition systemBartosz Ziólko, Tomasz Jadczyk, Dawid Skurzok, Piotr Zelasko, Jakub Galka, Tomasz Pedzimaz, Ireneusz Gawlik, Szymon Piotr Palka. 1062-1063 [doi]

Remeeting - get more out of meetingsArlo Faria, Korbinian Riedhammer. 1064-1065 [doi]

Web application system for pronunciation practice by children with disabilities and to support cooperation of teachers and medical workersIkuyo Masuda-Katsuse. 1066-1067 [doi]

PATSY - it's all about pronunciation!Caroline Kaufhold, Vadim Gamidov, Andreas Kießling, Klaus Reinhard, Elmar Nöth. 1068-1069 [doi]

Real-time pitch modification system for speech and singing voiceElias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander A. Petrovsky. 1070-1071 [doi]

Nao is doing humour in the CHIST-ERA joker projectGuillaume Dubuisson Duplessis, Lucile Bechade, Mohamed A. Sehili, Agnes Delaborde, Vincent Letard, Anne-Laure Ligozat, Paul Deléglise, Yannick Estève, Sophie Rosset, Laurence Devillers. 1072-1073 [doi]

ABIMS - auditory bewildered interaction measurement systemLisa Lange, Bartholomäus Pfeiffer, Daniel Duran. 1074-1075 [doi]

Maximum a posteriori adaptation of network parameters in deep modelsZhen Huang, Sabato Marco Siniscalchi, I-Fan Chen, Jinyu Li, Jiadong Wu, Chin-Hui Lee. 1076-1080 [doi]

Regularized sequence-level deep neural network model adaptationYan Huang, Yifan Gong. 1081-1085 [doi]

Modeling speaker variability using long short-term memory networks for speech recognitionXiangang Li, Xihong Wu. 1086-1090 [doi]

Intermediate-layer DNN adaptation for offline and session-based iterative speaker adaptationKshitiz Kumar, Chaojun Liu, Kaisheng Yao, Yifan Gong. 1091-1095 [doi]

Speaker adaptation of convolutional neural network using speaker specific subspace vectors of SGMMMurali Karthick B, Prateek Kolhar, S. Umesh. 1096-1100 [doi]

On speaker adaptation of long short-term memory recurrent neural networksYajie Miao, Florian Metze. 1101-1105 [doi]

Automatic identification of received language in MEGEmilio Parisotto, Youness Aliyari Ghassabeh, Matt J. MacDonald, Adelina Cozma, Elizabeth W. Pang, Frank Rudzicz. 1106-1110 [doi]

Detection of cardiovascular reactivity in speechLaurens van der Werff, Jón Guðnason, Kamilla Rún Jóhannsdóttir. 1111-1115 [doi]

Lateralization in emotional speech perception following transcranial direct current stimulationAlex Francois-Nienaber, Jed A. Meltzer, Frank Rudzicz. 1116-1120 [doi]

Speech reconstruction from human auditory cortex with deep neural networksMinda Yang, Sameer A. Sheth, Catherine A. Schevon, Guy M. McKhann II, Nima Mesgarani. 1121-1125 [doi]

Temporal dynamics of the speech readiness potential, and its use in a neural decoder of speech-motor intentionJonathan S. Brumberg, Nichol Castro, Akshatha Rao. 1126-1130 [doi]

Continuous speech recognition from ECoGDominic Heger, Christian Herff, Adriana de Pesters, Dominic Telaar, Peter Brunner, Gerwin Schalk, Tanja Schultz. 1131-1135 [doi]

Locally-connected and convolutional neural networks for small footprint speaker recognitionYu-Hsin Chen, Ignacio Lopez-Moreno, Tara N. Sainath, Mirkó Visontai, Raziel Alvarez, Carolina Parada. 1136-1140 [doi]

Insights into deep neural networks for speaker recognitionDaniel Garcia-Romero, Alan McCree. 1141-1145 [doi]

A unified deep neural network for speaker and language recognitionFred Richardson, Douglas A. Reynolds, Najim Dehak. 1146-1150 [doi]

Investigation of bottleneck features and multilingual deep neural networks for speaker verificationYao Tian, Meng Cai, Liang He, Jia Liu. 1151-1155 [doi]

Frequency offset correction in single sideband (SSB) speech by deep neural network for speaker verificationHua Xing, Gang Liu, John H. L. Hansen. 1156-1160 [doi]

Exploring robustness of DNN/RNN for extracting speaker Baum-Welch statistics in mismatched conditionsHao Zheng, Shanshan Zhang, Wenju Liu. 1161-1165 [doi]

AM-FM based filter bank analysis for estimation of spectro-temporal envelopes and its application for speaker recognition in noisy reverberant environmentsDhananjaya N. Gowda, Rahim Saeidi, Paavo Alku. 1166-1170 [doi]

Fast and accurate phase unwrappingThomas Drugman, Yannis Stylianou. 1171-1175 [doi]

Sparse representation with temporal max-smoothing for acoustic event detectionXugang Lu, Peng Shen, Yu Tsao, Chiori Hori, Hisashi Kawai. 1176-1180 [doi]

Estimation of glottal closure instants from telephone speech using a group delay-based approach that considers speech signal as a spectrumAnushiya Rachel G., Vijayalakshmi P., Nagarajan T. 1181-1185 [doi]

The role of prosody and voice quality in text-dependent categories of storytelling across languagesRaúl Montaño, Francesc Alías. 1186-1190 [doi]

Neuromorphic based oscillatory device for incremental syllable boundary detectionAlexandre Hyafil, Milos Cernak. 1191-1195 [doi]

Simultaneous optimization of multiple tree structures for factor analyzed HMM-based speech synthesisTakenori Yoshimura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda. 1196-1200 [doi]

HMM training strategy for incremental speech synthesisMaël Pouget, Thomas Hueber, Gérard Bailly, Timo Baumann. 1201-1205 [doi]

Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesisShinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura. 1206-1210 [doi]

Random forests for statistical speech synthesisAlan W. Black, Prasanna Kumar Muthukumar. 1211-1215 [doi]

Speaker adaptation using relevance vector regression for HMM-based expressive TTSDoo Hwa Hong, Joun Yeop Lee, Se-young Jang, Nam Soo Kim. 1216-1220 [doi]

Towards a linear dynamical model based speech synthesizerVassilios Tsiaras, Ranniery Maia, Vassilios Diakoloukas, Yannis Stylianou, Vassilios Digalakis. 1221-1225 [doi]

Providing objective metrics of team communication skills via interpersonal coordination mechanismsCéline De Looze, Brian Vaughan, Finnian Kelly, Alison Kay. 1226-1230 [doi]

Dialog act modeling for virtual personal assistant applications using a small volume of labeled data and domain knowledgeDonghyeon Lee, Jinsik Lee, Eun-Kyoung Kim, Jaewon Lee. 1231-1235 [doi]

A polyglot domain optimised text-to-speech system for railway station announcementsCsaba Zainkó, Mátyás Bartalis, Géza Németh, Gábor Olaszy. 1236-1240 [doi]

Development of Hindi speech recognition system of agricultural commodities using deep neural networkPartho Mandal, Shalini Jain, Gaurav Ojha, Anupam Shukla. 1241-1245 [doi]

Real-time audio signal enhancement for hands-free speech applicationsThomas Fehér, Michael Freitag, Christian Gruber. 1246-1250 [doi]

Personalized synthetic voices for speaking impaired: website and appDaniel Erro, Inma Hernáez, Agustín Alonso, D. García-Lorenzo, Eva Navas, Jianpei Ye, H. Arzelus, Igor Jauk, Nguyen Quy Hy, C. Magariños, R. Pérez-Ramón, M. Sulír, Xiaohai Tian, X. Wang. 1251-1254 [doi]

Under-resourced speech recognition based on the speech manifoldReza Sahraeian, Dirk Van Compernolle, Febe de Wet. 1255-1259 [doi]

Multilingual features based keyword search for very low-resource languagesPavel Golik, Zoltán Tüske, Ralf Schlüter, Hermann Ney. 1260-1264 [doi]

Second language speech recognition using multiple-pass decoding with lexicon represented by multiple reduced phoneme setsXiaoyun Wang, Seiichi Yamamoto. 1265-1269 [doi]

Using resources from a closely-related language to develop ASR for a very under-resourced language: a case study for IbanSarah Samson Juan, Laurent Besacier, Benjamin Lecouteux, Mohamed Dyab. 1270-1274 [doi]

Prediction of speech recognition accuracy for utterance classificationMaxim L. Korenevsky, Andrey B. Smirnov, Valentin S. Mendelev. 1275-1279 [doi]

Error bounds for context reduction and feature omissionEugen Beck, Ralf Schlüter, Hermann Ney. 1280-1284 [doi]

A metric for evaluating speech recognizer output based on human-perception modelNobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana, Masafumi Nishimura. 1285-1288 [doi]

How to evaluate ASR output for named entity recognition?Mohamed Ameur Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset. 1289-1293 [doi]

Acoustic-prosodic analysis of attitudinal expressions in GermanHansjörg Mixdorff, Angelika Hönemann, Albert Rilliard. 1294-1298 [doi]

Continuous emotion tracking using total variability spaceHossein Khaki, Engin Erzin. 1299-1303 [doi]

An analysis of the relationship between signal-derived vocal arousal score and human emotion production and perceptionChi-Chun Lee, Daniel Bone, Shrikanth S. Narayanan. 1304-1308 [doi]

Morphology of vocal affect bursts: exploring expressive interjections in Japanese conversationHiroki Mori. 1309-1313 [doi]

Emotion clustering based on probabilistic linear discriminant analysisMahnoosh Mehrabani, Ozlem Kalinli, Ruxin Chen. 1314-1318 [doi]

Objective study of the performance degradation in emotion recognition through the AMR-WB+ codecAaron Albin, Elliot Moore. 1319-1323 [doi]

Analysis of excitation source features of speech for emotion recognitionSudarsana Reddy Kadiri, P. Gangamohan, Suryakanth V. Gangashetty, Bayya Yegnanarayana. 1324-1328 [doi]

An investigation of emotion change detection from speechZhaocheng Huang, Julien Epps, Eliathamby Ambikairajah. 1329-1333 [doi]

Crosslinguistic comparison on the perception of Mandarin attitudinal speechWentao Gu, Ping Tang, Keikichi Hirose, Véronique Aubergé. 1334-1338 [doi]

Conflict intensity estimation from speech using greedy forward-backward feature selectionGábor Gosztolya. 1339-1343 [doi]

Study of entity-topic models for OOV proper name retrievalImran A. Sheikh, Irina Illina, Dominique Fohr. 1344-1348 [doi]

Audio quotation marks for natural language understandingSimon Boutin, Réal Tremblay, Patrick Cardinal, Doug Peters, Pierre Dumouchel. 1349-1352 [doi]

Using word confusion networks for slot filling in spoken language understandingXiaohao Yang, Jia Liu. 1353-1357 [doi]

Distributed representation-based spoken word sense inductionJustin Chiu, Yajie Miao, Alan W. Black, Alexander I. Rudnicky. 1358-1362 [doi]

Structuring lectures in massive open online courses (MOOCs) for efficient learning by linking similar sections and predicting prerequisitesSheng-syun Shen, Hung-yi Lee, Shang-wen Li, Victor Zue, Lin-Shan Lee. 1363-1367 [doi]

News talk-show chaptering with journalistic genresDelphine Charlet, Géraldine Damnati, Jérémy Trione. 1368-1372 [doi]

An analysis of time-aggregated and time-series features for scoring different aspects of multimodal presentation dataVikram Ramanarayanan, Lei Chen, Chee Wee Leong, Gary Feng, David Suendermann-Oeft. 1373-1377 [doi]

Incorporating prosodic prominence evidence into term weights for spoken content retrievalDavid Nicolas Racca, Gareth J. F. Jones. 1378-1382 [doi]

Leveraging word embeddings for spoken document summarizationKuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen. 1383-1387 [doi]

Mutually exclusive grounding for weakly supervised non-negative matrix factorisationVincent Renkens, Hugo Van Hamme. 1388-1392 [doi]

Using semantic maps for robust natural language interaction with robotsEmanuele Bastianelli, Danilo Croce, Roberto Basili, Daniele Nardi. 1393-1397 [doi]

Efficient learning for spoken language understanding tasks with word embedding based pre-trainingYi Luan, Shinji Watanabe, Bret Harsham. 1398-1402 [doi]

Zero-shot semantic parser for spoken language understandingEmmanuel Ferreira, Bassam Jabaian, Fabrice Lefèvre. 1403-1407 [doi]

Adapting lexical representation and OOV handling from written to spoken language with word embeddingJeremie Tafforeau, Thierry Artières, Benoît Favre, Frédéric Béchet. 1408-1412 [doi]

Multi-stream long short-term memory neural network language modelEbru Arisoy, Murat Saraçlar. 1413-1417 [doi]

Composition-based on-the-fly rescoring for salient n-gram biasingKeith B. Hall, Eunjoon Cho, Cyril Allauzen, Françoise Beaufays, Noah Coccaro, Kaisuke Nakajima, Michael Riley, Brian Roark, David Rybach, Linda Zhang. 1418-1422 [doi]

Learning phrase patterns for ASR name error detection using semantic similarityAlex Marin, Mari Ostendorf, Ji He. 1423-1427 [doi]

Sparse non-negative matrix language modeling for skip-gramsNoam Shazeer, Joris Pelemans, Ciprian Chelba. 1428-1432 [doi]

Pruning sparse non-negative matrix n-gram language modelsJoris Pelemans, Noam Shazeer, Ciprian Chelba. 1433-1437 [doi]

Geo-location for voice search language modelingCiprian Chelba, Xuedong Zhang, Keith B. Hall. 1438-1442 [doi]

On efficient training of word classes and their application to recurrent neural network language modelsRami Botros, Kazuki Irie, Martin Sundermeyer, Hermann Ney. 1443-1447 [doi]

Deep semantic encodings for language modelingAli Orkan Bayer, Giuseppe Riccardi. 1448-1452 [doi]

Learning OOV through semantic relatedness in spoken dialog systemsMing Sun, Yun-Nung Chen, Alexander I. Rudnicky. 1453-1457 [doi]

TDTO language modeling with feedforward neural networksTze Yuang Chong, Rafael E. Banchs, Engsiong Chng, Haizhou Li. 1458-1462 [doi]

Improvements to the pruning behavior of DNN acoustic modelsMatthias Paulik. 1463-1467 [doi]

Fast and accurate recurrent neural network acoustic models for speech recognitionHasim Sak, Andrew W. Senior, Kanishka Rao, Françoise Beaufays. 1468-1472 [doi]

Compressing deep neural networks using a rank-constrained topologyPreetum Nakkiran, Raziel Alvarez, Rohit Prabhavalkar, Carolina Parada. 1473-1477 [doi]

Convolutional neural networks for small-footprint keyword spottingTara N. Sainath, Carolina Parada. 1478-1482 [doi]

Efficient GPU implementation of convolutional neural networks for speech recognitionEwout van den Berg, Daniel Brand, Rajesh Bordawekar, Leonid Rachevsky, Bhuvana Ramabhadran. 1483-1487 [doi]

Scalable distributed DNN training using commodity GPU cloud computingNikko Strom. 1488-1492 [doi]

Joint source localization and separation in spherical harmonic domain using a sparsity based methodSachin N. Kalkur, Sandeep Reddy C, Rajesh M. Hegde. 1493-1497 [doi]

Regularized non-negative matrix factorization using alternating direction method of multipliers and its application to source separationShaofei Zhang, Dong-Yan Huang, Lei Xie, Engsiong Chng, Haizhou Li, Minghui Dong. 1498-1502 [doi]

Two-stage multi-target joint learning for monaural speech separationShuai Nie, Shan Liang, Wei Xue, Xueliang Zhang, Wenju Liu, Like Dong, Hong Yang. 1503-1507 [doi]

Multi-objective learning and mask-based post-processing for deep neural network based speech enhancementYong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee. 1508-1512 [doi]

Discriminative nonnegative matrix factorization using cross-reconstruction error for source separationKisoo Kwon, Jong Won Shin, Hyung Yong Kim, Nam Soo Kim. 1513-1516 [doi]

Using audio and visual information for single channel speaker separationFaheem Khan, Ben Milner. 1517-1521 [doi]

Exploring acoustic differences between Cantonese (tonal) and English (non-tonal) spoken expressions of emotionsChee Seng Chong, Jeesun Kim, Chris Davis. 1522-1526 [doi]

Valence, arousal and dominance estimation for English, German, Greek, Portuguese and Spanish lexica using semantic modelsElisavet Palogiannidi, Elias Iosif, Polychronis Koutsakis, Alexandros Potamianos. 1527-1531 [doi]

Dimensionality reduction for speech emotion features by multiscale kernelsXinzhou Xu, Jun Deng, Wenming Zheng, Li Zhao, Björn W. Schuller. 1532-1536 [doi]

High-level feature representation using recurrent neural network for speech emotion recognitionJinKyu Lee, Ivan Tashev. 1537-1540 [doi]

Speech emotion classification using tree-structured sparse logistic regressionMyung Jong Kim, Joohong Yoo, Younggwan Kim, Hoirin Kim. 1541-1545 [doi]

Annotators' agreement and spontaneous emotion classification performanceBogdan Vlasenko, Andreas Wendemuth. 1546-1550 [doi]

On the nature of the features generated in the human auditory pathway for phone recognitionHarald Höge. 1551-1555 [doi]

How the slope of the speech spectrum affects the perception of speaker sizeKodai Yamamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara, Roy D. Patterson. 1556-1560 [doi]

Weakly-supervised word learning is improved by an active online algorithmHeikki Rasilo, Okko Räsänen. 1561-1565 [doi]

The effect of cochlear implant processing on speaker intelligibility: a perceptual study and computer modelLin Lin, Jon Barker, Guy J. Brown. 1566-1570 [doi]

Phonetic-phonological feature emerges by associating phonetic with semantic information - a GSOM-based modeling studyMengxue Cao, Aijun Li, Qiang Fang, Bernd J. Kröger. 1571-1575 [doi]

DIANA: towards computational modeling reaction times in lexical decision in North American EnglishLouis ten Bosch, Lou Boves, Benjamin V. Tucker, Mirjam Ernestus. 1576-1580 [doi]

Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributionsQian Chen, Zhen-Hua Ling, Chen-Yu Yang, Li-Rong Dai. 1581-1585 [doi]

A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesisManuel Sam Ribeiro, Junichi Yamagishi, Robert A. J. Clark. 1586-1590 [doi]

Duration prediction using multi-level model for GPR-based speech synthesisDecha Moungsri, Tomoki Koriyama, Takao Kobayashi. 1591-1595 [doi]

Data-driven foot-based intonation generator for text-to-speech synthesisMahsa Sadat Elyasi Langarani, Jan P. H. van Santen, Seyed Hamidreza Mohammadi, Alexander Kain. 1596-1600 [doi]

Weighted correlation based atom decomposition intonation modellingBranislav Gerazov, Pierre-Edouard Honnet, Aleksandar Gjoreski, Philip N. Garner. 1601-1605 [doi]

Using deep bidirectional recurrent neural networks for prosodic-target prediction in a unit-selection text-to-speech systemRaul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, Ron Hoory. 1606-1610 [doi]

Large vocabulary automatic speech recognition for childrenHank Liao, Golan Pundak, Olivier Siohan, Melissa K. Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath, Andrew W. Senior, Françoise Beaufays, Michiel Bacchiani. 1611-1615 [doi]

Acoustic-prosodic correlates of 'awkward' prosody in story retellings from adolescents with autismDaniel Bone, Matthew P. Black, Anil Ramakrishna, Ruth B. Grossman, Shrikanth S. Narayanan. 1616-1620 [doi]

Evidence of phonological processes in automatic recognition of children's speechEva Fringi, Jill Fain Lehman, Martin J. Russell. 1621-1624 [doi]

Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices in audio gamesMichael Pucher, Markus Toman, Dietmar Schabus, Cassia Valentini-Botinhao, Junichi Yamagishi, Bettina Zillinger, Erich Schmid. 1625-1629 [doi]

Low-memory fast on-line adaptation for acoustically mismatched children's speech recognitionS. Shahnawazuddin, Rohit Sinha. 1630-1634 [doi]

Large vocabulary children's speech recognition with DNN-HMM and SGMM acoustic modelingDiego Giuliani, Bagher BabaAli. 1635-1639 [doi]

HMM adaptation for child speech synthesisAvashna Govender, Febe de Wet, Jules-Raymond Tapamo. 1640-1644 [doi]

Vocal turn-taking patterns in groups of children performing collaborative tasks: an exploratory studyJaebok Kim, Khiet P. Truong, Vicky Charisi, Cristina Zaga, Manja Lohse, Dirk Heylen, Vanessa Evers. 1645-1649 [doi]

Towards an automated screening tool for pediatric speech delayRoozbeh Sadeghian, Stephen A. Zahorian. 1650-1654 [doi]

Children's reading aloud performance: a database and automatic detection of disfluenciesJorge Proença, Dirce Celorico, Sara Candeias, Carla Lopes, Fernando Perdigão. 1655-1659 [doi]

Keyword spotting in multi-player voice driven games for childrenHarshavardhan Sundar, Jill Fain Lehman, Rita Singh. 1660-1664 [doi]

Age-dependent height estimation and speaker normalization for children's speech using the first three subglottal resonancesJinxi Guo, Rohit Paturi, Gary Yeung, Steven M. Lulich, Harish Arsikere, Abeer Alwan. 1665-1669 [doi]

The effect of speakers' regional varieties on listeners' decision-makingAdrian Leemann, Camilla Bernardasci, Francis Nolan. 1670-1674 [doi]

Word-initial glottal stop insertion, hiatus resolution and linking in British EnglishRobert Fuchs. 1675-1679 [doi]

Acoustic analysis of Mandarin affricatesShanpeng Li, Wentao Gu. 1680-1684 [doi]

Homophonous phonotactic and morphonotactic consonant clusters in word-final positionHannah Leykum, Sylvia Moosmüller, Wolfgang U. Dressler. 1685-1689 [doi]

Consonant duration and VOT as a function of syllable complexity and voicing in a sub-set of Spanish clustersMark Gibson, Ana María Fernández Planas, Adamantios Gafos, Emily Remirez. 1690-1694 [doi]

Hands-on tool producing front vowels for phonetic education: aiming for pronunciation training with tactile sensationTakayuki Arai. 1695-1699 [doi]

Acoustics of articulatory constraints: vowel classification and nasalizationIndranil Dutta, Ayushi Pandey. 1700-1704 [doi]

Voice-conditioned allophones of MOUTH and PRICE in Bahamian CreoleJanina Kraus. 1705-1709 [doi]

Analysis of spatial variation with app-based crowdsourced audio dataMarie-José Kolly, Adrian Leemann, Florian Matter. 1710-1714 [doi]

Confusability in L2 vowels: analyzing the role of different featuresMátyás Jani, Catia Cucchiarini, Roeland Van Hout, Helmer Strik. 1715-1719 [doi]

Perception of French speakers' German vowelsFrank Zimmerer, Jürgen Trouvain. 1720-1724 [doi]

Unintuitive phonetic behavior in Tswana post-nasal stopsJagoda Bruni, Daniel Duran, Grzegorz Dogil. 1725-1729 [doi]

Modeling temporal dependency for robust estimation of LP model parameters in speech enhancementChun Hoy Wong, Tan Lee, Yu Ting Yeung, Pak-Chung Ching. 1730-1734 [doi]

Learning a speech manifold for signal subspace speech denoisingColin Vaz, Shrikanth S. Narayanan. 1735-1739 [doi]

An iterative speech model-based a priori SNR estimatorSamy Elshamy, Nilesh Madhu, Wouter Tirry, Tim Fingscheidt. 1740-1744 [doi]

Multi-resolution stacking for speech separation based on boosted DNNXiao-lei Zhang, DeLiang Wang. 1745-1749 [doi]

Least squares estimate of the initial phases in STFT based speech enhancementSidsel Marie Nørholm, Martin Krawczyk-Becker, Timo Gerkmann, Steven van de Par, Jesper Rindom Jensen, Mads Græsbøll Christensen. 1750-1754 [doi]

Enhancement of non-stationary speech using harmonic chirp filtersSidsel Marie Nørholm, Jesper Rindom Jensen, Mads Græsbøll Christensen. 1755-1759 [doi]

Text-informed speech enhancement with deep neural networksKeisuke Kinoshita, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani. 1760-1764 [doi]

Complex tensor factorization in modulation frequency domain for single-channel speech enhancementShogo Masaya, Masashi Unoki. 1765-1769 [doi]

Systematic integration of acoustic echo canceller and noise reduction modules for voice communication systemsHyeonjoo Kang, JeeSok Lee, Soonho Baek, Hong-Goo Kang. 1770-1774 [doi]

DNN-based residual echo suppressionChul-Min Lee, Jong Won Shin, Nam Soo Kim. 1775-1779 [doi]

Codebook-based speech enhancement using Markov process and speech-presence probabilityQi He, Changchun Bao, Feng Bao. 1780-1784 [doi]

On optimal smoothing in minimum statistics based noise trackingAleksej Chinaev, Reinhold Haeb-Umbach. 1785-1789 [doi]

A data-driven speech enhancement method based on modeled long-range temporal dynamicsYue Hao, Changchun Bao, Feng Bao, Feng Deng. 1790-1794 [doi]

Improved phase reconstruction in single-channel speech separationFlorian Mayer, Pejman Mowlaee. 1795-1799 [doi]

Dialog state tracking using long short-term memory neural networksXiaohao Yang, Jia Liu. 1800-1804 [doi]

Detecting repetitions in spoken dialogue systems using phonetic distancesJosé Lopes, Giampiero Salvi, Gabriel Skantze, Alberto Abad, Joakim Gustafson, Fernando Batista, Raveesh Meena, Isabel Trancoso. 1805-1809 [doi]

Multi-language hypotheses ranking and domain tracking for open domain dialogue systemsPaul A. Crook, Jean-Philippe Robichaud, Ruhi Sarikaya. 1810-1814 [doi]

Measuring mimicry in task-oriented conversations: degree of mimicry is related to task difficultyVijay Solanki, Alessandro Vinciarelli, Jane Stuart-Smith, Rachel Smith. 1815-1819 [doi]

Auto-imputing radial basis functions for neural-network turn-taking modelsKornel Laskowski. 1820-1824 [doi]

Effect of gender and call duration on customer satisfaction in call center big dataQuim Llimona, Jordi Luque, Xavier Anguera, Zoraida Hidalgo, Souneil Park, Nuria Oliver. 1825-1829 [doi]

Using profile similarity to measure agreement in personality perceptionZoraida Callejas, David Griol. 1830-1834 [doi]

Relieving mental stress of speakers using a tele-operated robot in foreign language speech educationShizuka Nakamura, Miki Watanabe, Yuichiro Yoshikawa, Kohei Ogawa, Hiroshi Ishiguro. 1835-1838 [doi]

Backward mimicry and forward influence in prosodic contour choice in standard American EnglishAgustín Gravano, Stefan Benus, Rivka Levitan, Julia Hirschberg. 1839-1843 [doi]

The role of speakers and context in classifying competition in overlapping speechShammur Absar Chowdhury, Morena Danieli, Giuseppe Riccardi. 1844-1848 [doi]

Automatic detection and annotation of disfluencies in spoken French corporaGeorge Christodoulides, Mathieu Avanzi. 1849-1853 [doi]

Clustering novel intents in a conversational interaction system with semantic parsingDilek Hakkani-Tür, Yun-Cheng Ju, Geoffrey Zweig, Gökhan Tür. 1854-1858 [doi]

Semantic analysis of spoken input using Markov logic networksVladimir Despotovic, Oliver Walter, Reinhold Haeb-Umbach. 1859-1863 [doi]

Hierarchical discriminative model for spoken language understanding based on convolutional neural networkJan Svec, Adam Chýlek, Lubos Smídl. 1864-1868 [doi]

Learning semantic hierarchy with distributed representations for unsupervised spoken language understandingYun-Nung Chen, William Yang Wang, Alexander I. Rudnicky. 1869-1873 [doi]

Phontasia - a game for training German orthographyKay Berkling, Nadine Pflaumer, Alexei Coyplove. 1874-1875 [doi]

E-commu-book: an assistive technology for users with speech impairmentsKa-Ho Wong, Wai-Kim Leung, Helen M. Meng. 1876-1877 [doi]

Swiss GraphoGame: concept and design presentation of a computerised reading intervention for children with high risk for poor reading outcomesMartina Röthlisberger, Iliana I. Karipidis, Georgette Pleisch, Volker Dellwo, Ulla Richardson, Silvia Brem. 1878-1879 [doi]

Neolexon - a therapy app for patients with aphasiaJakob Pfab, Hanna Jakob, Mona Späth, Christoph Draxler. 1880-1881 [doi]

Acoustic stress detection for improved navigation of educational videosSonal Patil, Harish Arsikere, Om Deshmukh. 1882-1883 [doi]

Multimodal read-aloud ebooks for language learningXavier Anguera. 1884-1885 [doi]

Speech technologies for African languages: example of a multilingual calculator for educationLaurent Besacier, Elodie Gauthier, Mathieu Mangeot, Philippe Bretier, Paul C. Bagshaw, Olivier Rosec, Thierry Moudenc, François Pellegrino, Sylvie Voisin, Egidio Marsico, Pascal Nocera. 1886-1887 [doi]

Time-frequency kernel-based CNN for speech recognitionTuo Zhao, Yunxin Zhao, Xin Chen. 1888-1892 [doi]

Consonant recognition with continuous-state hidden Markov models and perceptually-motivated featuresPhilip Weber, Colin J. Champion, S. M. Houghton, Peter Jancovic, Martin J. Russell. 1893-1897 [doi]

Investigating factor analysis features for deep neural networks in noisy speech recognitionSriram Ganapathy, Samuel Thomas, Dimitrios Dimitriadis, Steven J. Rennie. 1898-1902 [doi]

Ensemble of Gaussian mixture localized neural networks with application to phone recognitionRuchir Travadi, Shrikanth S. Narayanan. 1903-1907 [doi]

DNN derived filters for processing of modulation spectrum of speechJan Pesán, Lukás Burget, Hynek Hermansky, Karel Veselý. 1908-1911 [doi]

Exploring how deep neural networks form phonemic categoriesTasha Nagamine, Michael L. Seltzer, Nima Mesgarani. 1912-1916 [doi]

Pronunciation accuracy and intelligibility of non-native speechAnastassia Loukina, Melissa Lopez, Keelan Evanini, David Suendermann-Oeft, Alexei V. Ivanov, Klaus Zechner. 1917-1921 [doi]

Productions of /h/ in German: French vs. German speakersFrank Zimmerer, Jürgen Trouvain. 1922-1926 [doi]

German non-native realizations of French voiced fricatives in final position of a group of wordsAnne Bonneau, Martine Cadot. 1927-1931 [doi]

From Newcastle MOUTH to Aussie ears: Australians' perceptual assimilation and adaptation for Newcastle UK vowelsCatherine T. Best, Jason A. Shaw, Gerard Docherty, Bronwen G. Evans, Paul Foulkes, Jennifer Hay, Jalal Al-Tamimi, Katharine Mair, Karen E. Mulak, Sophie Wood. 1932-1936 [doi]

Wubuy coronal stop perception by speakers of three dialects of BanglaRikke Louise Bundgaard-Nielsen, Brett Baker, Olga Maxwell, Janet Fletcher. 1937-1941 [doi]

Using melody metrics to compare English speech read by native speakers and by L2 Chinese speakers from ShanghaiDaniel Hirst, Hongwei Ding. 1942-1946 [doi]

Predicting therapist empathy in motivational interviews using language features inspired by psycholinguistic normsJames Gibson, Nikolaos Malandrakis, Francisco Romero, David C. Atkins, Shrikanth S. Narayanan. 1947-1951 [doi]

Therapy language analysis using automatically generated psycholinguistic normsNikolaos Malandrakis, Shrikanth S. Narayanan. 1952-1956 [doi]

A dynamic model for behavioral analysis of couple interactions using acoustic featuresWei Xia, James Gibson, Bo Xiao, Brian R. Baucom, Panayiotis G. Georgiou. 1957-1961 [doi]

Analysis and modeling of the role of laughter in motivational interviewing based psychotherapy conversationsRahul Gupta, Theodora Chaspari, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan. 1962-1966 [doi]

The discourse value of social signals at topic change momentsFrancesca Bonin, Nick Campbell, Carl Vogel. 1967-1971 [doi]

Automatic detection of uncertainty in spontaneous German dialogueTobias Schrank, Barbara Schuppler. 1972-1976 [doi]

Face reading from speech - predicting facial action units from audio cuesFabien Ringeval, Erik Marchi, Marc Mehu, Klaus R. Scherer, Björn W. Schuller. 1977-1981 [doi]

A new front-end for classification of non-speech sounds: a study on human whistleMahesh Kumar Nandwana, Hynek Boril, John H. L. Hansen. 1982-1986 [doi]

Robust features for sonorant segmentation in continuous speechSri Harsha Dumpala, Bhanu Teja Nellore, Raghu Ram Nevali, Suryakanth V. Gangashetty, Bayya Yegnanarayana. 1987-1991 [doi]

Reduction of reverberation effects in the MFCC modulation spectrum for improved classification of acoustic signalsSebastian Gergen, Anil M. Nagathil, Rainer Martin. 1992-1996 [doi]

Spiking neural networks and the generalised Hough transform for speech pattern detectionJonathan William Dennis, Tran Huy Dat, Haizhou Li. 1997-2001 [doi]

Acoustic event recognition using dominant spectral basis vectorsWoohyun Choi, Sangwook Park, David K. Han, Hanseok Ko. 2002-2006 [doi]

Learning from real users: rating dialogue success with neural networks for reinforcement learning in spoken dialogue systemsPei-hao Su, David Vandyke, Milica Gasic, DongHo Kim, Nikola Mrksic, Tsung-Hsien Wen, Steve J. Young. 2007-2011 [doi]

A framework to develop context-aware adaptive dialogue systemDavid Griol, Zoraida Callejas, Ramón López-Cózar. 2012-2016 [doi]

A proposal to develop domain and subtask-adaptive dialog management modelsDavid Griol, Zoraida Callejas. 2017-2021 [doi]

Hypotheses ranking and state tracking for a multi-domain dialog system using multiple ASR alternatesOmar Zia Khan, Jean-Philippe Robichaud, Paul A. Crook, Ruhi Sarikaya. 2022-2026 [doi]

An entropy minimization framework for goal-driven dialogue managementJi Wu, Miao Li, Chin-Hui Lee. 2027-2031 [doi]

Context-dependent error correction of spoken referring expressionsIngrid Zukerman, Andisheh Partovi, Su Nam Kim. 2032-2036 [doi]

ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challengeZhizheng Wu, Tomi Kinnunen, Nicholas W. D. Evans, Junichi Yamagishi, Cemal Hanilçi, Md. Sahidullah, Aleksandr Sizov. 2037-2041 [doi]

The AHOLAB RPS SSD spoofing challenge 2015 submissionJon Sánchez, Ibon Saratxaga, Inma Hernáez, Eva Navas, Daniel Erro. 2042-2046 [doi]

Human vs machine spoofing detection on wideband and narrowband dataMirjam Wester, Zhizheng Wu, Junichi Yamagishi. 2047-2051 [doi]

Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challengeXiong Xiao, Xiaohai Tian, Steven Du, Haihua Xu, Engsiong Chng, Haizhou Li. 2052-2056 [doi]

Classifiers for synthetic speech detection: a comparisonCemal Hanilçi, Tomi Kinnunen, Md. Sahidullah, Aleksandr Sizov. 2057-2061 [doi]

Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speechTanvina B. Patel, Hemant A. Patil. 2062-2066 [doi]

Spoofing detection with DNN and one-class SVM for the ASVspoof 2015 challengeJesús A. Villalba, Antonio Miguel, Alfonso Ortega, Eduardo Lleida. 2067-2071 [doi]

Development of CRIM system for the automatic speaker verification spoofing and countermeasures challenge 2015Md. Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Themos Stafylakis. 2072-2076 [doi]

Spoofing countermeasure based on analysis of linear prediction errorArtur Janicki. 2077-2081 [doi]

Simultaneous utilization of spectral magnitude and phase information to extract supervectors for speaker verification anti-spoofingYi Liu, Yao Tian, Liang He, Jia Liu, Michael T. Johnson. 2082-2086 [doi]

A comparison of features for synthetic speech detectionMd. Sahidullah, Tomi Kinnunen, Cemal Hanilçi. 2087-2091 [doi]

Relative phase information for detecting human speech and spoofed speechLongbiao Wang, Yohei Yoshida, Yuta Kawakami, Seiichi Nakagawa. 2092-2096 [doi]

Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challengeNanxin Chen, Yanmin Qian, Heinrich Dinkel, Bo Chen, Kai Yu. 2097-2101 [doi]

Applying GPGPU to recurrent neural network language model based fast network search in the real-time LVCSRKyungmin Lee, Chiyoun Park, Ilhwan Kim, Namhoon Kim, Jaewon Lee. 2102-2106 [doi]

Real-time integration of dynamic context information for improving automatic speech recognitionYoussef Oualil, Marc Schulder, Hartmut Helmke, Anna Schmidt, Dietrich Klakow. 2107-2111 [doi]

Rapid vocabulary addition to context-dependent decoder graphsCyril Allauzen, Michael Riley. 2112-2116 [doi]

Modeling phonetic context with non-random forests for speech recognitionHainan Xu, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur. 2117-2121 [doi]

Ant colony algorithm applied to automatic speech recognition graph decodingBenjamin Lecouteux, Didier Schwab. 2122-2126 [doi]

Garbage modeling for on-device speech recognitionChristophe Van Gysel, Leonid Velikovich, Ian McGraw, Françoise Beaufays. 2127-2131 [doi]

A comparative study of BNF and DNN multilingual training on cross-lingual low-resource speech recognitionHaihua Xu, Van Hai Do, Xiong Xiao, Engsiong Chng. 2132-2136 [doi]

Neural higher-order factors in conditional random fields for phoneme classificationMartin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf. 2137-2141 [doi]

Stacked auto-encoder for ASR error detection and word error rate predictionShahab Jalalvand, Daniele Falavigna. 2142-2146 [doi]

Estimation of the air-tissue boundaries of the vocal tract in the mid-sagittal plane from electromagnetic articulograph dataSatyabrata Parida, Ashok Kumar Pattem, Prasanta Kumar Ghosh. 2147-2151 [doi]

A new Italian dataset of parallel acoustic and articulatory dataClaudia Canevari, Leonardo Badino, Luciano Fadiga. 2152-2156 [doi]

Error analysis of extracted tongue contours from 2D ultrasound imagesTamás Gábor Csapó, Steven M. Lulich. 2157-2161 [doi]

Accuracy of a markerless acquisition technique for studying speech articulatorsAndrea Bandini, Slim Ouni, Piero Cosi, Silvia Orlandi, Claudia Manfredi. 2162-2166 [doi]

Measuring oral and nasal airflow in production of Chinese plosiveYujie Chi, Kiyoshi Honda, Jianguo Wei, Hui Feng, Jianwu Dang. 2167-2171 [doi]

Enhanced videokymographic data analysis based on vocal folds dynamics modelingCarlo Drioli, Gian Luca Foresti. 2172-2176 [doi]

Interpolation of tongue fleshpoint kinematics from combined EMA position and orientation dataAndrew J. Kolb, Michael T. Johnson, Jeffrey Berry. 2177-2181 [doi]

A new technique for assessing glottal dynamics in speech and singing by means of optical-flow computationGustavo Andrade-Miranda, Nathalie Henrich Bernardoni, Juan Ignacio Godino-Llorente. 2182-2186 [doi]

On the incompatibility of trilling and palatalization: a single-subject study of sustained apical and uvular trillsAlexei Kochetov, Phil Howson. 2187-2191 [doi]

Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddingsPengcheng Zhu, Lei Xie, Yunlin Chen. 2192-2196 [doi]

Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesisYang Wang, Minghao Yang, Zhengqi Wen, Jianhua Tao. 2197-2201 [doi]

F0 parameterization of glottalized tones for HMM-based Vietnamese TTSDuy Khanh Ninh, Yoichi Yamashita. 2202-2206 [doi]

Deep neural network context embeddings for model selection in rich-context HMM synthesisThomas Merritt, Junichi Yamagishi, Zhizheng Wu, Oliver Watts, Simon King. 2207-2211 [doi]

An investigation of context clustering for statistical speech synthesis with deep neural networkBo Chen, Zhehuai Chen, Jiachen Xu, Kai Yu. 2212-2216 [doi]

Sentence-level control vectors for deep neural network speech synthesisOliver Watts, Zhizheng Wu, Simon King. 2217-2221 [doi]

Micro-structure of disfluencies: basics for conversational speech synthesisSimon Betz, Petra Wagner, David Schlangen. 2222-2226 [doi]

Using automatic stress extraction from audio for improved prosody modelling in speech synthesisGyörgy Szaszák, András Beke, Gábor Olaszy, Bálint Pál Tóth. 2227-2231 [doi]

Reconstructing voices within the multiple-average-voice-model frameworkPierre Lanchantin, Christophe Veaux, Mark J. F. Gales, Simon King, Junichi Yamagishi. 2232-2236 [doi]

HMM based Myanmar text to speech systemYe Kyaw Thu, Win Pa Pa, Jinfu Ni, Yoshinori Shiga, Andrew M. Finch, Chiori Hori, Hisashi Kawai, Eiichiro Sumita. 2237-2241 [doi]

Multiple feed-forward deep neural networks for statistical parametric speech synthesisShinji Takaki, Sangjin Kim, Junichi Yamagishi, Jongjin Kim. 2242-2246 [doi]

Adapting machine translation models toward misrecognized speech with text-to-speech pronunciation rules and acoustic confusabilityNicholas Ruiz, Qin Gao, William Lewis, Marcello Federico. 2247-2251 [doi]

"speech is silver, but silence is golden": improving speech-to-speech translation performance by slashing users inputFrédéric Béchet, Benoît Favre, Mickael Rouvier. 2252-2256 [doi]

A study on the stability and effectiveness of features in quality estimation for spoken language translationRaymond W. M. Ng, Kashif Shah, Lucia Specia, Thomas Hain. 2257-2261 [doi]

Efficient language model adaptation for automatic speech recognition of spoken translationsJoris Pelemans, Tom Vanallemeersch, Kris Demuynck, Hugo Van Hamme, Patrick Wambacq. 2262-2266 [doi]

Speed or accuracy? a study in evaluation of simultaneous speech translationTakashi Mieno, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura. 2267-2271 [doi]

Large scale speech-to-text translation with out-of-domain corpora using better context-based models and domain adaptationMarcin Junczys-Dowmunt, Pawel Przybysz, Arleta Staszuk, Eun-Kyoung Kim, Jaewon Lee. 2272-2276 [doi]

A statistical model-based voice activity detection using multiple DNNs and noise awarenessInyoung Hwang, Jaeseong Sim, Sang-Hyeon Kim, Kwang sub Song, Joon-Hyuk Chang. 2277-2281 [doi]

A universal VAD based on jointly trained deep neural networksQing Wang, Jun Du, Xiao-bao, Zi-Rui Wang, Li-Rong Dai, Chin-Hui Lee. 2282-2286 [doi]

Spectrographic speech mask estimation using the time-frequency correlation of speech presenceGe Zhan, Zhaoqiong Huang, Dongwen Ying, Jielin Pan, Yonghong Yan. 2287-2291 [doi]

Complete-linkage clustering for voice activity detection in audio and visual speechHouman Ghaemmaghami, David Dean, Shahram Kalantari, Sridha Sridharan, Clinton Fookes. 2292-2296 [doi]

A model based voice activity detector for noisy environmentsKaavya Sriskandaraja, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah. 2297-2301 [doi]

An unsupervised visual-only voice activity detection approach using temporal orofacial featuresFei Tao, John H. L. Hansen, Carlos Busso. 2302-2306 [doi]

An i-vector backend for speaker verificationPatrick Kenny, Themos Stafylakis, Md. Jahangir Alam, Marcel Kockmann. 2307-2311 [doi]

Multi-channel speaker verification based on total variability modellingMaria Joana Correia, Alessio Brutti, Alberto Abad. 2312-2316 [doi]

SNR-invariant PLDA modeling for robust speaker verificationNa Li, Man-Wai Mak. 2317-2321 [doi]

Investigating in-domain data requirements for PLDA trainingMd. Hafizur Rahman, David Dean, Ahilan Kanagasundaram, Sridha Sridharan. 2322-2326 [doi]

Migrating i-vectors between speaker recognition systems using regression neural networksOndrej Glembek, Pavel Matejka, Oldrich Plchot, Jan Pesán, Lukás Burget, Petr Schwarz. 2327-2331 [doi]

Improving PLDA speaker verification using WMFD and linear-weighted approaches in limited microphone data conditionsAhilan Kanagasundaram, David Dean, Sridha Sridharan. 2332-2336 [doi]

The relationship between voice source parameters and the maxima dispersion quotient (MDQ)Christer Gobl, Irena Yanushevskaya, Ailbhe Ní Chasaide. 2337-2341 [doi]

Glottal inverse filtering based on quadratic programmingManu Airaksinen, Tom Bäckström, Paavo Alku. 2342-2346 [doi]

Automatic detection of creaky voice using epoch parametersN. P. Narendra, K. Sreenivasa Rao. 2347-2351 [doi]

Perception of voicing in the absence of native voicing experienceRikke Louise Bundgaard-Nielsen, Brett Baker. 2352-2356 [doi]

The relationship between acoustic and perceived intraspeaker variability in voice qualityJody Kreiman, Soo-Jin Park, Patricia A. Keating, Abeer Alwan. 2357-2360 [doi]

Perceptual cues of whispered tones: are they really special?Li Jiao, Qiuwu Ma, Ting Wang, Yi Xu. 2361-2365 [doi]

Multiscale recurrent neural network based language modelTsuyoshi Morioka, Tomoharu Iwata, Takaaki Hori, Tetsunori Kobayashi. 2366-2370 [doi]

Bag-of-words input for long history representation in neural network-based language models for speech recognitionKazuki Irie, Ralf Schlüter, Hermann Ney. 2371-2375 [doi]

Efficient machine translation decoding with slow language modelsAhmad Emami. 2376-2379 [doi]

Latent words recurrent neural network language modelsRyo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito. 2380-2384 [doi]

Combining multiple-type input units using recurrent neural network for LVCSR language modelingVataya Chunwijitra, Ananlada Chotimongkol, Chai Wutiwiwatchai. 2385-2389 [doi]

Prosodically-enhanced recurrent neural network language modelsSiva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, Akinobu Lee. 2390-2394 [doi]

A comprehensive 3D biomechanically-driven vocal tract model including inverse dynamics for speech researchPeter Anderson, Negar M. Harandi, Scott Moisik, Ian Stavness, Sidney Fels. 2395-2399 [doi]

Low frequency ultrasonic voice activity detection using convolutional neural networksIan Vince McLoughlin, Yan Song. 2400-2404 [doi]

Real-time control of a DNN-based articulatory synthesizer for silent speech conversion: a pilot studyFlorent Bocquelet, Thomas Hueber, Laurent Girin, Christophe Savariaux, Blaise Yvert. 2405-2409 [doi]

Tongue tracking in ultrasound images using eigentongue decomposition and artificial neural networksDiandra Fabre, Thomas Hueber, Florent Bocquelet, Pierre Badin. 2410-2414 [doi]

Speaker-independent silent speech recognition with across-speaker articulatory normalization and speaker adaptive trainingJun Wang, Seongjun Hahm. 2415-2419 [doi]

Codebook clustering for unit selection based EMG-to-speech conversionLorenz Diener, Matthias Janke, Tanja Schultz. 2420-2424 [doi]

Flexible tracking of auditory attentionMajid Mirbagheri, Bradley Ekin, Les Atlas, Adrian K. C. Lee. 2425-2429 [doi]

A study on deep neural network acoustic model adaptation for robust far-field speech recognitionSeyedmahdad Mirsamadi, John H. L. Hansen. 2430-2434 [doi]

Speech dereverberation using long short-term memoryMasato Mimura, Shinsuke Sakai, Tatsuya Kawahara. 2435-2439 [doi]

Reverberation robust acoustic modeling using i-vectors with time delay neural networksVijayaditya Peddinti, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur. 2440-2444 [doi]

Delta-melspectra features for noise robustness to DNN-based ASR systemsKshitiz Kumar, Chaojun Liu, Yifan Gong. 2445-2448 [doi]

Combating reverberation in large vocabulary continuous speech recognitionVikramjit Mitra, Julien van Hout, Mitchell McLaren, Wen Wang, Martin Graciarena, Dimitra Vergyri, Horacio Franco. 2449-2453 [doi]

Three ways to adapt a CTS recognizer to unseen reverberated speech in BUT system for the ASpIRE challengeMartin Karafiát, Frantisek Grézl, Lukás Burget, Igor Szöke, Jan Cernocký. 2454-2458 [doi]

Robust parameter estimation for audio declipping in noiseMark J. Harvilla, Richard M. Stern. 2459-2463 [doi]

Multi-task learning deep neural networks for speech feature denoisingBin Huang, Dengfeng Ke, Hao Zheng, Bo Xu, Yanyan Xu, Kaile Su. 2464-2468 [doi]

Time-frequency masking for large scale robust speech recognitionYuxuan Wang, Ananya Misra, Kean K. Chin. 2469-2473 [doi]

Efficient use of DNN bottleneck features in generalized variable parameter HMMs for noise robust speech recognitionRongfeng Su, Xurong Xie, Xunying Liu, Lan Wang. 2474-2478 [doi]

Investigating modulation spectrogram features for deep neural network-based automatic speech recognitionDeepak Baby, Hugo Van Hamme. 2479-2483 [doi]

Deep neural network based spectral feature mapping for robust speech recognitionKun Han, Yanzhang He, Deblin Bagchi, Eric Fosler-Lussier, DeLiang Wang. 2484-2488 [doi]

Analyzing speech rate entrainment and its relation to therapist empathy in drug addiction counselingBo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan. 2489-2493 [doi]

Agreement and disagreement utterance detection in conversational speech by extracting and integrating local featuresAtsushi Ando, Taichi Asami, Manabu Okamoto, Hirokazu Masataki, Sumitaka Sakauchi. 2494-2498 [doi]

Still together?: the role of acoustic features in predicting marital outcomeMd. Nasir, Wei Xia, Bo Xiao, Brian R. Baucom, Shrikanth S. Narayanan, Panayiotis G. Georgiou. 2499-2503 [doi]

On evaluation metrics for social signal detectionGábor Gosztolya. 2504-2508 [doi]

Laughter and filler detection in naturalistic audioLakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen. 2509-2513 [doi]

Automatic formatted transcripts for videosAasish Pappu, Amanda Stent. 2514-2518 [doi]

Does my speech rock? automatic assessment of public speaking skillsLucas Azaïs, Adrien Payan, Tianjiao Sun, Guillaume Vidal, Tina Zhang, Eduardo Coutinho, Florian Eyben, Björn W. Schuller. 2519-2523 [doi]

Verbal intelligence identification based on text classificationRoman B. Sergienko, Alexander Schmitt. 2524-2528 [doi]

A multimodal approach for automatic assessment of school principals' oral presentation during pre-service training programShan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Hsin-Chih Lin, Chi-Chun Lee. 2529-2533 [doi]

Are you TED talk material? comparing prosody in professors and TED speakersT. J. Tsai. 2534-2538 [doi]

Detection of cognitive states and their correlation to speech recognition performance in speech-to-speech machine translation systemsHayakawa Akira, Fasih Haider, Loredana Cerrato, Nick Campbell, Saturnino Luz. 2539-2543 [doi]

Perceptual speech quality dimensions in a conversational situationFriedemann Köster, Sebastian Möller. 2544-2548 [doi]

Multidimensional evaluation and predicting overall speech qualityJens Berger, Anna Llagostera. 2549-2552 [doi]

On speech intelligibility estimation of phase-aware single-channel speech enhancementAndreas Gaich, Pejman Mowlaee. 2553-2557 [doi]

A framework for the evaluation of microscopic intelligibility modelsRicard Marxer, Martin Cooke, Jon Barker. 2558-2562 [doi]

A binaural short time objective intelligibility measure for noisy and enhanced speechAsger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen. 2563-2567 [doi]

A glimpse-based approach for predicting binaural intelligibility with single and multiple maskers in anechoic conditionsYan Tang, Martin Cooke, Bruno M. Fazenda, Trevor J. Cox. 2568-2572 [doi]

Improving the prediction power of the speech transmission index to account for non-linear distortions introduced by noise-reduction algorithmsFei Chen. 2573-2577 [doi]

DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speechKehuang Li, Zhen Huang, Yong Xu, Chin-Hui Lee. 2578-2582 [doi]

Speech quality evaluation of artificial bandwidth extension: comparing subjective judgments and instrumental predictionsHannu Pulakka, Ville Myllylä, Anssi Rämö, Paavo Alku. 2583-2587 [doi]

Synchronous overlap and add of spectra for enhancement of excitation in artificial bandwidth extension of speechM. A. Tugtekin Turan, Engin Erzin. 2588-2592 [doi]

Speech bandwidth expansion based on deep neural networksYingxue Wang, Shenghui Zhao, Wenbo Liu, Ming Li, Jingming Kuang. 2593-2597 [doi]

A novel method of artificial bandwidth extension using deep architectureBin Liu, Jianhua Tao, Zhengqi Wen, Ya Li, Danish Bukhari. 2598-2602 [doi]

The RedDots platform for mobile crowd-sourcing of speech dataKong-Aik Lee, Guangsen Wang, Kam Pheng Ng, Hanwu Sun, Trung Hieu Nguyen, Ngoc Thuy Huong Thai, Bin Ma, Haizhou Li. 2603-2604 [doi]

Two extensions of Umeda and Teranishi's physical models of the human vocal tractTakayuki Arai. 2605-2606 [doi]

Collaborative annotation for person identification in TV showsMateusz Budnik, Laurent Besacier, Johann Poignant, Hervé Bredin, Claude Barras, Mickaël Stefas, Pierrick Bruneau, Thomas Tamisier. 2607-2608 [doi]

Phonetic/linguistic web services at BASThomas Kisler, Florian Schiel, Uwe D. Reichel, Christoph Draxler. 2609-2610 [doi]

Managing speech databases with emuR and the EMU-webAppRaphael Winkelmann. 2611-2612 [doi]

Visual comparison of speaker groupsSebastian Wankerl, Florian Hönig, Anton Batliner, Juan R. Orozco-Arroyave, Elmar Nöth. 2613-2614 [doi]

Tools for rapid customization of S2S systems for emergent domainsRohit Kumar, Matthew E. Roy, Sanjika Hewavitharana, Dennis N. Mehay, Nina Zinovieva. 2615-2616 [doi]

The speech recognition virtual kitchen turns oneFlorian Metze, Eric Riebling, Eric Fosler-Lussier, Andrew R. Plummer, Rebecca Bates. 2617-2618 [doi]

Model-based adaptive pre-processing of speech for enhanced intelligibility in noise and reverberationJan Rennies, Andreas Volgenandt, Henning F. Schepker, Simon Doclo. 2619-2620 [doi]

Experiences with and new application ideas for the INTERSPEECH appSebastian Möller, Tilo Westermann. 2621-2622 [doi]

Traditional IVR and visual IVR - killing two birds with one stoneDmitry Sityaev, Praphul Kumar, Rajesh Ramchander. 2623-2624 [doi]

Annotating large lattices with the exact word errorRogier C. van Dalen, Mark J. F. Gales. 2625-2629 [doi]

Semi-supervised maximum mutual information training of deep neural network acoustic modelsVimal Manohar, Daniel Povey, Sanjeev Khudanpur. 2630-2634 [doi]

Rectified linear neural networks with tied-scalar regularization for LVCSRShiliang Zhang, Hui Jiang, Si Wei, Li-Rong Dai. 2635-2639 [doi]

Segmental conditional random fields with deep neural networks as acoustic models for first-pass word recognitionYanzhang He, Eric Fosler-Lussier. 2640-2644 [doi]

Distinct triphone acoustic modeling using deep neural networksDongpeng Chen, Brian Mak. 2645-2649 [doi]

Minimum word error training of RNN-based voice activity detectionGregory Gelly, Jean-Luc Gauvain. 2650-2654 [doi]

Classification of place-of-articulation of stop consonants using temporal analysisA. P. Prathosh, A. G. Ramakrishnan, T. V. Ananthapadmanabha. 2655-2659 [doi]

The emergence of nasal velar codas in Brazilian Portuguese: an rt-MRI studyMarissa S. Barlaz, Maojing Fu, Zhi-Pei Liang, Ryan Shosted, Bradley P. Sutton. 2660-2664 [doi]

Salient dimensions in implicit phonotactic learningElise Michon, Emmanuel Dupoux, Alejandrina Cristia. 2665-2669 [doi]

An acoustic examination of the three-way sibilant contrast in Lower SorbianPhil Howson. 2670-2674 [doi]

Investigating consonant reduction in Mandarin Chinese with improved forced alignmentJiahong Yuan, Mark Liberman. 2675-2678 [doi]

Durational characteristics and timing patterns of Russian onset clusters at two speaking ratesMarianne Pouplier, Stefania Marin, Alexei Kochetov. 2679-2683 [doi]

Vocal biomarkers to discriminate cognitive load in a working memory taskThomas F. Quatieri, James R. Williamson, Christopher J. Smalt, Tejash Patel, Joseph Perricone, Daryush D. Mehta, Brian S. Helfer, Gregory Ciccarelli, Darrell Ricke, Nicolas Malyska, Jeff Palmer, Kristin Heaton, Marianna Eddy, Joseph Moran. 2684-2688 [doi]

I-vector based physical task stress detection with different fusion strategiesChunlei Zhang, Gang Liu, Chengzhu Yu, John H. L. Hansen. 2689-2693 [doi]

Automatic detection of mild cognitive impairment from spontaneous speech using ASRLászló Tóth, Gábor Gosztolya, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Edit Biró, Fruzsina Zsura, Magdolna Pákáski, János Kálmán. 2694-2698 [doi]

Contemporary stochastic feature selection algorithms for speech-based emotion recognitionMaxim Sidorov, Christina Brester, Alexander Schmitt. 2699-2703 [doi]

Effect of different jitter-induced glottal pulse shape changes in periodicity perturbation measuresCarlos A. Ferrer, Diana Torres, Eduardo González-Moreira, José Ramón Calvo de Lara, Eduardo Castillo. 2704-2708 [doi]

Automatic audio sentiment extraction using keyword spottingLakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen. 2709-2713 [doi]

Unsupervised relation detection using automatic alignment of query patterns extracted from knowledge graphs and query click logsPanupong Pasupat, Dilek Hakkani-Tür. 2714-2718 [doi]

A latent variable model for joint pause prediction and dependency parsingThe Tung Nguyen, Graham Neubig, Hiroyuki Shindo, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura. 2719-2723 [doi]

Extractive meeting summarization through speaker zone detectionMohammad Hadi Bokaei, Hossein Sameti, Yang Liu. 2724-2728 [doi]

Positional language modeling for extractive broadcast news speech summarizationShih-Hung Liu, Kuan-Yu Chen, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu. 2729-2733 [doi]

Speech-based location estimation of first responders in a simulated search and rescue scenarioSaeid Mokaram, Roger K. Moore. 2734-2738 [doi]

Constructive feedback, thinking process and cooperation: assessing the quality of classroom interactionTahir Sousa, Lucie Flekova, Margot Mieskes, Iryna Gurevych. 2739-2743 [doi]

A real-time variable-Q non-stationary Gabor transform for pitch shiftingDong-Yan Huang, Minghui Dong, Haizhou Li. 2744-2748 [doi]

Many-to-many voice conversion based on multiple non-negative matrix factorizationRyo Aihara, Tetsuya Takiguchi, Yasuo Ariki. 2749-2753 [doi]

Statistical singing voice conversion based on direct waveform modification with global varianceKazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura. 2754-2758 [doi]

System fusion for high-performance voice conversionXiaohai Tian, Zhizheng Wu, Siu Wa Lee, Nguyen Quy Hy, Minghui Dong, Engsiong Chng. 2759-2763 [doi]

Speaker adaptation using only vocalic segments via frequency warpingAgustín Alonso, Daniel Erro, Eva Navas, Inma Hernáez. 2764-2768 [doi]

Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environmentsYusuke Tajiri, Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura. 2769-2773 [doi]

Transcribing continuous speech using mismatched crowdsourcingPreethi Jyothi, Mark Hasegawa-Johnson. 2774-2778 [doi]

Selection and aggregation techniques for crowdsourced semantic annotation taskShammur Absar Chowdhury, Marcos Calvo, Arindam Ghosh, Evgeny A. Stepanov, Ali Orkan Bayer, Giuseppe Riccardi, Fernando García, Emilio Sanchis Arnal. 2779-2783 [doi]

Controlling quality and handling fraud in large scale crowdsourcing speech data collectionsSpencer Rothwell, Ahmad Elshenawy, Steele Carter, Daniela Braga, Faraz Romani, Michael Kennewick, Bob Kennewick. 2784-2788 [doi]

Data collection and annotation for state-of-the-art NER using unmanaged crowdsSpencer Rothwell, Steele Carter, Ahmad Elshenawy, Vladislavs Dovgalecs, Safiyyah Saleem, Daniela Braga, Bob Kennewick. 2789-2793 [doi]

Robustness in speech quality assessment and temporal training expiry in mobile crowdsourcing environmentsTim Polzehl, Babak Naderi, Friedemann Köster, Sebastian Möller. 2794-2798 [doi]

Effect of trapping questions on the reliability of speech quality judgments in a crowdsourcing paradigmBabak Naderi, Tim Polzehl, Ina Wechsung, Friedemann Köster, Sebastian Möller. 2799-2803 [doi]

Voice Äpp: a mobile app for crowdsourcing Swiss German dialect dataAdrian Leemann, Marie-José Kolly, Jean Philippe Goldman, Volker Dellwo, Ingrid Hove, Ibrahim Almajai, Sarah Grimm, Sylvain Robert, Daniel Wanitsch. 2804-2808 [doi]

Expert and crowdsourced annotation of pronunciation errors for automatic scoring systemsAnastassia Loukina, Melissa Lopez, Keelan Evanini, David Suendermann-Oeft, Klaus Zechner. 2809-2813 [doi]

Capcap: an output-agreement game for video captioningHernisa Kacorri, Kaoru Shinkawa, Shin Saito. 2814-2818 [doi]

Auris populi: crowdsourced native transcriptions of Dutch vowels spoken by adult Spanish learnersPepi Burgos, Eric Sanders, Catia Cucchiarini, Roeland Van Hout, Helmer Strik. 2819-2823 [doi]

Crowdsource a little to label a lot: labeling a speech corpus of dialectal ArabicSamantha Wray, Ahmed Ali. 2824-2828 [doi]

Using keyword spotting to help humans correct captioning fasterYashesh Gaur, Florian Metze, Yajie Miao, Jeffrey P. Bigham. 2829-2833 [doi]

Validating and optimizing a crowdsourced method for gradient measures of child speechTara McAllister Byun, Elaine Hitchcock, Daphna Harel. 2834-2838 [doi]

Joint training of speech separation, filterbank and acoustic model for robust automatic speech recognitionZhong-qiu Wang, DeLiang Wang. 2839-2843 [doi]

Joint environment and speaker normalization using factored front-end CMLLRShakti Rath, Sunil Sivadas, Bin Ma. 2844-2848 [doi]

Robust speech recognition using DNN-HMM acoustic model combining noise-aware training with spectral subtractionAkihiro Abe, Kazumasa Yamamoto, Seiichi Nakagawa. 2849-2853 [doi]

Robust i-vector extraction for neural network adaptation in noisy environmentChengzhu Yu, Atsunori Ogawa, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, John H. L. Hansen. 2854-2857 [doi]

Spectrally selective dithering for distorted speech recognitionMichal Borsky, Petr Mizera, Petr Pollák. 2858-2861 [doi]

Feature-space speaker adaptation for probabilistic linear discriminant analysis acoustic modelsLiang Lu, Steve Renals. 2862-2866 [doi]

Speaker adaptation using the i-vector technique for bottleneck featuresPatrick Cardinal, Najim Dehak, Yu Zhang, James R. Glass. 2867-2871 [doi]

I-vector estimation using informative priors for adaptation of deep neural networksPenny Karanasou, Mark J. F. Gales, Philip C. Woodland. 2872-2876 [doi]

Robust i-vector based adaptation of DNN acoustic model for speech recognitionSri Garimella, Arindam Mandal, Nikko Strom, Björn Hoffmeister, Spyros Matsoukas, Sree Hari Krishnan Parthasarathi. 2877-2881 [doi]

GMM-derived features for effective unsupervised adaptation of deep neural network acoustic modelsNatalia A. Tomashenko, Yuri Y. Khokhlov. 2882-2886 [doi]

Unsupervised adaptation for deep neural network using linear least square methodRoger Hsiao, Tim Ng, Stavros Tsakalidis, Long Nguyen, Richard M. Schwartz. 2887-2891 [doi]

Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptationSheng Li, Xugang Lu, Yuya Akita, Tatsuya Kawahara. 2892-2896 [doi]

Data-selective transfer learning for multi-domain speech recognitionMortaza Doulaty, Oscar Saz, Thomas Hain. 2897-2901 [doi]

Automatic detection of equipment alarms in a neonatal intensive care unit environment: a knowledge-based approachGanna Raboshchuk, Peter Jancovic, Climent Nadeu, Alex Peiró Lilja, Münevver Köküer, Blanca Muñoz Mahamud, Ana Riverola de Veciana. 2902-2906 [doi]

"multilingual" deep neural network for music genre classificationJia Dai, Wenju Liu, Chongjia Ni, Like Dong, Hong Yang. 2907-2911 [doi]

Accurate endpointing with expected pause durationBaiyang Liu, Björn Hoffmeister, Ariya Rastrow. 2912-2916 [doi]

Locality constrained transitive distance clustering on speech dataWenbo Liu, Zhiding Yu, Bhiksha Raj, Ming Li. 2917-2921 [doi]

Feature extraction strategies in deep learning based acoustic event detectionMiquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita, Tomohiro Nakatani. 2922-2926 [doi]

An acoustic event detection framework and evaluation metric for surveillance in carsPeter Transfeld, Simon Receveur, Tim Fingscheidt. 2927-2931 [doi]

Diachronic semantic cohesion for topic segmentation of TV broadcast newsAbdessalam Bouchekif, Géraldine Damnati, Yannick Estève, Delphine Charlet, Nathalie Camelin. 2932-2936 [doi]

Comparison of forced-alignment speech recognition and humans for generating reference VADIvan Kraljevski, Zheng-Hua Tan, Maria Paola Bissiri. 2937-2941 [doi]

Improving voice activity detection in moviesBernhard Lehner, Gerhard Widmer, Reinhard Sonnleitner. 2942-2946 [doi]

Language-independent method for analysis of German stuttering recordingsTomas Lustyk, Petr Bergl, Tino Haderlein, Elmar Nöth, Roman Cmejla. 2947-2951 [doi]

An investigation of MDVP parameters for voice pathology detection on three different databasesAhmed Y. Al-nasheri, Zulfiqar Ali, Ghulam Muhammad, Mansour Alsulaiman. 2952-2956 [doi]

Energy distribution analysis and nonlinear dynamical analysis of adductor spasmodic dysphoniaJiantao Wu, Ping Yu, Nan Yan, Lan Wang, Xiaohui Yang, Manwa L. Ng. 2957-2961 [doi]

Auditory-visual tone perception in hearing impaired Thai listenersBenjawan Kasisopa, Nittayapa Klangpornkun, Denis Burnham. 2962-2966 [doi]

Speech intelligibility decline in individuals with fast and slow rates of ALS progressionPanying Rong, Yana Yunusova, Jordan R. Green. 2967-2971 [doi]

Latency analysis of speech shadowing reveals processing differences in Japanese adults who do and do not stutterRong Na A, Koichi Mori, Naomi Sakai. 2972-2976 [doi]

A syllable-based analysis of speech temporal organization: a comparison between speaking styles in dysarthric and healthy populationsBrigitte Bigi, Katarzyna Klessa, Laurianne Georgeton, Christine Meunier. 2977-2981 [doi]

Autonomous measurement of speech intelligibility utilizing automatic speech recognitionBernd T. Meyer, Birger Kollmeier, Jasper Ooster. 2982-2986 [doi]

Can you hear me? acoustic modifications in speech directed to foreigners and hearing-impaired peopleMonja Angelika Knoll, Melissa Johnstone, Charlene Blakely. 2987-2990 [doi]

Improving automatic forced alignment for dysarthric speech transcriptionYu Ting Yeung, Ka-Ho Wong, Helen M. Meng. 2991-2995 [doi]

The RedDots data collection for speaker recognitionKong-Aik Lee, Anthony Larcher, Guangsen Wang, Patrick Kenny, Niko Brümmer, David A. van Leeuwen, Hagai Aronowitz, Marcel Kockmann, Carlos Vaquero, Bin Ma, Haizhou Li, Themos Stafylakis, Md. Jahangir Alam, Albert Swart, Javier Perez. 2996-3000 [doi]

Noise-robust speaker recognition based on morphological component analysisYongjun He, Chen Chen, Jiqing Han. 3001-3005 [doi]

Analysis of mutual duration and noise effects in speaker recognition: benefits of condition-matched cohort selection in score normalizationAndreas Nautsch, Rahim Saeidi, Christian Rathgeb, Christoph Busch. 3006-3010 [doi]

Robustness to additive noise of locally-normalized cepstral coefficients in speaker verificationJosué Fredes, José Novoa, Víctor Poblete, Simon King, Richard M. Stern, Néstor Becerra Yoma. 3011-3015 [doi]

Probabilistic linear discriminant analysis for robust speaker identification in co-channel speechNavid Shokouhi, John H. L. Hansen. 3016-3020 [doi]

Community detection with manifold learning on speaker i-vector space for ChineseHongcui Wang, Di Jin, Lantian Li, Jianwu Dang. 3021-3025 [doi]

A comparison of neural network feature transforms for speaker diarizationSree Harsha Yella, Andreas Stolcke. 3026-3030 [doi]

Clustering short push-to-talk segmentsIlya Shapiro, Neta Rabin, Irit Opher, Itshak Lapidot. 3031-3035 [doi]

Exploring ANN back-ends for i-vector based speaker age estimationAnna Fedorova, Ondrej Glembek, Tomi Kinnunen, Pavel Matejka. 3036-3040 [doi]

Analysis of the second phase of the 2013-2014 i-vector machine learning challengeDésiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Craig S. Greenberg, Jaime Hernandez-Cordero, John M. Howard, Alvin F. Martin, Lisa P. Mason, Alan McCree, Douglas A. Reynolds. 3041-3045 [doi]

NIST language recognition evaluation - plans for 2015Alvin F. Martin, Craig S. Greenberg, John M. Howard, Désiré Bansé, George R. Doddington, Jaime Hernandez-Cordero, Lisa P. Mason. 3046-3050 [doi]

Communicative needs and respiratory constraintsMarcin Wlodarczak, Mattias Heldner, Jens Edlund. 3051-3055 [doi]

Analysis and classification of cooperative and competitive dialogsUwe D. Reichel, Nina Pörner, Dianne Nowack, Jennifer Cole. 3056-3060 [doi]

Towards automatic detection of reported speech in dialogue using prosodic cuesAlessandra Cervone, Catherine Lai, Silvia Pareti, Peter Bell 0001. 3061-3065 [doi]

Modeling phrasing and prominence using deep recurrent learningAndrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran. 3066-3070 [doi]

Pitch declination and reset as a function of utterance duration in conversational speech dataCéline De Looze, Irena Yanushevskaya, Andy Murphy, Eoghan O'Connor, Christer Gobl. 3071-3075 [doi]

Investigating the role of 'yeah' in stance-dense conversationValerie Freeman, Gina-Anne Levow, Richard A. Wright, Mari Ostendorf. 3076-3080 [doi]

Factor analysis for speaker segmentation and improved speaker diarizationBrecht Desplanques, Kris Demuynck, Jean-Pierre Martens. 3081-3085 [doi]

Enhanced speaker diarization with detection of backchannels using eye-gaze information in poster conversationsKoji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Katsuya Takanashi, Tatsuya Kawahara. 3086-3090 [doi]

Novel clustering selection criterion for fast binary key speaker diarizationHéctor Delgado, Xavier Anguera, Corinne Fredouille, Javier Serrano. 3091-3095 [doi]

Speaker diarization with i-vectors from DNN senone posteriorsGregory Sell, Daniel Garcia-Romero, Alan McCree. 3096-3099 [doi]

Using voice-quality measurements with prosodic and spectral features for speaker diarizationAbraham Woubie, Jordi Luque, Javier Hernando. 3100-3104 [doi]

Integrating online i-vector extractor with information bottleneck based speaker diarization systemSrikanth R. Madikeri, Ivan Himawan, Petr Motlícek, Marc Ferras. 3105-3109 [doi]

Enhanced processing of a lost language: linguistic knowledge or linguistic skill?Jiyoun Choi, Mirjam Broersma, Anne Cutler. 3110-3114 [doi]

Production inconsistencies delay adaptation to foreign accentsAnn-Kathrin Grohe, Gregory J. Poarch, Adriana Hanulíková, Andrea Weber. 3115-3119 [doi]

Acquisition of English speech rhythm by monolingual childrenMikhail Ordin, Leona Polyanskaya. 3120-3124 [doi]

Durational information in word-initial lexical embeddings in spoken DutchOdette Scharenborg. 3125-3129 [doi]

The development of categorical perception of lexical tones in Mandarin-speaking preschoolersFei Chen, Nan Yan, Lan Wang, Tao Yang, Jiantao Wu, Han Zhao, Gang Peng. 3130-3134 [doi]

Perception of Italian liquids by Japanese listeners: comparisons to Spanish liquidsTomohiko Ooigawa. 3135-3139 [doi]

The IBM 2015 English conversational telephone speech recognition systemGeorge Saon, Hong-Kwang Jeff Kuo, Steven J. Rennie, Michael Picheny. 3140-3144 [doi]

The Cambridge University 2014 BOLT conversational telephone Mandarin Chinese LVCSR system for speech translationXunying Liu, Federico Flego, Linlin Wang, C. Zhang, Mark J. F. Gales, Philip C. Woodland. 3145-3149 [doi]

The IBM BOLT speech transcription systemSamuel Thomas, George Saon, Hong-Kwang Jeff Kuo, Lidia Mangu. 3150-3153 [doi]

Improvements in RWTH LVCSR evaluation systems for Polish, Portuguese, English, Urdu, and ArabicM. Ali Basha Shaik, Zoltán Tüske, Muhammad Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney. 3154-3158 [doi]

Active learning based data selection for limited resource STT and KWSThiago Fraga-Silva, Jean-Luc Gauvain, Lori Lamel, Antoine Laurent, Viet Bac Le, Abdelkhalek Messaoudi. 3159-3163 [doi]

Improved Hindi broadcast ASR by adapting the language model and pronunciation model using a priori syntactic and morphophonemic knowledgePreethi Jyothi, Mark Hasegawa-Johnson. 3164-3168 [doi]

The zero resource speech challenge 2015Maarten Versteegh, Roland Thiollière, Thomas Schatz, Xuan-Nga Cao, Xavier Anguera, Aren Jansen, Emmanuel Dupoux. 3169-3173 [doi]

Discovering discrete subword units with binarized autoencoders and hidden-Markov-model encodersLeonardo Badino, Alessio Mereta, Lorenzo Rosasco. 3174-3178 [doi]

A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modelingRoland Thiollière, Ewan Dunbar, Gabriel Synnaeve, Maarten Versteegh, Emmanuel Dupoux. 3179-3183 [doi]

Automatic segmentation and clustering of speech using sparse coding and metaheuristic searchWiehan Agenbag, Thomas Niesler. 3184-3188 [doi]

Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility studyHongjie Chen, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li. 3189-3193 [doi]

Using articulatory features and inferred phonological segments in zero resource speech processingPallavi Baljekar, Sunayana Sitaram, Prasanna Kumar Muthukumar, Alan W. Black. 3194-3198 [doi]

A comparison of neural network methods for unsupervised representation learning on the zero resource speech challengeDaniel Renshaw, Herman Kamper, Aren Jansen, Sharon Goldwater. 3199-3203 [doi]

Unsupervised word discovery from speech using automatic segmentation into syllable-like unitsOkko Räsänen, Gabriel Doyle, Michael C. Frank. 3204-3208 [doi]

An evaluation of graph clustering methods for unsupervised term discoveryVince Lyzinski, Gregory Sell, Aren Jansen. 3209-3213 [doi]

A time delay neural network architecture for efficient modeling of long temporal contextsVijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur. 3214-3218 [doi]

Long short-term memory based convolutional recurrent neural networks for large vocabulary speech recognitionXiangang Li, Xihong Wu. 3219-3223 [doi]

Parameterised sigmoid and ReLU hidden activation functions for DNN acoustic modellingChao Zhang, Philip C. Woodland. 3224-3228 [doi]

Discriminative template learning in group-convolutional networks for invariant speech representationsChiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso A. Poggio. 3229-3233 [doi]

Investigation of parametric rectified linear units for noise robust speech recognitionSunil Sivadas, Zhenzhou Wu, Ma Bin. 3234-3238 [doi]

Multi-softmax deep neural network for semi-supervised trainingHang Su, Haihua Xu. 3239-3243 [doi]

A multi-region deep neural network model in speech recognitionJia Cui, George Saon, Bhuvana Ramabhadran, Brian Kingsbury. 3244-3248 [doi]

A study of the recurrent neural network encoder-decoder for large vocabulary speech recognitionLiang Lu, Xingxing Zhang, KyungHyun Cho, Steve Renals. 3249-3253 [doi]

Gaussian free cluster tree construction using deep neural networkLinchen Zhu, Kevin Kilgour, Sebastian Stüker, Alex Waibel. 3254-3258 [doi]

Very deep convolutional neural networks for LVCSRMengxiao Bi, Yanmin Qian, Kai Yu. 3259-3263 [doi]

Transferring knowledge from a RNN to a DNNWilliam Chan, Nan Rosemary Ke, Ian Lane. 3264-3268 [doi]

SVD-based universal DNN modeling for multiple scenariosChangliang Liu, Jinyu Li, Yifan Gong. 3269-3273 [doi]

Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networksZhuo Chen, Shinji Watanabe, Hakan Erdogan, John R. Hershey. 3274-3278 [doi]

Speaker-dependent multipitch tracking using deep neural networksYuzhou Liu, DeLiang Wang. 3279-3283 [doi]

An error correction scheme for GCI detection algorithms using pitch smoothness criterionP. Sujith, A. P. Prathosh, A. G. Ramakrishnan, Prasanta Kumar Ghosh. 3284-3288 [doi]

Robust pitch estimation in noisy speech using ZTW and group delay functionRaviShankar Prasad, Bayya Yegnanarayana. 3289-3292 [doi]

Robust localization of single sound source based on phase difference regressionZhaoqiong Huang, Ge Zhan, Dongwen Ying, Yonghong Yan 0002. 3293-3297 [doi]

Frequency map selection using a RBFN-based classifier in the MVDR beamformer for speaker localization in reverberant roomsDaniele Salvati, Carlo Drioli, Gian Luca Foresti. 3298-3301 [doi]

Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditionsNing Ma, Guy J. Brown, Tobias May. 3302-3306 [doi]

Joint optimization of recurrent networks exploiting source auto-regression for source separationShuai Nie, Wei Xue, Shan Liang, Xueliang Zhang, Wenju Liu, Liwei Qiao, Jianping Li. 3307-3311 [doi]

Real-time audio-to-score alignment of singing voice based on melody and lyric informationRong Gong, Philippe Cuvillier, Nicolas Obin, Arshia Cont. 3312-3316 [doi]

Vocal separation from monaural music using adaptive auditory filtering based on kernel back-fittingJun-Yong Lee, Hye-Seung Cho, Hyoung-Gook Kim. 3317-3320 [doi]

A two-stage singing voice separation algorithm using spectro-temporal modulation featuresFrederick Z. Yen, Mao-Chang Huang, Tai-Shih Chi. 3321-3324 [doi]

Robust sound event classification using LBP-HOG based bag-of-audio-words feature representationHyungjun Lim, Myung Jong Kim, Hoirin Kim. 3325-3329 [doi]

Sequence-to-sequence neural net models for grapheme-to-phoneme conversionKaisheng Yao, Geoffrey Zweig. 3330-3334 [doi]

Knowledge versus data in TTS: evaluation of a continuum of synthesis systemsRosie Kay, Oliver Watts, Roberto Barra-Chicote, Cassie Mayo. 3335-3339 [doi]

Improving G2P from Wiktionary and other (web) resourcesSteffen Eger. 3340-3344 [doi]

BLSTM neural networks for speech driven head motion synthesisChuang Ding, Pengcheng Zhu, Lei Xie. 3345-3349 [doi]

Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differentialPatrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura. 3350-3354 [doi]

Reconstructing intelligible audio speech from visual speech featuresThomas Le Cornu, Ben Milner. 3355-3359 [doi]

Universal grapheme-based speech synthesisSunayana Sitaram, Alok Parlikar, Gopala Krishna Anumanchipalli, Alan W. Black. 3360-3364 [doi]

Artificial personality and disfluencyMirjam Wester, Matthew P. Aylett, Marcus Tomalin, Rasmus Dall. 3365-3369 [doi]

Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesisMarc Evrard, Samuel Delalez, Christophe d'Alessandro, Albert Rilliard. 3370-3374 [doi]

A multi-layer F0 model for singing voice synthesis using a b-spline representation with intuitive controlsLuc Ardaillon, Gilles Degottex, Axel Roebel. 3375-3379 [doi]

Creating expressive synthetic voices by unsupervised clustering of audiobooksIgor Jauk, Antonio Bonafonte, Paula Lopez-Otero, Laura Docío Fernández. 3380-3384 [doi]

Articulatory-based conversion of foreign accents with deep neural networksSandesh Aryal, Ricardo Gutierrez-Osuna. 3385-3389 [doi]

Action planning and congruency effect between articulation and graspingMikko Tiainen, Lari Vainio, Kaisa Tiippana, Naeem Komeilipoor, Martti Vainio. 3390-3393 [doi]

Cognitive workload and vocabulary sparseness: theory and practiceRon M. Hecht, Aharon Bar-Hillel, Stas Tiomkin, Hadar Levi, Omer Tsimhoni, Naftali Tishby. 3394-3398 [doi]

Counting competing speakers in a timeframe - human versus computerValentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu. 3399-3403 [doi]

Segmental contribution to the intelligibility of ideal binary-masked sentencesFei Chen, Alexander Siu Tai Kwok. 3404-3407 [doi]

Perception of an existing and non-existing L2 English phoneme behind noise by Japanese native speakersMako Ishida, Takayuki Arai. 3408-3411 [doi]

Viseme comparison based on phonetic cues for varying speech accentsChitralekha Bhat, Sunil Kumar Kopparapu. 3412-3416 [doi]

Quantifying difference in vocalizations of bird populationsColm O'Reilly, Nicola M. Marples, David J. Kelly, Naomi Harte. 3417-3421 [doi]

Reverberation-robust acoustic indoor localizationJae Choi, Jeunghun Kim, Shin Jae Kang, Nam Soo Kim. 3422-3425 [doi]

An alternating optimization approach for phase retrievalHuaiping Ming, Dong-Yan Huang, Lei Xie, Haizhou Li, Minghui Dong. 3426-3430 [doi]

Learning to estimate reverberation time in noisy and reverberant roomsXiong Xiao, Shengkui Zhao, Xionghu Zhong, Douglas L. Jones, Engsiong Chng, Haizhou Li. 3431-3435 [doi]

Direction of arrival estimation based on reverberation weighting and noise error estimatorCheng Pang, Jie Zhang, Hong Liu. 3436-3440 [doi]

Representing nonspeech audio signals through speech classification modelsHuy Phan, Lars Hertel, Marco Maaß, Radoslaw Mazur, Alfred Mertins. 3441-3445 [doi]

Mitigating the effects of non-stationary unseen noises on language recognition performanceLuciana Ferrer, Mitchell McLaren, Aaron Lawson, Martin Graciarena. 3446-3450 [doi]

An information theory based data-homogeneity measure for voice comparisonMoez Ajili, Jean-François Bonastre, Solange Rossato, Juliette Kahn, Itshak Lapidot. 3451-3455 [doi]

The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognitionDavid Dean, Ahilan Kanagasundaram, Houman Ghaemmaghami, Md. Hafizur Rahman, Sridha Sridharan. 3456-3460 [doi]

Score stabilization for speaker recognition trained on a small development setHagai Aronowitz. 3461-3465 [doi]

Anti-spoofing system: an investigation of measures to detect synthetic and human speechAbhinav Misra, Shivesh Ranjan, Chunlei Zhang, John H. L. Hansen. 3466-3470 [doi]

A likelihood ratio-based forensic voice comparison in microphone vs. mobile mismatched conditions using Japanese /ai/Michael J. Carne. 3471-3475 [doi]

Are we using enough listeners? no! - an empirically-supported critique of INTERSPEECH 2014 TTS evaluationsMirjam Wester, Cassia Valentini-Botinhao, Gustav Eje Henter. 3476-3480 [doi]

How to compare TTS systems: a new subjective evaluation methodology focused on differencesJonathan Chevelu, Damien Lolive, Sébastien Le Maguer, David Guennec. 3481-3485 [doi]

Double-ended prediction of the naturalness ratings of the Blizzard Challenge 2008-2013Lukas Latacz, Werner Verhelst. 3486-3490 [doi]

Entropy-based sentence selection for speech synthesis using phonetic and prosodic contextsTakashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito. 3491-3495 [doi]

A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training dataTomoki Koriyama, Takao Kobayashi. 3496-3500 [doi]

Objective intelligibility assessment of text-to-speech systems through utterance verificationRaphael Ullmann, Ramya Rasipuram, Mathew Magimai-Doss, Hervé Bourlard. 3501-3505 [doi]

Continuous word representation using neural networks for proper name retrieval from diachronic documentsDominique Fohr, Irina Illina. 3506-3510 [doi]

Recurrent neural network language model adaptation for multi-genre broadcast speech recognitionX. Chen, T. Tan, Xunying Liu, Pierre Lanchantin, M. Wan, Mark J. F. Gales, Philip C. Woodland. 3511-3515 [doi]

Paragraph vector based topic model for language model adaptationWengong Jin, Tianxing He, Yanmin Qian, Kai Yu. 3516-3520 [doi]

Personalized speech recognizer with keyword-based personalized lexicon and language model using word vector representationsChing-feng Yeh, Yuan-ming Liou, Hung-yi Lee, Lin-Shan Lee. 3521-3525 [doi]

Discriminative data selection for lightly supervised training of acoustic model using closed caption textsSheng Li, Yuya Akita, Tatsuya Kawahara. 3526-3530 [doi]

Cross-lingual transfer learning during supervised training in low resource scenariosAmit Das, Mark Hasegawa-Johnson. 3531-3535 [doi]

Uncertainty propagation for noise robust speaker recognition: the case of NIST-SREDayana Ribas González, Emmanuel Vincent, José Ramón Calvo de Lara. 3536-3540 [doi]

Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced featuresYuuki Tachioka, Shinji Watanabe. 3541-3545 [doi]

Accounting for uncertainty of i-vectors in speaker recognition using uncertainty propagation and modified imputationRahim Saeidi, Paavo Alku. 3546-3550 [doi]

Autoencoder based multi-stream combination for noise robust speech recognitionSri Harish Reddy Mallidi, Tetsuji Ogawa, Karel Veselý, Phani S. Nidadavolu, Hynek Hermansky. 3551-3555 [doi]

Uncertainty decoding for DNN-HMM hybrid systems based on numerical samplingChristian Huemmer, Roland Maas, Andreas Schwarz, Ramón Fernandez Astudillo, Walter Kellermann. 3556-3560 [doi]

Uncertainty propagation through deep neural networksAhmed Hussen Abdelaziz, Shinji Watanabe, John R. Hershey, Emmanuel Vincent, Dorothea Kolossa. 3561-3565 [doi]

Handling derivative filterbank features in bounded-marginalization-based missing data automatic speech recognitionMarco Kühne. 3566-3570 [doi]

Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASRArun Narayanan, Ananya Misra, Kean K. Chin. 3571-3575 [doi]

Integration of DNN based speech enhancement and ASRRamón Fernandez Astudillo, Maria Joana Correia, Isabel Trancoso. 3576-3580 [doi]

A general artificial neural network extension for HTKChao Zhang, Philip C. Woodland. 3581-3585 [doi]

Audio augmentation for speech recognitionTom Ko, Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur. 3586-3589 [doi]

A diversity-penalizing ensemble training method for deep learningXiaohui Zhang, Daniel Povey, Sanjeev Khudanpur. 3590-3594 [doi]

Deep neural network training emphasizing central framesGakuto Kurata, Daniel Willett. 3595-3599 [doi]

Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approachKai Chen, Zhi-Jie Yan, Qiang Huo. 3600-3604 [doi]

Structured output layer with auxiliary targets for context-dependent acoustic modellingPawel Swietojanski, Peter Bell 0001, Steve Renals. 3605-3609 [doi]

Complementary tasks for context-dependent deep neural network acoustic modelsPeter Bell 0001, Steve Renals. 3610-3614 [doi]

Towards end-to-end speech recognition for Chinese Mandarin using long short-term memory recurrent neural networksJie Li, Heng Zhang, Xinyuan Cai, Bo Xu. 3615-3619 [doi]

Improving deep neural networks based multi-accent Mandarin speech recognition using i-vectors and accent-specific top layerMingming Chen, Zhanlei Yang, Jizhong Liang, Yanpeng Li, Wenju Liu. 3620-3624 [doi]

Rapid adaptation for deep neural networks through multi-task learningZhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I-Fan Chen, Ji Wu, Chin-Hui Lee. 3625-3629 [doi]

fMLLR based feature-space speaker adaptation of DNN acoustic modelsSree Hari Krishnan Parthasarathi, Björn Hoffmeister, Spyros Matsoukas, Arindam Mandal, Nikko Strom, Sri Garimella. 3630-3634 [doi]

I-vector dependent feature space transformations for adaptive speech recognitionXiangang Li, Xihong Wu. 3635-3639 [doi]

Unsupervised domain discovery using latent Dirichlet allocation for acoustic modelling in speech recognitionMortaza Doulaty, Oscar Saz, Thomas Hain. 3640-3644 [doi]

Training data selection for acoustic modeling via submodular optimization of joint Kullback-Leibler divergenceTaichi Asami, Ryo Masumura, Hirokazu Masataki, Manabu Okamoto, Sumitaka Sakauchi. 3645-3649 [doi]

Combination of NN and CRF models for joint detection of punctuation and disfluenciesEunah Cho, Kevin Kilgour, Jan Niehues, Alex Waibel. 3650-3654 [doi]

Tunable keyword-aware language modeling and context dependent fillers for LVCSR-based spoken keyword searchTze Siong Lau, I-Fan Chen, Chin-Hui Lee. 3655-3659 [doi]

Joint decoding of tandem and hybrid systems for improved keyword spotting on low resource languagesHaipeng Wang, Anton Ragni, Mark J. F. Gales, Kate M. Knill, Philip C. Woodland, Chao Zhang. 3660-3664 [doi]

Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMsQuoc Truong Do, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura. 3665-3669 [doi]

Phonology-augmented statistical transliteration for low-resource languagesHoang Gia Ngo, Nancy F. Chen, Binh Minh Nguyen, Bin Ma, Haizhou Li. 3670-3674 [doi]

Evaluation of re-ranking by prioritizing highly ranked documents in spoken term detectionKazuki Oouchi, Ryota Kon'no, Takahiro Akyu, Kazuma Konno, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh 0001. 3675-3679 [doi]

Distinctive feature based representation of speech for query-by-example spoken term detectionAbhijeet Saxena, B. Yegnanarayana. 3680-3684 [doi]

Combination of diverse subword units in spoken term detectionShi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh 0001. 3685-3689 [doi]

Sparse modeling of posterior exemplars for keyword detectionDhananjay Ram, Afsaneh Asaei, Pranay Dighe, Hervé Bourlard. 3690-3694 [doi]

Stress level detection using double-layer subband filterTin Lay Nwe, Qianli Xu, Cuntai Guan, Bin Ma. 3695-3699 [doi]

Prosodic characteristics of read speech before and after treadmill runningJürgen Trouvain, Khiet P. Truong. 3700-3704 [doi]

A database for analysis of speech under physical stress: detection of exercise intensity while running and talkingKhiet P. Truong, Arne Nieuwenhuys, Peter Beek, Vanessa Evers. 3705-3709 [doi]

Stressed out: what speech tells us about stressWill Paul, Cecilia Ovesdotter Alm, Reynold J. Bailey, Joe Geigel, Linwei Wang. 3710-3714 [doi]

Prediction of heart rate changes from speech features during interaction with a misbehaving dialog systemAndreas Tsiartas, Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti, Adrian Willoughby. 3715-3719 [doi]

Acoustic correlates for perceived effort levels in expressive speechMary Pietrowicz, Mark Hasegawa-Johnson, Karrie Karahalios. 3720-3724 [doi]

Pitch-based speech perturbation measures using a novel GCI detection algorithm: application to pathological voice classificationKhalid Daoudi, Ashwini Jaya Kumar. 3725-3728 [doi]

Speech-based assessment of PTSD in a military population using diverse feature classesDimitra Vergyri, Bruce Knoth, Elizabeth Shriberg, Vikramjit Mitra, Mitchell McLaren, Luciana Ferrer, Pablo Garcia, Charles Marmar. 3729-3733 [doi]

Cognitive impairment prediction in the elderly based on vocal biomarkersBea Yu, Thomas F. Quatieri, James R. Williamson, James C. Mundt. 3734-3738 [doi]

Automatic age detection in normal and pathological voiceJorge Andrés Gómez García, L. Moro-Velázquez, Juan Ignacio Godino-Llorente, Germán Castellanos-Domínguez. 3739-3743 [doi]

Wrapping up: the story of the ComParE challenges, what we learned and where to goAnton Batliner. 4105 [doi]
