Journal: IEEE Transactions on Audio, Speech & Language Processing

Volume 25, Issue 9

1741 -- 1750Jakob Abeßer, Gerald Schuller. Instrument-Centered Music Transcription of Solo Bass Guitar Recordings
1751 -- 1761Thomas Le Cornu, Ben Milner. Generating Intelligible Audio Speech From Visual Speech
1762 -- 1772Lemao Liu, Atsushi Fujita, Masao Utiyama, Andrew M. Finch, Eiichiro Sumita. Translation Quality Estimation Using Only Bilingual Corpora
1773 -- 1783Emad M. Grais, Gerard Roma, Andrew J. R. Simpson, Mark D. Plumbley. Two-Stage Single-Channel Audio Source Separation Using Deep Neural Networks
1784 -- 1798Giuliano Bernardi, Toon van Waterschoot, Jan Wouters, Marc Moonen. Adaptive Feedback Cancellation Using a Partitioned-Block Frequency-Domain Kalman Filter Approach With PEM-Based Signal Prewhitening
1799 -- 1808Vinal Patel, Jordan Cheer, Nithin V. George. Modified Phase-Scheduled-Command FxLMS Algorithm for Active Sound Profiling
1809 -- 1820Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato de Mori. Denoised Bottleneck Features From Deep Autoencoders for Telephone Conversation Analysis
1821 -- 1835Nikolaos Stefanakis, Despoina Pavlidi, Athanasios Mouchtaris. Perpendicular Cross-Spectra Fusion for Sound Source Localization With a Planar Microphone Array
1836 -- 1845Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda. Simultaneous Optimization of Multiple Tree-Based Factor Analyzed HMM for Speech Synthesis
1846 -- 1858Eita Nakamura, Kazuyoshi Yoshii, Simon Dixon. Note Value Recognition for Piano Transcription Using Markov Random Fields

Volume 25, Issue 8

1566 -- 1578Francis Stevens, Damian T. Murphy, Lauri Savioja, Vesa Välimäki. Modeling Sparsely Reflecting Outdoor Acoustic Scenes Using the Waveguide Web
1579 -- 1591Ferdinando Olivieri, Filippo Maria Fazi, Simone Fontana, Dylan Menzies, Philip Arthur Nelson. Generation of Private Sound With a Circular Loudspeaker Array and the Weighted Pressure Matching Method
1592 -- 1605Samy Elshamy, Nilesh Madhu, Wouter Tirry, Tim Fingscheidt. Instantaneous A Priori SNR Estimation by Cepstral Excitation Manipulation
1606 -- 1617Paavo Alku, Rahim Saeidi. The Linear Predictive Modeling of Speech From Higher-Lag Autocorrelation Coefficients Applied to Noise-Robust Speaker Recognition
1618 -- 1632Cheng Pang, Hong Liu, Jie Zhang, Xiaofei Li. Binaural Sound Localization Based on Reverberation Weighting and Generalized Parametric Mapping
1633 -- 1643Somanath Pradhan, Vinal Patel, Dipen Somani, Nithin V. George. An Improved Proportionate Delayless Multiband-Structured Subband Adaptive Feedback Canceller for Digital Hearing Aids
1644 -- 1656Szymon Drgas, Tuomas Virtanen, Jörg Lücke, Antti Hurmalainen. Binary Non-Negative Matrix Deconvolution for Audio Dictionary Learning
1657 -- 1667Fatemeh Saki, Nasser Kehtarnavaz. Real-Time Unsupervised Classification of Environmental Noise Signals
1668 -- 1679Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen. Automatic Sentiment Detection in Naturalistic Audio
1680 -- 1693Ofer Schwartz, Sharon Gannot, Emanuel A. P. Habets. Cramér-Rao Bound Analysis of Reverberation Level Estimators for Dereverberation and Noise Reduction
1694 -- 1708Seyran Khademi, Richard C. Hendriks, W. Bastiaan Kleijn. Intelligibility Enhancement Based on Mutual Information
1709 -- 1717Yuta Hatano, Chuang Shi, Yoshinobu Kajikawa. Compensation for Nonlinear Distortion of the Frequency Modulation-Based Parametric Array Loudspeaker
1718 -- 1730Yu-Ren Chien, Daryush D. Mehta, Jón Guðnason, Matías Zanartu, Thomas F. Quatieri. Evaluation of Glottal Inverse Filtering Algorithms Using a Physiologically Based Articulatory Speech Synthesizer

Volume 25, Issue 7

1409 -- 1420Yu-An Chen, Ju-Chiang Wang, Yi-Hsuan Yang, Homer H. Chen. Component Tying for Mixture Model Adaptation in Personalization of Music Emotion Recognition
1421 -- 1435Hossein Zeinali, Hossein Sameti, Lukás Burget. HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification
1436 -- 1449Xinzhou Xu, Jun Deng, Nicholas Cummins, Zixing Zhang, Chen Wu, Li Zhao, Björn W. Schuller. A Two-Dimensional Framework of Multiple Kernel Subspace Learning for Recognizing Emotion in Speech
1450 -- 1461Mandy Korpusik, James R. Glass. Spoken Language Understanding for a Nutrition Dialogue System
1462 -- 1476Mahmoud Fakhry, Piergiorgio Svaizer, Maurizio Omologo. Audio Source Separation in Reverberant Environments Using β-Divergence-Based Nonnegative Factorization
1477 -- 1491Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot. Semi-Supervised Source Localization on Multiple Manifolds With Distributed Microphones
1492 -- 1501Donald S. Williamson, DeLiang Wang. Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising
1502 -- 1511Liang Lu, Steve Renals. Small-Footprint Highway Deep Neural Networks for Speech Recognition
1512 -- 1525Ina Kodrasi, Simon Doclo. Signal-Dependent Penalty Functions for Robust Acoustic Multi-Channel Equalization
1526 -- 1534Jung Hee Kim, Jin Kim, Jae Hyeon Jeon, Sang-Won Nam. Delayless Individual-Weighting-Factors Sign Subband Adaptive Filter With Band-Dependent Variable Step-Sizes
1535 -- 1546Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee. A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks
1547 -- 1561Giacomo Vairetti, Enzo De Sena, Michael Catrysse, Søren Holdt Jensen, Marc Moonen, Toon van Waterschoot. A Scalable Algorithm for Physically Motivated and Sparse Approximation of Room Impulse Responses With Orthonormal Basis Functions

Volume 25, Issue 6

1169 -- 1171Gaël Richard, Tuomas Virtanen, Juan Pablo Bello, Nobutaka Ono, Hervé Glotin. Introduction to the Special Section on Sound Scene and Event Analysis
1172 -- 1182Hector A. Sanchez-Hevia, David Ayllón, Roberto Gil-Pita, Manuel Rosa-Zurera. Maximum Likelihood Decision Fusion for Weapon Classification in Wireless Acoustic Sensor Networks
1183 -- 1192Nithin Rao Koluguri, G. Nisha Meenakshi, Prasanta Kumar Ghosh. Spectrogram Enhancement Using Multiple Window Savitzky-Golay (MWSG) Filter for Robust Bird Sound Detection
1193 -- 1206Dan Stowell, Emmanouil Benetos, Lisa F. Gill. On-Bird Sound Recordings: Automatic Acoustic Recognition of Activities and Contexts
1207 -- 1215Brandon T. Carroll, Bradley M. Whitaker, Wayne Daley, David V. Anderson. Outlier Learning via Augmented Frozen Dictionaries
1216 -- 1229Victor Bisot, Romain Serizel, Slim Essid, Gaël Richard. Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification
1230 -- 1241Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley. Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
1242 -- 1252Rene Grzeszick, Axel Plinge, Gernot A. Fink. Bag-of-Features Methods for Acoustic Event Detection and Classification
1253 -- 1265Alain Rakotomamonjy. Supervised Representation Learning for Audio Scene Classification
1266 -- 1277Emmanouil Benetos, Grégoire Lafay, Mathieu Lagrange, Mark D. Plumbley. Polyphonic Sound Event Tracking Using Linear Dynamical Systems
1278 -- 1290Huy Phan, Lars Hertel, Marco Maaß, Philipp Koch, Radoslaw Mazur, Alfred Mertins. Improved Audio Scene Classification Based on Label-Tree Embeddings and Convolutional Neural Networks
1291 -- 1303Emre Çakir, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
1304 -- 1314Jens Schröder, Niko Moritz, Jörn Anemüller, Stefan Goetze, Birger Kollmeier. Classifier Architectures for Acoustic Scenes and Events: Implications for DNNs, TDNNs, and Perceptual Features from DCASE 2016
1315 -- 1321Wenjun Yang, Sridhar Krishnan. Combining Temporal Features by Local Binary Pattern for Acoustic Scene Classification
1322 -- 1334David Dov, Ronen Talmon, Israel Cohen. Multimodal Kernel Method for Activity Detection of Sound Sources
1335 -- 1343Keisuke Imoto, Nobutaka Ono. Spatial Cepstrum as a Spatial Feature Using a Distributed Microphone Array for Acoustic Scene Analysis
1344 -- 1356Ivo Trowitzsch, Johannes Mohr, Youssef Kashef, Klaus Obermayer. Robust Detection of Environmental Sounds in Binaural Auditory Scenes
1357 -- 1370Abu Shafin Mohammad Mahdee Jameel, Shaikh Anowarul Fattah, Rajib Goswami, Wei-Ping Zhu, M. Omair Ahmad. Noise Robust Formant Frequency Estimation Method Based on Spectral Model of Repeated Autocorrelation of Speech
1371 -- 1383Na Li, Man-Wai Mak, Jen-Tzung Chien. DNN-Driven Mixture of PLDA for Robust Speaker Verification
1384 -- 1397Kai Wu, Vaninirappuputhenpurayil Gopalan Reju, Andy W. H. Khong, Shu Ting Goh. Swarm Intelligence Based Particle Filter for Alternating Talker Localization and Tracking Using Microphone Arrays

Volume 25, Issue 5

929 -- 939Manu Airaksinen, Tomas Bäckström, Paavo Alku. Quadratic Programming Approach to Glottal Inverse Filtering by Joint Norm-1 and Norm-2 Optimization
940 -- 951Ofer Schwartz, Sharon Gannot, Emanuel A. P. Habets. Multispeaker LCMV Beamformer and Postfilter for Source Separation and Noise Reduction
952 -- 964Dongmei Wang, Chengzhu Yu, John H. L. Hansen. Robust Harmonic Features for Classification-Based Pitch Estimation
965 -- 979Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Bo Li, Arun Narayanan, Ehsan Variani, Michiel Bacchiani, Izhak Shafran, Andrew W. Senior, Kean K. Chin, Ananya Misra, Chanwoo Kim. Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition
980 -- 995Hanieh Khalilian, Ivan V. Bajic, Rodney G. Vaughan. A Simulation Study of a Three-Dimensional Sound Field Reproduction System for Immersive Communication
996 -- 1010Andreas Franck, Wenwu Wang, Filippo Maria Fazi. ℓ1-Optimal Multiloudspeaker Panning and Its Relation to Vector Base Amplitude Panning
1011 -- 1022Songbin Li, Yizhen Jia, C. C. Jay Kuo. Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals
1023 -- 1034Naoyuki Kanda, Xugang Lu, Hisashi Kawai. Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models
1035 -- 1047Navid Shokouhi, John H. L. Hansen. Teager-Kaiser Energy Operators for Overlapped Speech Detection
1048 -- 1060Yi-Chin Huang, Chung-Hsien Wu, Yan-You Chen, Ming-Ge Shie, Jhing-Fa Wang. Personalized Spontaneous Speech Synthesis Using a Small-Sized Unsegmented Semispontaneous Speech
1061 -- 1074Jeongsoo Park, Jaeyoung Shin, Kyogu Lee. Exploiting Continuity/Discontinuity of Basis Vectors in Spectrogram Decomposition for Harmonic-Percussive Sound Separation
1075 -- 1084Xueliang Zhang, DeLiang Wang. Deep Learning Based Binaural Speech Separation in Reverberant Environments
1085 -- 1094Masood Delfarah, DeLiang Wang. Features for Masking-Based Monaural Speech Separation in Reverberant Conditions
1095 -- 1106Feiran Yang, Gerald Enzner, Jun Yang. Statistical Convergence Analysis for Optimal Control of DFT-Domain Adaptive Echo Canceler
1107 -- 1116Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga. Sentence Selection Based on Extended Entropy Using Phonetic and Prosodic Contexts for Statistical Parametric Speech Synthesis
1117 -- 1127Gergely Firtha, Peter Fiala, Frank Schultz, Sascha Spors. Improved Referencing Schemes for 2.5D Wave Field Synthesis Driving Functions
1128 -- 1139Esteban Maestre, Gary P. Scavone, Julius O. Smith. Joint Modeling of Bridge Admittance and Body Radiativity for Efficient Synthesis of String Instrument Sound by Digital Waveguides
1140 -- 1153Gongping Huang, Jacob Benesty, Jingdong Chen. On the Design of Frequency-Invariant Beampatterns With Uniform Circular Microphone Arrays
1154 -- 1164Zdenek Prusa, Péter Balázs, Peter L. Søndergaard. A Noniterative Method for Reconstruction of Phase From STFT Magnitude

Volume 25, Issue 4

692 -- 730Sharon Gannot, Emmanuel Vincent, Shmulik Markovich Golan, Alexey Ozerov. A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation
731 -- 744Dongwen Ying, Ruohua Zhou, Junfeng Li, Yonghong Yan. Window-Dominant Signal Subspace Methods for Multiple Short-Term Speech Source Localization
745 -- 755Sean U. N. Wood, Jean Rouat, Stéphane Dupont, Gueorgui Pironkov. Blind Speech Separation and Enhancement With GCC-NMF
756 -- 767Constantin Spille, Birger Kollmeier, Bernd T. Meyer. Combining Binaural and Cortical Features for Robust Speech Recognition
768 -- 779Yuma Koizumi, Kenta Niwa, Yusuke Hioka, Kazunori Kobayashi, Hitoshi Ohmuro. Informative Acoustic Feature Selection to Maximize Mutual Information for Collecting Target Sources
780 -- 793Takuya Higuchi, Nobutaka Ito, Shoko Araki, Takuya Yoshioka, Marc Delcroix, Tomohiro Nakatani. Online MVDR Beamformer Based on Complex Gaussian Mixture Model With Spatial Prior for Noise Robust ASR
794 -- 806Eita Nakamura, Kazuyoshi Yoshii, Shigeki Sagayama. Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices
807 -- 817Omid Ghahabi, Javier Hernando. Deep Learning Backend for Single and Multisession i-Vector Speaker Recognition
818 -- 828Penny Karanasou, Chunyang Wu, Mark J. F. Gales, Philip C. Woodland. I-Vectors and Structured Neural Networks for Rapid Adaptation of Acoustic Models
829 -- 838G. Aneeja, B. Yegnanarayana. Extraction of Fundamental Frequency From Degraded Speech Using Temporal Envelopes at High SNR Frequencies
839 -- 851Seyyed Saeed Sarfjoo, Cenk Demiroglu, Simon King. Using Eigenvoices and Nearest-Neighbors in HMM-Based Cross-Lingual Speaker Adaptation With Limited Data
852 -- 862Yung-Yue Chen, Jia-Hao Zhang. Background Noise Reduction Design for Dual Microphone Cellular Phones: Robust Approach
863 -- 870Liner Yang, Xinxiong Chen, Zhiyuan Liu, Maosong Sun. Improving Word Representations with Document Labels
871 -- 884Shiliang Zhang, Cong Liu, Hui Jiang, Si Wei, Li-Rong Dai, Yu Hu. Nonrecurrent Neural Structure for Long-Term Dependence
885 -- 894Xuefeng Yang, Kezhi Mao. Task Independent Fine Tuning for Word Embeddings
895 -- 907Huawei Chen. Design of Robust Broadband Beamformers Using Worst-Case Performance Optimization: A Semidefinite Programming Approach
908 -- 919Sandro Cumani, Pietro Laface. Nonlinear I-Vector Transformations for PLDA-Based Speaker Recognition

Volume 25, Issue 3

457 -- 468Qi He, Feng Bao, Changchun Bao. Multiplicative Update of Auto-Regressive Gains for Codebook-Based Speech Enhancement
469 -- 480Zhongqing Wang, Sophia Yat Mei Lee, Shoushan Li, Guodong Zhou. Emotion Analysis in Code-Switching Text With Joint Factor Graph Model
481 -- 492Ashwin Bellur, Mounya Elhilali. Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection
493 -- 504Zhiyuan Tang, Lantian Li, Dong Wang, Ravichander Vipperla. Collaborative Joint Training With Multitask Recurrent Model for Speech and Speaker Recognition
505 -- 518Bidisha Sharma, S. R. Mahadeva Prasanna. Sonority Measurement Using System, Source, and Suprasegmental Information
519 -- 530Hung-yi Lee, Bo-Hsiang Tseng, Tsung-Hsien Wen, Yu Tsao. Personalizing Recurrent-Neural-Network-Based Language Model by Social Network
531 -- 543Ji Ming, Danny Crookes. Speech Enhancement Based on Full-Sentence Correlation and Clean Speech Recognition
544 -- 556Quoc Truong Do, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura. Preserving Word-Level Emphasis in Speech-to-Speech Translation
557 -- 571Zhenghua Li, Jiayuan Chao, Min Zhang, Wenliang Chen, Meishan Zhang, Guohong Fu. Coupled POS Tagging on Heterogeneous Annotations
572 -- 587Clement S. J. Doire, Mike Brookes, Patrick A. Naylor, Christopher M. Hicks, Dave Betts, Mohammad A. Dmour, Søren Holdt Jensen. Single-Channel Online Enhancement of Speech Corrupted by Reverberation and Noise
588 -- 597Aleksandr Sizov, Kong-Aik Lee, Tomi Kinnunen. Direct Optimization of the Detection Cost for I-Vector-Based Spoken Language Recognition
598 -- 610Imran A. Sheikh, Dominique Fohr, Irina Illina, Georges Linarès. Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition
611 -- 623Mojtaba Farmani, Michael Syskind Pedersen, Zheng-Hua Tan, Jesper Jensen. Informed Sound Source Localization Using Relative Transfer Functions for Hearing Aid Applications
624 -- 636Vikram C. M., S. R. Mahadeva Prasanna. Epoch Extraction From Telephone Quality Speech Using Single Pole Filter
637 -- 650Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi. Associative Memory Model-Based Linear Filtering and Its Application to Tandem Connectionist Blind Source Separation
651 -- 661Dani Cherkassky, Sharon Gannot. Blind Synchronization in Wireless Acoustic Sensor Networks
662 -- 673Laurent Girin, Thomas Hueber, Xavier Alameda-Pineda. Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping
674 -- 686Mohamad Hasan Bahari, Alexander Bertrand, Marc Moonen. Blind Sampling Rate Offset Estimation for Wireless Acoustic Sensor Networks Through Weighted Least-Squares Coherence Drift Estimation
687 Adam Kuklasinski, Simon Doclo, Søren Holdt Jensen, Jesper Rindom Jensen. Correction to "Maximum Likelihood PSD Estimation for Speech Enhancement in Reverberation and Noise"

Volume 25, Issue 2

226 -- 237Hanchi Chen, Thushara Dheemantha Abhayapala, Prasanga N. Samarasinghe, Wen Zhang. Direct-to-Reverberant Energy Ratio Estimation Using a First-Order Microphone
238 -- 247Peter Bell, Pawel Swietojanski, Steve Renals. Multitask Learning of Context-Dependent Targets in Deep Neural Network Acoustic Models
248 -- 260Rui Zhao, Kezhi Mao. Topic-Aware Deep Compositional Models for Sentence Classification
261 -- 272Dalia El Badawy, Ngoc Q. K. Duong, Alexey Ozerov. On-the-Fly Audio Source Separation - A Novel User-Friendly Framework
273 -- 284Filip Elvander, Johan Sward, Andreas Jakobsson. Online Estimation of Multiple Harmonic Signals
285 -- 295Vincent Renkens, Hugo Van Hamme. Weakly Supervised Learning of Hidden Markov Models for Spoken Language Acquisition
296 -- 309Luca Remaggi, Philip J. B. Jackson, Philip Coleman, Wenwu Wang. Acoustic Reflector Localization: Novel Image Source Reversion and Direct Localization Methods
310 -- 319Prasanga N. Samarasinghe, Thushara D. Abhayapala, Hanchi Chen. Estimating the Direct-to-Reverberant Energy Ratio Using a Spherical Harmonics-Based Spatial Correlation Model
320 -- 332Shmulik Markovich Golan, Sharon Gannot, Walter Kellermann. Combined LCMV-TRINICON Beamforming for Separating Multiple Speech Sources in Noisy and Reverberant Environments
333 -- 343Shakeel Ahmed, Muhammad Tahir Akhtar. Gain Scheduling of Auxiliary Noise and Variable Step-Size for Online Acoustic Feedback Cancellation in Narrow-Band Active Noise Control Systems
344 -- 358Gabriel Sargent, Frédéric Bimbot, Emmanuel Vincent. Estimating the Structural Segmentation of Popular Music Pieces Under Regularity Constraints
359 -- 373Jordan Cheer, Stephen Daley. An Investigation of Delayless Subband Adaptive Filtering for Multi-Input Multi-Output Active Noise Control Applications
374 -- 383Sebastian J. Schlecht, Emanuel A. P. Habets. Feedback Delay Networks: Echo Density and Mixing Time
384 -- 396Johannes Abel, Magdalena Kaniewska, Cyril Guillaume, Wouter Tirry, Tim Fingscheidt. An Instrumental Quality Measure for Artificially Bandwidth-Extended Speech Signals
397 -- 408Robert Rehr, Timo Gerkmann. An Analysis of Adaptive Recursive Smoothing with Applications to Noise PSD Estimation
409 -- 419Emilio Granell, Carlos D. Martínez-Hinarejos. Multimodal Crowdsourcing for Transcribing Handwritten Documents
420 -- 434Yaping Ma, Yegui Xiao. A New Strategy for Online Secondary-Path Modeling of Narrowband Active Noise Control
435 -- 447Jose A. Belloch, Alberto González, Enrique S. Quintana-Ortí, Miguel Ferrer, Vesa Välimäki. GPU-Based Dynamic Wave Field Synthesis Using Fractional Delay Filters and Room Compensation

Volume 25, Issue 12

2254 -- 2256Tanja Schultz, Thomas Hueber, Dean J. Krusienski, J. S. Brumberg. Introduction to the Special Issue on Biosignal-Based Spoken Communication
2257 -- 2271Tanja Schultz, Michael Wand, Thomas Hueber, Dean J. Krusienski, Christian Herff, Jonathan S. Brumberg. Biosignal-Based Spoken Communication: A Survey
2272 -- 2280Christopher Dromey, Katherine M. Black. Effects of Laryngeal Activity on Articulation
2281 -- 2291Michal Borsky, Daryush D. Mehta, Jarrad H. Van Stan, Jón Guðnason. Modal and Nonmodal Voice Quality Classification Using Acoustic and Electroglottographic Features
2292 -- 2300Alborz Rezazadeh Sereshkeh, Robert Trott, Aurélien Bricout, Tom Chau. EEG Classification of Covert Speech Using Regularized Neural Networks
2301 -- 2312Reza Sahraeian, Dirk Van Compernolle. Crosslingual and Multilingual Speech Recognition Based on the Speech Manifold
2313 -- 2322Dorde T. Grozdic, Slobodan T. Jovicic. Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering
2323 -- 2336Myung Jong Kim, Beiming Cao, Ted Mau, Jun Wang. Speaker-Independent Silent Speech Recognition From Flesh-Point Articulatory Movements Using an LSTM Neural Network
2337 -- 2350Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda. Articulatory Controllable Speech Modification Based on Statistical Inversion and Production Mappings
2351 -- 2361Ingmar Steiner, Sébastien Le Maguer, Alexander Hewer. Synthesis of Tongue Motion and Acoustics From Text Using a Multimodal Articulatory Database
2362 -- 2374José A. González, Lam Aun Cheah, Angel M. Gomez, Phil D. Green, James M. Gilbert, Stephen R. Ell, Roger K. Moore, Ed Holdsworth. Direct Speech Reconstruction From Articulatory Sensor Data by Machine Learning
2375 -- 2385Matthias Janke, Lorenz Diener. EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals
2386 -- 2398Geoffrey S. Meltzner, James T. Heaton, Yunbin Deng, Gianluca De Luca, Serge H. Roy, Joshua C. Kline. Silent Speech Recognition as an Alternative Communication Device for Persons With Laryngectomy
2399 -- 2409Fei Chen, Lan Wang, Hui Chen, Gang Peng. Investigations on Mandarin Aspiratory Animations Using an Airflow Model
2410 -- 2423Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Michael L. Seltzer, Andreas Stolcke, Dong Yu, Geoffrey Zweig. Toward Human Parity in Conversational Speech Recognition
2424 -- 2432Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan. A Context-Aware Recurrent Encoder for Neural Machine Translation
2433 -- 2443Afsaneh Asaei, Milos Cernak, Hervé Bourlard. Perceptual Information Loss due to Impaired Speech Production
2444 -- 2453Ning Ma, Tobias May, Guy J. Brown. Exploiting Deep Neural Networks and Head Movements for Robust Binaural Localization of Multiple Sources in Reverberant Environments

Volume 25, Issue 11

2045 -- 2058Qinghua Huang, Lin Zhang, Yong Fang. Two-Stage Decoupled DOA Estimation Based on Real Spherical Harmonics for Spherical Arrays
2059 -- 2070Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda. Duration-Controlled LSTM for Polyphonic Sound Event Detection
2071 -- 2084Monisankha Pal, Goutam Saha. Spectral Mapping Using Prior Re-Estimation of i-Vectors and System Fusion for Voice Conversion
2085 -- 2097Seppo Enarvi, Peter Smit, Sami Virpioja, Mikko Kurimo. Automatic Speech Recognition With Very Large Conversational Finnish and Estonian Vocabularies
2098 -- 2111Hannah Muckenhirn, Pavel Korshunov, Mathew Magimai-Doss, Sébastien Marcel. Long-Term Spectral Statistics for Voice Presentation Attack Detection
2112 -- 2124Brian Hamilton, Stefan Bilbao. FDTD Methods for 3-D Room Acoustics Simulation With High-Order Accuracy in Space and Time
2125 -- 2137Pejman Mowlaee, Martin Blass, W. Bastiaan Kleijn. New Results in Modulation-Domain Single-Channel Speech Enhancement
2138 -- 2151Dylan Menzies, Filippo Maria Fazi. Decoding and Compression of Channel and Scene Objects for Spatial Audio
2152 -- 2161Eunwoo Song, Frank K. Soong, Hong-Goo Kang. Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems
2162 -- 2175Pulkit Sharma, Vinayak Abrol, Anil Kumar Sao. Deep-Sparse-Representation-Based Features for Speech Recognition
2176 -- 2187Iynkaran Natgunanathan, Yong Xiang, Guang Hua, Gleb Beliakov, John Yearwood. Patchwork-Based Multilayer Audio Watermarking
2188 -- 2198Chengzhu Yu, John H. L. Hansen. Active Learning Based Constrained Clustering For Speaker Diarization
2199 -- 2208Emil Solsbæk Ottosen, Monika Dörfler. A Phase Vocoder Based on Nonstationary Gabor Frames
2209 -- 2222Boaz Schwartz, Sharon Gannot, Emanuel A. P. Habets. Two Model-Based EM Algorithms for Blind Source Separation in Noisy Environments
2223 -- 2236Maja Taseska, Emanuel A. P. Habets. Nonstationary Noise PSD Matrix Estimation for Multichannel Blind Speech Extraction
2237 -- 2250Bruno Di Giorgi, Simon Dixon, Massimiliano Zanoni, Augusto Sarti. A Data-Driven Model of Tonal Chord Sequence Complexity
2251 Nikolaos Stefanakis, Despoina Pavlidi, Athanasios Mouchtaris. Corrections to "Perpendicular Cross-Spectra Fusion for Sound Source Localization With a Planar Microphone Array"

Volume 25, Issue 10

1863 -- 1876Xiaohai Tian, Siu Wa Lee, Zhizheng Wu, Eng Siong Chng, Haizhou Li. An Exemplar-Based Approach to Frequency Warping for Voice Conversion
1877 -- 1889Siying Wang, Sebastian Ewert, Simon Dixon. Identifying Missing and Extra Notes in Piano Recordings Using Score-Informed Dictionary Learning
1890 -- 1900Sandro Cumani, Pietro Laface. Joint Estimation of PLDA and Nonlinear Transformations of Speaker Vectors
1901 -- 1913Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen. Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
1914 -- 1928Cheng-Tao Chung, Cheng-Yu Tsai, Chia-Hsiang Liu, Lin-Shan Lee. Unsupervised Iterative Deep Learning of Speech Features and Acoustic Tokens with Applications to Spoken Term Detection
1929 -- 1941Niccolo Antonello, Enzo De Sena, Marc Moonen, Patrick A. Naylor, Toon van Waterschoot. Room Impulse Response Interpolation Using a Sparse Spatio-Temporal Representation of the Sound Field
1942 -- 1955Yanmin Qian, Nanxin Chen, Heinrich Dinkel, Zhizheng Wu. Deep Feature Engineering for Noise Robust Spoofing Detection
1956 -- 1968Sina Hafezi, Alastair H. Moore, Patrick A. Naylor. Augmented Intensity Vectors for Direction of Arrival Estimation in the Spherical Harmonic Domain
1969 -- 1984Byeongho Jo, Jung-Woo Choi. Spherical Harmonic Smoothing for Localizing Coherent Sound Sources
1985 -- 1996Emma Jokinen, Ulpu Remes, Paavo Alku. Intelligibility Enhancement of Telephone Speech Using Gaussian Process Regression for Normal-to-Lombard Spectral Tilt Conversion
1997 -- 2012Xiaofei Li, Laurent Girin, Radu Horaud, Sharon Gannot. Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization With Spatial Sparsity Regularization
2013 -- 2023Marc Arnela, Oriol Guasch. Finite Element Synthesis of Diphthongs Using Tuned Two-Dimensional Vocal Tracts
2024 -- 2035Deepak Baby, Hugo Van Hamme. Joint Denoising and Dereverberation Using Exemplar-Based Sparse Representations and Decaying Norm Constraint

Volume 25, Issue 1

1 -- 14Jin Chu Wu, Alvin F. Martin, Craig S. Greenberg, Raghu N. Kacker. The Impact of Data Dependence on Speaker Recognition Evaluation
15 -- 30Hélène Papadopoulos, George Tzanetakis. Models for Music Analysis From a Markov Logic Networks Perspective
31 -- 45Ahmed Al-Tmeme, Wai Lok Woo, Satnam Singh Dlay, Bin Gao. Underdetermined Convolutive Source Separation Using GEM-MU With Variational Approximated Optimum Model Order NMF2D
46 -- 59Mark A. Hasegawa-Johnson, Preethi Jyothi, Daniel McCloy, Majid Mirbagheri, Giovanni M. di Liberto, Amit Das, Bradley Ekin, Chunxi Liu, Vimal Manohar, Hao Tang, Edmund C. Lalor, Nancy F. Chen, Paul Hager, Tyler Kekona, Rose Sloan, Adrian K. C. Lee. ASR for Under-Resourced Languages From Probabilistic Transcription
60 -- 71Zhen Huang, Sabato Marco Siniscalchi, Chin-Hui Lee. Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition
72 -- 85Simon Durand, Juan Pablo Bello, Bertrand David, Gaël Richard. Robust Downbeat Tracking Using an Ensemble of Convolutional Networks
86 -- 97Zhehuai Chen, Yimeng Zhuang, Yanmin Qian, Kai Yu. Phone Synchronous Speech Recognition With CTC Lattices
98 -- 107Bo Wu, Kehuang Li, Minglei Yang, Chin-Hui Lee. A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks
108 -- 119Hongjie Chen, Lei Xie, Cheung Chi Leung, Xiaoming Lu, Bin Ma, Haizhou Li. Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News
120 -- 132Hua Xing, John H. L. Hansen. Single Sideband Frequency Offset Estimation and Correction for Quality Enhancement and Speaker Recognition
133 -- 148Andreas I. Koutrouvelis, Richard Christian Hendriks, Richard Heusdens, Jesper Jensen. Relaxed Binaural LCMV Beamforming
149 -- 163Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen. Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems
168 -- 177Jakob Abeßer, Klaus Frieler, Estefanía Cano, Martin Pfleiderer, Wolf-Georg Zaddach. Score-Informed Analysis of Tuning, Intonation, Pitch Modulation, and Dynamics in Jazz Solos
178 -- 192Alastair H. Moore, Christine Evers, Patrick A. Naylor. Direction of Arrival Estimation in the Spherical Harmonic Domain Using Subspace Pseudointensity Vectors
193 -- 207Kun Li, Xiaojun Qian, Helen M. Meng. Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks
208 -- 221Yoonchang Han, Jaehun Kim, Kyogu Lee. Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music