Journal: EURASIP J. Audio, Speech and Music Processing

Volume 2024, Issue 1

1 -- Yunfei Shao, Xinxin Ma, Yong Ma, Weiqiang Zhang 0001. Deep semantic learning for acoustic scene classification
2 -- Khomdet Phapatanaburi, Longbiao Wang, Meng Liu, Seiichi Nakagawa, Talit Jumphoo, Peerapong Uthansakul. Significance of relative phase features for shouted and normal speech classification
3 -- Junya Koguchi, Masanori Morise. Neural electric bass guitar synthesis framework enabling attack-sustain-representation-based technique control
5 -- Stijn Kindt, Jenthe Thienpondt, Luca Becker, Nilesh Madhu. Correction: Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios
6 -- Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael W. Kimwele, Adane Letta Mamuye. Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder
7 -- Lingyun Xie, Yuehong Wang, Yan Gao. Acoustical feature analysis and optimization for aesthetic recognition of Chinese traditional music
8 -- Sivaramakrishna Yechuri, Sunny Dayal Vanambathina. Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement
9 -- Reemt Hinrichs, Kevin Gerkens, Alexander Lange, Jörn Ostermann. Blind extraction of guitar effects through blind system inversion and neural guitar effect modeling
10 -- Priyanka Gupta, Hemant A. Patil, Rodrigo Capobianco Guido. Vulnerability issues in Automatic Speaker Verification (ASV) systems
11 -- Huda Barakat, Oytun Türk, Cenk Demiroglu. Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources
12 -- Marcos Lazaro Alvarez, Laura Arjona, Miguel Enrique Iglesias Martínez, Alfonso Bahillo. Automatic classification of the physical surface in sound uroflowmetry using machine learning methods
13 -- Zining Liang, Wen Zhang 0002, Thushara D. Abhayapala. Sound field reconstruction using neural processes with dynamic kernels
14 -- Serhat Hizlisoy, Recep Sinan Arslan, Emel Çolakoglu. Singer identification model using data augmentation and enhanced feature conversion with hybrid feature vector and machine learning
15 -- Javier Tejedor, Doroteo T. Toledano. Whisper-based spoken term detection systems for search on speech ALBAYZIN evaluation challenge
16 -- Shivam Saini, Isaac Engel, Jürgen Peissig. An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment
17 -- Luca Comanducci, Fabio Antonacci, Augusto Sarti. Synthesis of soundfields through irregular loudspeaker arrays based on convolutional neural networks
18 -- Rabbia Mahum, Aun Irtaza, Ali Javed, Haitham A. Mahmoud, Haseeb Hassan. DeepDet: YAMNet with BottleNeck Attention Module (BAM) TTS synthesis detection
19 -- Sandeep Reddy Kothinti, Mounya Elhilali. Multi-rate modulation encoding via unsupervised learning for audio event detection
21 -- Rabbia Mahum, Aun Irtaza, Ali Javed, Haitham A. Mahmoud, Haseeb Hassan. Correction: DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
22 -- Usama Saqib, Mads Græsbøll Christensen, Jesper Rindom Jensen. Robust acoustic reflector localization using a modified EM algorithm
23 -- Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao, Wenyu Jin 0003. Exploring the power of pure attention mechanisms in blind room parameter estimation
24 -- Tomasz Wojnar, Jaroslaw Hryszko, Adam Roman. Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models
25 -- David Gimeno-Gómez, Carlos David Martínez-Hinarejos. Continuous lipreading based on acoustic temporal alignments
26 -- Otto Mikkonen, Alec Wright, Vesa Välimäki. Sampling the user controls in neural modeling of audio devices
27 -- Joanna Luberadzka, Hendrik Kayser, Jörg Lücke, Volker Hohmann. Towards multidimensional attentive voice tracking - estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling
28 -- Zhiyong Chen, Zhiqi Ai, Youxuan Ma, Xinnuo Li, Shugong Xu. Optimizing feature fusion for improved zero-shot adaptation in text-to-speech synthesis
29 -- Yunpeng Liu, Xukui Yang, Dan Qu. Exploration of Whisper fine-tuning strategies for low-resource ASR
30 -- Jeremiah Abimbola, Daniel Kostrzewa, Pawel Kasprowski. Music time signature detection using ResNet18
31 -- Marcin Lewandowski. Estimating the first and second derivatives of discrete audio data
32 -- Adam Kujawski, Art J. R. Pelling, Ennes Sarradj. MIRACLE - a microphone array impulse response dataset for acoustic learning
33 -- Shaik Sajiha, Kodali Radha, Dhulipalla Venkata Rao, Nammi Sneha, Gunnam Suryanarayana, Durga Prasad Bavirisetti. Automatic dysarthria detection and severity level assessment using CWT-layered CNN model
34 -- Mengzhen Ma, Ying Hu, Liang He, Hao Huang. GLFER-Net: a polyphonic sound source localization and detection network based on global-local feature extraction and recalibration
35 -- Tahira Kanwal, Rabbia Mahum, AbdulMalik Al-Salman, Mohamed Sharaf 0001, Haseeb Hassan. Fake speech detection using VGGish with attention block
36 -- Xin Feng, Yue Zhao, Wei Zong, Xiaona Xu. Adaptive multi-task learning for speech to text translation
37 -- Yigang Liu, Yue Zhao, Xiaona Xu, Liang Xu, Xubei Zhang, Qiang Ji. Exploring task-diverse meta-learning on Tibetan multi-dialect speech recognition
38 -- Samuel Poirot, Stefan Bilbao, Richard Kronland-Martinet. A simplified and controllable model of mode coupling for addressing nonlinear phenomena in sound synthesis processes
39 -- Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji. The whole is greater than the sum of its parts: improving music source separation by bridging networks
40 -- Daiki Mori, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa, Norihide Kitaoka. Recognition of target domain Japanese speech using language model replacement
41 -- Samuel A. Verburg, Filip Elvander, Toon van Waterschoot, Efren Fernandez-Grande. Optimal sensor placement for the spatial reconstruction of sound fields
42 -- Marco Olivieri, Xenofon Karakonstantis, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti, Efren Fernandez-Grande. Physics-informed neural network for volumetric sound field reconstruction of speech signals
43 -- Juliano G. C. Ribeiro, Shoichi Koyama, Hiroshi Saruwatari. Physics-constrained adaptive kernel interpolation for region-to-region acoustic transfer function: a Bayesian approach
44 -- Zijin Li, Wenwu Wang 0001, Kejun Zhang, Mengyao Zhu. Guest editorial: AI for computational audition - sound and music processing
45 -- Martin Jälmby, Filip Elvander, Toon van Waterschoot. Compression of room impulse responses for compact storage and fast low-latency convolution
46 -- Yuma Kinoshita, Nobutaka Ono. End-to-end training of acoustic scene classification using distributed sound-to-light conversion devices: verification through simulation experiments
47 -- Xiao Zeng, Shiyun Xu, Mingjiang Wang. A time-frequency fusion model for multi-channel speech enhancement
48 -- Chaoyang Zhang, Yan-Hua. Dance2Music-Diffusion: leveraging latent diffusion models for music generation from dance videos
49 -- Stefano Damiano, Luca Bondi, Andre Guntoro, Toon van Waterschoot. A framework for the acoustic simulation of passing vehicles using variable length delay lines
50 -- Ayal Schwartz, Ofer Schwartz, Shlomo E. Chazan, Sharon Gannot. Multi-microphone simultaneous speakers detection and localization of multi-sources for separation and noise reduction
51 -- Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena, Alberto Bernardini. Data-driven room acoustic modeling via differentiable feedback delay networks with learnable delay lines
52 -- Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Shoji Makino. DOA-informed switching independent vector extraction and beamforming for speech enhancement in underdetermined situations
53 -- Pawel Antoniuk, Slawomir K. Zielinski, Hyunkook Lee. Ensemble width estimation in HRTF-convolved binaural music recordings using an auditory model and a gradient-boosted decision trees regressor
54 -- Usama Irshad, Rabbia Mahum, Ismaila Ganiyu, Faisal Shafique Butt, Lotfi Hidri, Tamer G. Ali, Ahmed M. El-Sherbeeny. UTran-DSR: a novel transformer-based model using feature enhancement for dysarthric speech recognition
55 -- Xuyi Zhuang, Yukun Qian, Mingjiang Wang. SVQ-MAE: an efficient speech pre-training framework with constrained computational resources
56 -- Hanwen Bi, Thushara D. Abhayapala. Point neuron learning: a new physics-informed neural network architecture
57 -- Changtao Li, Yi Wan, Feiran Yang 0001, Jun Yang 0004. Multi-scale Information Aggregation for Spoofing Detection
58 -- Carlotta Anemüller, Oliver Thiergart, Emanuël A. P. Habets. Multi-channel neural audio decorrelation using generative adversarial networks
59 -- Eric Grinstein, Elisa Tengan, Bilgesu Çakmak, Thomas Dietzen, Leonardo Nunes, Toon van Waterschoot, Mike Brookes, Patrick A. Naylor. Steered Response Power for Sound Source Localization: a tutorial review
60 -- Behnam Faghih, Amin Shoari Nejad, Joseph Timoney. Modelling note's pitch and duration in trained professional singers
61 -- Annika Briegleb, Walter Kellermann. Analysis of spatial filtering in neural spatiospectral filters and its dependence on training target characteristics
62 -- Frantisek Kynych, Petr Cerva, Jindrich Zdánský, Torbjørn Svendsen, Giampiero Salvi. A lightweight approach to real-time speaker diarization: from audio toward audio-visual data streams
63 -- Ragini Sinha, Christian Rollwage, Simon Doclo. Variants of LSTM cells for single-channel speaker-conditioned target speaker extraction
64 -- Han Wang, Mingrui He, Mingjun Zhang, Changzhi Luo, Longting Xu. Domain-weighted transfer learning and discriminative embeddings for low-resource speaker verification
65 -- Takao Kawamura, Yuma Kinoshita, Nobutaka Ono, Robin Scheibler. Acoustic scene classification using inter- and intra-subarray spatial features in distributed microphone array
66 -- Atsuo Hiroe, Katsutoshi Itoyama, Kazuhiro Nakadai. Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?