| 1 | -- | 0 | Yunfei Shao, Xinxin Ma, Yong Ma, Weiqiang Zhang 0001. Deep semantic learning for acoustic scene classification |
| 2 | -- | 0 | Khomdet Phapatanaburi, Longbiao Wang, Meng Liu, Seiichi Nakagawa, Talit Jumphoo, Peerapong Uthansakul. Significance of relative phase features for shouted and normal speech classification |
| 3 | -- | 0 | Junya Koguchi, Masanori Morise. Neural electric bass guitar synthesis framework enabling attack-sustain-representation-based technique control |
| 5 | -- | 0 | Stijn Kindt, Jenthe Thienpondt, Luca Becker, Nilesh Madhu. Correction: Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios |
| 6 | -- | 0 | Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael W. Kimwele, Adane Letta Mamuye. Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder |
| 7 | -- | 0 | Lingyun Xie, Yuehong Wang, Yan Gao. Acoustical feature analysis and optimization for aesthetic recognition of Chinese traditional music |
| 8 | -- | 0 | Sivaramakrishna Yechuri, Sunny Dayal Vanambathina. Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement |
| 9 | -- | 0 | Reemt Hinrichs, Kevin Gerkens, Alexander Lange, Jörn Ostermann. Blind extraction of guitar effects through blind system inversion and neural guitar effect modeling |
| 10 | -- | 0 | Priyanka Gupta, Hemant A. Patil, Rodrigo Capobianco Guido. Vulnerability issues in Automatic Speaker Verification (ASV) systems |
| 11 | -- | 0 | Huda Barakat, Oytun Türk, Cenk Demiroglu. Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources |
| 12 | -- | 0 | Marcos Lazaro Alvarez, Laura Arjona, Miguel Enrique Iglesias Martínez, Alfonso Bahillo. Automatic classification of the physical surface in sound uroflowmetry using machine learning methods |
| 13 | -- | 0 | Zining Liang, Wen Zhang 0002, Thushara D. Abhayapala. Sound field reconstruction using neural processes with dynamic kernels |
| 14 | -- | 0 | Serhat Hizlisoy, Recep Sinan Arslan, Emel Çolakoglu. Singer identification model using data augmentation and enhanced feature conversion with hybrid feature vector and machine learning |
| 15 | -- | 0 | Javier Tejedor, Doroteo T. Toledano. Whisper-based spoken term detection systems for search on speech ALBAYZIN evaluation challenge |
| 16 | -- | 0 | Shivam Saini, Isaac Engel, Jürgen Peissig. An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment |
| 17 | -- | 0 | Luca Comanducci, Fabio Antonacci, Augusto Sarti. Synthesis of soundfields through irregular loudspeaker arrays based on convolutional neural networks |
| 18 | -- | 0 | Rabbia Mahum, Aun Irtaza, Ali Javed, Haitham A. Mahmoud, Haseeb Hassan. DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection |
| 19 | -- | 0 | Sandeep Reddy Kothinti, Mounya Elhilali. Multi-rate modulation encoding via unsupervised learning for audio event detection |
| 21 | -- | 0 | Rabbia Mahum, Aun Irtaza, Ali Javed, Haitham A. Mahmoud, Haseeb Hassan. Correction: DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection |
| 22 | -- | 0 | Usama Saqib, Mads Græsbøll Christensen, Jesper Rindom Jensen. Robust acoustic reflector localization using a modified EM algorithm |
| 23 | -- | 0 | Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao, Wenyu Jin 0003. Exploring the power of pure attention mechanisms in blind room parameter estimation |
| 24 | -- | 0 | Tomasz Wojnar, Jaroslaw Hryszko, Adam Roman. Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models |
| 25 | -- | 0 | David Gimeno-Gómez, Carlos David Martínez-Hinarejos. Continuous lipreading based on acoustic temporal alignments |
| 26 | -- | 0 | Otto Mikkonen, Alec Wright, Vesa Välimäki. Sampling the user controls in neural modeling of audio devices |
| 27 | -- | 0 | Joanna Luberadzka, Hendrik Kayser, Jörg Lücke, Volker Hohmann. Towards multidimensional attentive voice tracking - estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling |
| 28 | -- | 0 | Zhiyong Chen, Zhiqi Ai, Youxuan Ma, Xinnuo Li, Shugong Xu. Optimizing feature fusion for improved zero-shot adaptation in text-to-speech synthesis |
| 29 | -- | 0 | Yunpeng Liu, Xukui Yang, Dan Qu. Exploration of Whisper fine-tuning strategies for low-resource ASR |
| 30 | -- | 0 | Jeremiah Abimbola, Daniel Kostrzewa, Pawel Kasprowski. Music time signature detection using ResNet18 |
| 31 | -- | 0 | Marcin Lewandowski. Estimating the first and second derivatives of discrete audio data |
| 32 | -- | 0 | Adam Kujawski, Art J. R. Pelling, Ennes Sarradj. MIRACLE - a microphone array impulse response dataset for acoustic learning |
| 33 | -- | 0 | Shaik Sajiha, Kodali Radha, Dhulipalla Venkata Rao, Nammi Sneha, Gunnam Suryanarayana, Durga Prasad Bavirisetti. Automatic dysarthria detection and severity level assessment using CWT-layered CNN model |
| 34 | -- | 0 | Mengzhen Ma, Ying Hu, Liang He, Hao Huang. GLFER-Net: a polyphonic sound source localization and detection network based on global-local feature extraction and recalibration |
| 35 | -- | 0 | Tahira Kanwal, Rabbia Mahum, AbdulMalik Al-Salman, Mohamed Sharaf 0001, Haseeb Hassan. Fake speech detection using VGGish with attention block |
| 36 | -- | 0 | Xin Feng, Yue Zhao, Wei Zong, Xiaona Xu. Adaptive multi-task learning for speech to text translation |
| 37 | -- | 0 | Yigang Liu, Yue Zhao, Xiaona Xu, Liang Xu, Xubei Zhang, Qiang Ji. Exploring task-diverse meta-learning on Tibetan multi-dialect speech recognition |
| 38 | -- | 0 | Samuel Poirot, Stefan Bilbao, Richard Kronland-Martinet. A simplified and controllable model of mode coupling for addressing nonlinear phenomena in sound synthesis processes |
| 39 | -- | 0 | Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji. The whole is greater than the sum of its parts: improving music source separation by bridging networks |
| 40 | -- | 0 | Daiki Mori, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa, Norihide Kitaoka. Recognition of target domain Japanese speech using language model replacement |
| 41 | -- | 0 | Samuel A. Verburg, Filip Elvander, Toon van Waterschoot, Efren Fernandez-Grande. Optimal sensor placement for the spatial reconstruction of sound fields |
| 42 | -- | 0 | Marco Olivieri, Xenofon Karakonstantis, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti, Efren Fernandez-Grande. Physics-informed neural network for volumetric sound field reconstruction of speech signals |
| 43 | -- | 0 | Juliano G. C. Ribeiro, Shoichi Koyama, Hiroshi Saruwatari. Physics-constrained adaptive kernel interpolation for region-to-region acoustic transfer function: a Bayesian approach |
| 44 | -- | 0 | Zijin Li, Wenwu Wang 0001, Kejun Zhang, Mengyao Zhu. Guest editorial: AI for computational audition - sound and music processing |
| 45 | -- | 0 | Martin Jälmby, Filip Elvander, Toon van Waterschoot. Compression of room impulse responses for compact storage and fast low-latency convolution |
| 46 | -- | 0 | Yuma Kinoshita, Nobutaka Ono. End-to-end training of acoustic scene classification using distributed sound-to-light conversion devices: verification through simulation experiments |
| 47 | -- | 0 | Xiao Zeng, Shiyun Xu, Mingjiang Wang. A time-frequency fusion model for multi-channel speech enhancement |
| 48 | -- | 0 | Chaoyang Zhang, Yan-Hua. Dance2Music-Diffusion: leveraging latent diffusion models for music generation from dance videos |
| 49 | -- | 0 | Stefano Damiano, Luca Bondi, Andre Guntoro, Toon van Waterschoot. A framework for the acoustic simulation of passing vehicles using variable length delay lines |
| 50 | -- | 0 | Ayal Schwartz, Ofer Schwartz, Shlomo E. Chazan, Sharon Gannot. Multi-microphone simultaneous speakers detection and localization of multi-sources for separation and noise reduction |
| 51 | -- | 0 | Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena, Alberto Bernardini. Data-driven room acoustic modeling via differentiable feedback delay networks with learnable delay lines |
| 52 | -- | 0 | Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Shoji Makino. DOA-informed switching independent vector extraction and beamforming for speech enhancement in underdetermined situations |
| 53 | -- | 0 | Pawel Antoniuk, Slawomir K. Zielinski, Hyunkook Lee. Ensemble width estimation in HRTF-convolved binaural music recordings using an auditory model and a gradient-boosted decision trees regressor |
| 54 | -- | 0 | Usama Irshad, Rabbia Mahum, Ismaila Ganiyu, Faisal Shafique Butt, Lotfi Hidri, Tamer G. Ali, Ahmed M. El-Sherbeeny. UTran-DSR: a novel transformer-based model using feature enhancement for dysarthric speech recognition |
| 55 | -- | 0 | Xuyi Zhuang, Yukun Qian, Mingjiang Wang. SVQ-MAE: an efficient speech pre-training framework with constrained computational resources |
| 56 | -- | 0 | Hanwen Bi, Thushara D. Abhayapala. Point neuron learning: a new physics-informed neural network architecture |
| 57 | -- | 0 | Changtao Li, Yi Wan, Feiran Yang 0001, Jun Yang 0004. Multi-scale Information Aggregation for Spoofing Detection |
| 58 | -- | 0 | Carlotta Anemüller, Oliver Thiergart, Emanuël A. P. Habets. Multi-channel neural audio decorrelation using generative adversarial networks |
| 59 | -- | 0 | Eric Grinstein, Elisa Tengan, Bilgesu Çakmak, Thomas Dietzen, Leonardo Nunes, Toon van Waterschoot, Mike Brookes, Patrick A. Naylor. Steered Response Power for Sound Source Localization: a tutorial review |
| 60 | -- | 0 | Behnam Faghih, Amin Shoari Nejad, Joseph Timoney. Modelling note's pitch and duration in trained professional singers |
| 61 | -- | 0 | Annika Briegleb, Walter Kellermann. Analysis of spatial filtering in neural spatiospectral filters and its dependence on training target characteristics |
| 62 | -- | 0 | Frantisek Kynych, Petr Cerva, Jindrich Zdánský, Torbjørn Svendsen, Giampiero Salvi. A lightweight approach to real-time speaker diarization: from audio toward audio-visual data streams |
| 63 | -- | 0 | Ragini Sinha, Christian Rollwage, Simon Doclo. Variants of LSTM cells for single-channel speaker-conditioned target speaker extraction |
| 64 | -- | 0 | Han Wang, Mingrui He, Mingjun Zhang, Changzhi Luo, Longting Xu. Domain-weighted transfer learning and discriminative embeddings for low-resource speaker verification |
| 65 | -- | 0 | Takao Kawamura, Yuma Kinoshita, Nobutaka Ono, Robin Scheibler. Acoustic scene classification using inter- and intra-subarray spatial features in distributed microphone array |
| 66 | -- | 0 | Atsuo Hiroe, Katsutoshi Itoyama, Kazuhiro Nakadai. Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance? |