| 1 | -- | 0 | Jakob Abeßer, Zhiwei Liang, Bernhard U. Seeber. Sound recurrence analysis for acoustic scene classification |
| 2 | -- | 0 | Arda Özdogru, Frantisek Rund, Karel Fliegel. Performance evaluation of perceptible impulsive noise detection methods based on auditory models |
| 3 | -- | 0 | Yan Li, Yapeng Wang, Lap-Man Hoi, Dingcheng Yang, Sio Kei Im. A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models |
| 4 | -- | 0 | Dong Li, Zhenfang Liu. A big data dynamic approach for adaptive music instruction with deep neural fuzzy logic control |
| 5 | -- | 0 | Reza Varzandeh, Simon Doclo, Volker Hohmann. Improving multi-talker binaural DOA estimation by combining periodicity and spatial features in convolutional neural networks |
| 6 | -- | 0 | Hengbo Hu, Tong Niu, Zhenhua He. A speech recognition method with enhanced transformer decoder |
| 7 | -- | 0 | Nils Poschadel, Stephan Preihs, Jürgen Peissig. Investigations on higher-order spherical harmonic input features for deep learning-based multiple speaker detection and localization |
| 8 | -- | 0 | Moxi Cao, Jiaxiang Zheng, Chongbin Zhang. AI-based Chinese-style music generation from video content: a study on cross-modal analysis and generation methods |
| 9 | -- | 0 | Pinyan Li, Lap-Man Hoi, Yapeng Wang, Xu Yang, Sio Kei Im. Enhancing speaker recognition with CRET model: a fusion of CONV2D, RESNET and ECAPA-TDNN |
| 10 | -- | 0 | Palash Jain, Anirban Bhowmick. Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India's linguistic landscape |
| 11 | -- | 0 | Lior Madmoni, Zamir Ben-Hur, Jacob Donley, Vladimir Tourbabin, Boaz Rafaely. Design and analysis of binaural signal matching with arbitrary microphone arrays and listener head rotations |
| 12 | -- | 0 | Jiawen Huang, Emmanouil Benetos. Singing to speech conversion with generative flow |
| 13 | -- | 0 | Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki. Optimizing tiny colorless feedback delay networks |
| 14 | -- | 0 | Mina Mounir, Giuliano Bernardi, Toon van Waterschoot. Robust and early howling detection based on a sparsity measure |
| 15 | -- | 0 | Michael Taenzer. Multi-pitch estimation with polyphony per instrument information for Western classical and electronic music |
| 16 | -- | 0 | Chongchong Yu, Xuening Wang, Zhaopeng Qian. Silent speech recognition using visual cascading fusion of tongue-lip movements based on pre-trained and fine-tuned model |
| 17 | -- | 0 | Lianyu Zhou, Liang Yin, Yukun Qian, Mingjiang Wang. MLAT: a multi-level attention transformer capturing multi-level information in compound word encoding for symbolic music generation |
| 18 | -- | 0 | Renana Opochinsky, Mordehay Moradi, Sharon Gannot. Single-microphone speaker separation and voice activity detection in noisy and reverberant environments |
| 19 | -- | 0 | Tong Niu, Yaqi Chen, Dan Qu, Hengbo Hu, ChengRan Liu. Parameter-efficient adaptation with multi-channel adversarial training for far-field speech recognition |
| 20 | -- | 0 | Jackie Lin, Georg Götz, Sebastian J. Schlecht. Deep room impulse response completion |
| 21 | -- | 0 | Yinghan Cao, Shiyun Xu, Wenjie Zhang, Mingjiang Wang, Yun Lu. Hybrid lightweight temporal-frequency analysis network for multi-channel speech enhancement |
| 22 | -- | 0 | Bing Sun, Chenglong Liu, Shuguo Yang, Wenwu Wang, Yiduo Mei. ResCapsnet: a capsule network with CRAM and BiGRU for sound event detection |
| 23 | -- | 0 | Yanzhen Ren, Wuyang Liu, Chenyu Liu, Tingting Zhu. Group feature calibration for sound event detection |
| 24 | -- | 0 | Francisco David González Martínez, Juan De La Torre Cruz, Julio J. Carabias-Orti, Francisco J. Cañadas-Quesada, Alejandro Antonio Salvador-Navarro, José Ranilla, Lyam Lamrini-H. Laarbi. Polygraph and audio synchronization applied to apnea event analysis based on non-negative matrix factorization |
| 25 | -- | 0 | Yukun Qian, Xuyi Zhuang, Mingjiang Wang. Head information bottleneck (HIB): leveraging information bottleneck for efficient transformer head attribution and pruning |
| 26 | -- | 0 | Mengshan Li. Design and implementation of piano audio automatic music transcription algorithm based on convolutional neural network |
| 27 | -- | 0 | Mateo Cámara, José-Luis Blanco, Joshua D. Reiss. Parameter optimisation for a physical model of the vocal system |
| 28 | -- | 0 | Domenico Stefani, Luca Turchet. Real-time playing technique recognition embedded in a smart acoustic guitar |
| 29 | -- | 0 | Haixin Zhao, Nilesh Madhu. Coded speech enhancement using auxiliary utterance-level information |
| 30 | -- | 0 | Riccardo Simionato, Stefano Fasciani. Comparative study of state-based neural networks for virtual analog audio effects modeling |
| 31 | -- | 0 | Abigail Wiafe, Sami Sieranoja, Abedin Bhuiyan, Pasi Fränti. Emotional response to music: the Emotify+ dataset |
| 32 | -- | 0 | Mattes Ohlenbusch, Christian Rollwage, Simon Doclo. Speech-dependent data augmentation for own voice reconstruction with hearable microphones in noisy environments |
| 33 | -- | 0 | Clara Luzon-Alvarez, Maximo Cobos, Jesús López Ballester, Ana M. Torres-Aranda, Francesc J. Ferri. Acoustic virtual sensors for industrial process monitoring using non-negative matrix factorization |
| 34 | -- | 0 | Priyanka Muruganandham, Sangeetha Jayaraman, Ramesh Raman. Continuous speech recognition for Tamil language using a novel semantic verification integrated with the transformer model |
| 35 | -- | 0 | Andrea Gulli, Federico Fontana, Carlo Drioli, Daniele Salvati, Giovanni Ferrin. Enhancing drone audition with rotor-conditioned deep models |
| 36 | -- | 0 | Jiaxiang Zheng, Moxi Cao, Chongbin Zhang. Chinese instrument music source separation with frequency-attentive multi-band neural networks |
| 37 | -- | 0 | Nayereh Seyed Afiuny, Amir Lakizadeh. ICRCycleGAN-VC: a robust one-to-one voice conversion method based on CycleGAN and inception-ResNet blocks |
| 38 | -- | 0 | Michele Rossi, Giovanni Iacca, Luca Turchet. Advancing guitar emotion recognition through audio data augmentation to enhance smart musical instruments |
| 39 | -- | 0 | Ignacio Martin-Salinas, Gema Piñero, Jose A. Belloch, Adrian Amor-Martin. Enhanced U-Net architectures for accurate room impulse response generation via differential-phase learning |
| 40 | -- | 0 | Jaehee Jung, Wooil Kim. Speaker embedding loss for end-to-end speaker diarization without external embedding networks |
| 41 | -- | 0 | Tirthankar Banerjee, V. Ramasubramanian. Accent-robust speech recognition for English in low-resource settings using Manifold Mixup |
| 42 | -- | 0 | Wen-Hsing Lai, Wei-Lun Chen, Siou-Lin Wang. An audio generation model based on empirical mode decomposition and generative adversarial networks for enhancing voice quality and diversity |