Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning

Changin Choi, Sungjun Lim, Wonjong Rhee. Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning. In Odette Scharenborg, Catharine Oertel, Khiet Truong, editors, 26th Annual Conference of the International Speech Communication Association, Interspeech 2025, Rotterdam, The Netherlands, 17-21 August 2025. ISCA, 2025. [doi]

Abstract

Abstract is missing.