DistillCaps: Enhancing Audio-Language Alignment in Captioning via Retrieval-Augmented Knowledge Distillation

Thinh Pham, Nghiem Tuong Diep, Lizi Liao, Binh T. Nguyen. DistillCaps: Enhancing Audio-Language Alignment in Captioning via Retrieval-Augmented Knowledge Distillation. In Meeyoung Cha, Chanyoung Park, Noseong Park, Carl Yang, Senjuti Basu Roy, Jessie Li, Jaap Kamps, Kijung Shin, Bryan Hooi, Lifang He, editors, Proceedings of the 34th ACM International Conference on Information and Knowledge Management, CIKM 2025, Seoul, Republic of Korea, November 10-14, 2025. Pages 2346-2356, ACM, 2025.
