CAT+: Investigating and Enhancing Audio-Visual Understanding in Large Language Models

Qilang Ye, Zitong Yu, Rui Shao, Yawen Cui, Xiangui Kang, Xin Liu, Philip Torr, Xiaochun Cao. CAT+: Investigating and Enhancing Audio-Visual Understanding in Large Language Models. IEEE Trans. Pattern Anal. Mach. Intell., 47(10):8674-8690, October 2025.
