Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models

Rui Hu, Delai Qiu, Shuyu Wei, Jiaming Zhang, Yining Wang, Shengping Liu, Jitao Sang. Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, July 27 - August 1, 2025, pages 7452-7463. Association for Computational Linguistics, 2025.
