Audio-centric Video Understanding Benchmark without Text Shortcut

Yudong Yang, Jimin Zhuang, Guangzhi Sun, Changli Tang, Yixuan Li, Peihan Li, Yifan Jiang, Wei Li, Zejun Ma, Chao Zhang. Audio-centric Video Understanding Benchmark without Text Shortcut. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025, Suzhou, China, November 4-9, 2025, pages 6569-6587. Association for Computational Linguistics, 2025.
