Do Video Language Models really understand the video contexts?

Jeongwan Shin, Jinhyeong Lim, Hyeyoung Park. Do Video Language Models really understand the video contexts? In Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein, editors, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2025 - Volume 4: Student Research Workshop, Albuquerque, NM, USA, April 30 - May 1, 2025, pages 408-417. Association for Computational Linguistics, 2025.
