Do Video Language Models really understand the video contexts?

Jeongwan Shin, Jinhyeong Lim, Hyeyoung Park. Do Video Language Models really understand the video contexts? In Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein, editors, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2025 - Volume 4: Student Research Workshop, Albuquerque, NM, USA, April 30 - May 1, 2025, pages 408-417. Association for Computational Linguistics, 2025.

@inproceedings{ShinLP25,
  title = {Do Video Language Models really understand the video contexts?},
  author = {Jeongwan Shin and Jinhyeong Lim and Hyeyoung Park},
  year = {2025},
  url = {https://aclanthology.org/2025.naacl-srw.40/},
  researchr = {https://researchr.org/publication/ShinLP25},
  pages = {408--417},
  booktitle = {Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2025 - Volume 4: Student Research Workshop, Albuquerque, NM, USA, April 30 - May 1, 2025},
  editor = {Abteen Ebrahimi and Samar Haider and Emmy Liu and Sammar Haider and Maria Leonor Pacheco and Shira Wein},
  publisher = {Association for Computational Linguistics},
  isbn = {979-8-89176-192-6},
}