Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models

Mong Yuan Sim, Wei Emma Zhang, Xiang Dai, Biaoyan Fang. Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, July 27 - August 1, 2025, pages 24452-24470. Association for Computational Linguistics, 2025.
