Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models

Mong Yuan Sim, Wei Emma Zhang, Xiang Dai, Biaoyan Fang. Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, July 27 - August 1, 2025. Pages 24452-24470, Association for Computational Linguistics, 2025.

Authors

Mong Yuan Sim

Wei Emma Zhang

Xiang Dai

Biaoyan Fang
