Multimodal Language Models See Better When They Look Shallower

Haoran Chen, Junyan Lin, Xinghao Chen 0009, Yue Fan, Jianfeng Dong, Xin Jin, Hui Su, JinLan Fu, Xiaoyu Shen 0001. Multimodal Language Models See Better When They Look Shallower. In Christos Christodoulopoulos 0001, Tanmoy Chakraborty 0002, Carolyn Rose, Violet Peng, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025, Suzhou, China, November 4-9, 2025. pages 6677-6695, Association for Computational Linguistics, 2025.

Authors

Haoran Chen

Junyan Lin

Xinghao Chen 0009

Yue Fan

Jianfeng Dong

Xin Jin

Hui Su

JinLan Fu

Xiaoyu Shen 0001
