Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices

Junyan Lin, Haoran Chen, Yue Fan, Yingqi Fan, Xin Jin, Hui Su, JinLan Fu, Xiaoyu Shen. Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025, pages 4156-4166. Computer Vision Foundation / IEEE, 2025.

Authors

Junyan Lin

Haoran Chen

Yue Fan

Yingqi Fan

Xin Jin

Hui Su

JinLan Fu

Xiaoyu Shen