VILA: On Pre-training for Visual Language Models

Ji Lin, Hongxu Yin, Wei Ping, Pavlo Molchanov, Mohammad Shoeybi, Song Han. VILA: On Pre-training for Visual Language Models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 26679-26689. IEEE, 2024.

Authors

Ji Lin

Hongxu Yin

Wei Ping

Pavlo Molchanov

Mohammad Shoeybi

Song Han
