Ji Lin, Hongxu Yin, Wei Ping, Pavlo Molchanov, Mohammad Shoeybi, Song Han. VILA: On Pre-training for Visual Language Models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 26679-26689. IEEE, 2024.