VinVL: Revisiting Visual Representations in Vision-Language Models

Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao. VinVL: Revisiting Visual Representations in Vision-Language Models. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Pages 5579-5588, Computer Vision Foundation / IEEE, 2021.
