Enhancing Vision-Language Pre-Training with Rich Supervisions

Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Zhuowen Tu, Vijay Mahadevan, Stefano Soatto. Enhancing Vision-Language Pre-Training with Rich Supervisions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 13480-13491. IEEE, 2024.
