COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

Haoyu Lu, Nanyi Fei, Yuqi Huo, Yizhao Gao, Zhiwu Lu, Ji-Rong Wen. COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022. Pages 15671-15680, IEEE, 2022.
