ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang 0001. ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022. pages 5174-5183, IEEE, 2022. [doi]

Abstract

Abstract is missing.