ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang 0001. ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 5174-5183. IEEE, 2022.

Authors

Mengjun Cheng

Yipeng Sun

Longchao Wang

Xiongwei Zhu

Kun Yao

Jie Chen

Guoli Song

Junyu Han

Jingtuo Liu

Errui Ding

Jingdong Wang 0001
