The following publications are possibly variants of this publication:
- AGREE: Aligning Cross-Modal Entities for Image-Text Retrieval Upon Vision-Language Pre-trained Models. Xiaodan Wang, Lei Li, Zhixu Li, Xuwu Wang, Xiangru Zhu, Chengyu Wang, Jun Huang, Yanghua Xiao. WSDM 2023: 456-464
- Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training. Zejun Li, Zhihao Fan, Jingjing Chen, Qi Zhang, Xuanjing Huang, Zhongyu Wei. ACL 2023: 5939-5958
- UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training. Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu, Jingjing Liu. CVPR 2021: 4155-4165