The following publications are possibly variants of this publication:
- Reveal: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory. Ziniu Hu, Ahmet Iscen, Chen Sun 0002, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A. Ross, Alireza Fathi. CVPR 2023: 23369-23379 [doi]
- Learning Semantic Alignment with Global Modality Reconstruction for Video-Language Pre-training towards Retrieval. Mingchao Li, Xiaoming Shi, Haitao Leng, Wei Zhou, Hai-Tao Zheng 0002, Kuncai Zhang. AAAI 2023: 1377-1385 [doi]
- Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation. Chaoya Jiang, Wei Ye 0004, Haiyang Xu, Songfang Huang, Fei Huang 0004, Shikun Zhang. ACL 2023: 14660-14679 [doi]
- VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix. Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, ChengGuo Yin, Ping Luo. ICML 2022: 22680-22690 [doi]