The following publications are possibly variants of this publication:
- Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training. Zejun Li, Zhihao Fan, Jingjing Chen, Qi Zhang, Xuanjing Huang 0001, Zhongyu Wei. acl 2023: 5939-5958
- SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing. Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang 0002, Shuo Ren, Yu Wu, Shujie Liu 0001, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li 0001, Furu Wei. acl 2022: 5723-5738
- Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts. Zhihong Chen, Shizhe Diao, Benyou Wang, Guanbin Li, Xiang Wan. iccv 2023: 23346-23356
- VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix. Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, ChengGuo Yin, Ping Luo. icml 2022: 22680-22690
- Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation. Chaoya Jiang, Wei Ye 0004, Haiyang Xu, Songfang Huang, Fei Huang 0004, Shikun Zhang. acl 2023: 14660-14679
- UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training. Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng 0001, Linjie Li, Zhou Yu, Jingjing Liu. cvpr 2021: 4155-4165
- Multi-modal Masked Autoencoders for Medical Vision-and-Language Pre-training. Zhihong Chen, Yuhao Du, Jinpeng Hu, Yang Liu, Guanbin Li, Xiang Wan, Tsung-Hui Chang. miccai 2022: 679-689
- DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training. Luyang Huang, Guocheng Niu, Jiachen Liu, Xinyan Xiao, Hua Wu 0003. acl 2022: 2552-2566