VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix

Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, ChengGuo Yin, Ping Luo. VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, Sivan Sabato, editors, International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Volume 162 of Proceedings of Machine Learning Research, pages 22680-22690, PMLR, 2022.

Abstract

Abstract is missing.