MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling

Zijia Zhao, Longteng Guo, Xingjian He, Shuai Shao, Zehuan Yuan, Jing Liu. MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling. In Hsin-Hsi Chen, Wei-Jou (Edward) Duh, Hen-Hsen Huang, Makoto P. Kato, Josiane Mothe, Barbara Poblete, editors, Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23-27, 2023, pages 1528-1538. ACM, 2023.
