MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling

Zijia Zhao, Longteng Guo, Xingjian He, Shuai Shao 0005, Zehuan Yuan, Jing Liu. MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling. In Hsin-Hsi Chen, Wei-Jou (Edward) Duh, Hen-Hsen Huang, Makoto P. Kato, Josiane Mothe, Barbara Poblete, editors, Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23-27, 2023, pages 1528-1538. ACM, 2023.

Authors

Zijia Zhao

Longteng Guo

Xingjian He

Shuai Shao 0005

Zehuan Yuan

Jing Liu
