mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou. mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video. In Andreas Krause, Emma Brunskill, KyungHyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA. Volume 202 of Proceedings of Machine Learning Research, pages 38728-38748, PMLR, 2023.
