MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

Haozhe Zhao, Zefan Cai, Shuzheng Si, Xiaojian Ma, Kaikai An, Liang Chen 0024, Zixuan Liu, Sheng Wang, Wenjuan Han, Baobao Chang. MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. [doi]

Abstract

Abstract is missing.