Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Mohammad Shoeybi, Ming-Yu Liu, Yuke Zhu, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar. Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 11844-11857. Association for Computational Linguistics, 2023.
