Learning a Contextualized Multimodal Embedding for Zero-shot Cooking Video Caption Generation

Lin Wang, Hongyi Zhang, Xingfu Wang, Yan Xiong. Learning a Contextualized Multimodal Embedding for Zero-shot Cooking Video Caption Generation. In Wen-Huang Cheng, Wei-Ta Chu, Min-Chun Hu 0001, Jiaying Liu 0001, Munchurl Kim, Wei Zhang 0031, editors, ACM Multimedia Asia 2023, MMAsia 2023, Tainan, Taiwan, December 6-8, 2023. ACM, 2023.

Abstract

Abstract is missing.