Multimodal Attention with Image Text Spatial Relationship for OCR-Based Image Captioning

Jing Wang, Jinhui Tang, Jiebo Luo. Multimodal Attention with Image Text Spatial Relationship for OCR-Based Image Captioning. In Chang Wen Chen, Rita Cucchiara, Xian-Sheng Hua, Guo-Jun Qi, Elisa Ricci, Zhengyou Zhang, Roger Zimmermann, editors, MM '20: The 28th ACM International Conference on Multimedia, Virtual Event / Seattle, WA, USA, October 12-16, 2020, pages 4337-4345. ACM, 2020.
