A Text-Image Pair Is Not Enough: Language-Vision Relation Inference with Auxiliary Modality Translation

Wenjie Lu, Dong Zhang, Shoushan Li, Guodong Zhou. A Text-Image Pair Is Not Enough: Language-Vision Relation Inference with Auxiliary Modality Translation. In Fei Liu, Nan Duan, Qingting Xu, Yu Hong, editors, Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Foshan, China, October 12-15, 2023, Proceedings, Part II. Volume 14303 of Lecture Notes in Computer Science, pages 457-468, Springer, 2023.