VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks

Soyeon Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon. VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks. In Donia Scott, Núria Bel, Chengqing Zong, editors, Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8-13, 2020. pages 3107-3117, International Committee on Computational Linguistics, 2020.