The multi-modal fusion in visual question answering: a review of attention mechanisms

Siyu Lu, Mingzhe Liu, Lirong Yin, Zhengtong Yin, Xuan Liu, Wenfeng Zheng. The multi-modal fusion in visual question answering: a review of attention mechanisms. PeerJ Computer Science, 9, 2023.
