The following publications are possibly variants of this publication:
- ALSA: Adversarial Learning of Supervised Attentions for Visual Question Answering. Yun Liu, Xiaoming Zhang 0001, Zhiyun Zhao, Bo Zhang, Lei Cheng, Zhoujun Li 0001. tcyb, 52(6):4520-4533, 2022.
- Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering. Zhou Yu, Jun Yu, Jianping Fan 0001, Dacheng Tao. iccv 2017: 1839-1848.
- Remote sensing visual question answering with a self-attention multi-modal encoder. João Daniel Silva, João Magalhães, Devis Tuia, Bruno Martins 0001. gis 2022: 40-49.
- The multi-modal fusion in visual question answering: a review of attention mechanisms. Siyu Lu, Mingzhe Liu, Lirong Yin, Zhengtong Yin, Xuan Liu, Wenfeng Zheng. peerj-cs, 9, 2023.
- Erasing-based Attention Learning for Visual Question Answering. Fei Liu, Jing Liu, Richang Hong, Hanqing Lu. mm 2019: 1175-1183.