The following publications are possibly variants of this publication:
- Temporally Multi-Modal Semantic Reasoning with Spatial Language Constraints for Video Question Answering. Mingyang Liu, Ruomei Wang 0001, Fan Zhou 0001, Ge Lin. symmetry, 14(6):1133, 2022.
- Cross-Attentional Spatio-Temporal Semantic Graph Networks for Video Question Answering. Yun Liu, Xiaoming Zhang 0001, Feiran Huang, Bo Zhang, Zhoujun Li 0001. TIP, 31:1684-1696, 2022.
- MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering. Difei Gao, Luowei Zhou, Lei Ji 0001, Linchao Zhu, Yi Yang, Mike Zheng Shou. cvpr 2023: 14773-14783.
- Video question answering via multi-granularity temporal attention network learning. Shaoning Xiao, Yimeng Li, Yunan Ye, Zhou Zhao, Jun Xiao 0001, Fei Wu 0001, Jiang Zhu, Yueting Zhuang. icimcs 2018.
- Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering. Shaoning Xiao, Yimeng Li, Yunan Ye, Long Chen 0016, Shiliang Pu, Zhou Zhao, Jian Shao, Jun Xiao 0001. npl, 52(2):993-1003, 2020.
- Multi-modal multi-view Bayesian semantic embedding for community question answering. Lei Sang, Min Xu 0001, Shengsheng Qian, Xindong Wu 0001. ijon, 334:44-58, 2019.