The following publications are possibly variants of this publication:
- Multimodal architecture for video captioning with memory networks and an attention mechanism. Wei Li, Dashan Guo, Xiangzhong Fang. prl, 105:23-29, 2018.
- MIVCN: Multimodal interaction video captioning network based on semantic association graph. Ying Wang, Guoheng Huang, Lin Yuming, Haoliang Yuan, Chi-Man Pun, Wing-kuen Ling, Lianglun Cheng. apin, 52(5):5241-5260, 2022.
- Hierarchical attention-based multimodal fusion for video captioning. Chunlei Wu, Yiwei Wei, Xiaoliang Chu, Weichen Sun, Fei Su, Leiquan Wang. ijon, 315:362-370, 2018.