The following publications are possibly variants of this publication:
- MIVCN: Multimodal interaction video captioning network based on semantic association graph. Ying Wang, Guoheng Huang, Lin Yuming, Haoliang Yuan, Chi-Man Pun, Wing-kuen Ling, Lianglun Cheng. apin, 52(5):5241-5260, 2022.
- Research on Feature Extraction and Multimodal Fusion of Video Caption Based on Deep Learning. Hongjun Chen, Hengyi Li, Xueqin Wu. icmss 2020: 73-76.
- Multimodal Pretraining for Dense Video Captioning. Gabriel Huang, Bo Pang, Zhenhai Zhu, Clara Rivera, Radu Soricut. ijcnlp 2020: 470-490.