Vision-Text Cross-Modal Fusion for Accurate Video Captioning

Kaouther Ouenniche, Ruxandra Tapu, Titus B. Zaharia. Vision-Text Cross-Modal Fusion for Accurate Video Captioning. IEEE Access, 11:115477-115492, 2023.

Abstract

Abstract is missing.