MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering

Mobeen Ahmad, Geonwoo Park, Dongchan Park, Sanguk Park. MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering. In IEEE/CVF International Conference on Computer Vision, ICCV 2023 - Workshops, Paris, France, October 2-6, 2023. Pages 4659-4664. IEEE, 2023.