Multimodal Analysis for Deep Video Understanding with Video Language Transformer

Beibei Zhang, Yaqun Fang, Tongwei Ren, Gangshan Wu. Multimodal Analysis for Deep Video Understanding with Video Language Transformer. In João Magalhães, Alberto Del Bimbo, Shin'ichi Satoh, Nicu Sebe, Xavier Alameda-Pineda, Qin Jin, Vincent Oria, Laura Toni, editors, MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10-14, 2022, pages 7165-7169. ACM, 2022.
