Everything at Once - Multi-modal Fusion Transformer for Video Retrieval

Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne. Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 19988-19997. IEEE, 2022.
