Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning

Mandela Patrick, Po-Yao Huang, Ishan Misra, Florian Metze, Andrea Vedaldi, Yuki M. Asano, João F. Henriques. Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 10540-10552. IEEE, 2021.
