Language-guided Multi-Modal Fusion for Video Action Recognition

Jenhao Hsiao, Yikang Li, Chiuman Ho. Language-guided Multi-Modal Fusion for Video Action Recognition. In IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021, Montreal, BC, Canada, October 11-17, 2021. pages 3151-3155, IEEE, 2021.

Abstract

Abstract is missing.