STVGBert: A Visual-linguistic Transformer based Framework for Spatio-temporal Video Grounding

Rui Su, Qian Yu, Dong Xu 0001. STVGBert: A Visual-linguistic Transformer based Framework for Spatio-temporal Video Grounding. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021. pages 1513-1522, IEEE, 2021.

Authors

Rui Su

Qian Yu

Dong Xu 0001
