Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

Mingxing Zhang, Yang Yang, Xinghan Chen, Yanli Ji, Xing Xu, Jingjing Li, Heng Tao Shen. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Pages 12669-12678, Computer Vision Foundation / IEEE, 2021.
