Focus and Align: Learning Tube Tokens for Video-Language Pre-Training

Yongqing Zhu, Xiangyang Li, Mao Zheng, Jiahao Yang, Zihan Wang, Xiaoqian Guo, Zifeng Chai, Yuchen Yuan, Shuqiang Jiang. Focus and Align: Learning Tube Tokens for Video-Language Pre-Training. IEEE Transactions on Multimedia, 25:8036-8050, 2023.
