Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao. Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 2978-2988. IEEE, 2021.
