Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang 0001, Jianfeng Gao. Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021. Pages 2978-2988, IEEE, 2021.

Authors

Pengchuan Zhang

Xiyang Dai

Jianwei Yang

Bin Xiao

Lu Yuan

Lei Zhang 0001

Jianfeng Gao
