Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis E. H. Tay, Jiashi Feng, Shuicheng Yan. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 538-547. IEEE, 2021.

Abstract

Abstract is missing.