Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis E. H. Tay, Jiashi Feng, Shuicheng Yan. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 538-547. IEEE, 2021. doi:10.1109/ICCV48922.2021.00060

@inproceedings{0007CWYSJTFY21,
  title = {Tokens-to-Token {ViT}: Training Vision Transformers from Scratch on {ImageNet}},
  author = {Li Yuan and Yunpeng Chen and Tao Wang and Weihao Yu and Yujun Shi and Zihang Jiang and Francis E. H. Tay and Jiashi Feng and Shuicheng Yan},
  year = {2021},
  doi = {10.1109/ICCV48922.2021.00060},
  url = {https://doi.org/10.1109/ICCV48922.2021.00060},
  researchr = {https://researchr.org/publication/0007CWYSJTFY21},
  pages = {538--547},
  booktitle = {2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021},
  publisher = {IEEE},
  isbn = {978-1-6654-2812-5},
}