Scaling Vision Transformers to 22 Billion Parameters

Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Peter Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme Ruiz, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin Fathy Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Collier, Alexey A. Gritsenko, Vighnesh Birodkar, Cristina Nader Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetic, Dustin Tran, Thomas Kipf, Mario Lucic, Xiaohua Zhai, Daniel Keysers, Jeremiah J. Harmsen, Neil Houlsby. Scaling Vision Transformers to 22 Billion Parameters. In Andreas Krause, Emma Brunskill, KyungHyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA. Volume 202 of Proceedings of Machine Learning Research, pages 7480-7512, PMLR, 2023.

@inproceedings{0001DMPHGSCGAJB23,
  title = {Scaling Vision Transformers to 22 Billion Parameters},
  author = {Mostafa Dehghani and Josip Djolonga and Basil Mustafa and Piotr Padlewski and Jonathan Heek and Justin Gilmer and Andreas Peter Steiner and Mathilde Caron and Robert Geirhos and Ibrahim Alabdulmohsin and Rodolphe Jenatton and Lucas Beyer and Michael Tschannen and Anurag Arnab and Xiao Wang and Carlos Riquelme Ruiz and Matthias Minderer and Joan Puigcerver and Utku Evci and Manoj Kumar and Sjoerd van Steenkiste and Gamaleldin Fathy Elsayed and Aravindh Mahendran and Fisher Yu and Avital Oliver and Fantine Huot and Jasmijn Bastings and Mark Collier and Alexey A. Gritsenko and Vighnesh Birodkar and Cristina Nader Vasconcelos and Yi Tay and Thomas Mensink and Alexander Kolesnikov and Filip Pavetic and Dustin Tran and Thomas Kipf and Mario Lucic and Xiaohua Zhai and Daniel Keysers and Jeremiah J. Harmsen and Neil Houlsby},
  year = {2023},
  url = {https://proceedings.mlr.press/v202/dehghani23a.html},
  researchr = {https://researchr.org/publication/0001DMPHGSCGAJB23},
  pages = {7480--7512},
  booktitle = {International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA},
  editor = {Andreas Krause and Emma Brunskill and KyungHyun Cho and Barbara Engelhardt and Sivan Sabato and Jonathan Scarlett},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
}