Sparse is Enough in Scaling Transformers

Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Lukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva. Sparse is Enough in Scaling Transformers. In Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 9895-9907, 2021.

@inproceedings{JaszczurCMKGMK21,
  title = {Sparse is Enough in Scaling Transformers},
  author = {Sebastian Jaszczur and Aakanksha Chowdhery and Afroz Mohiuddin and Lukasz Kaiser and Wojciech Gajewski and Henryk Michalewski and Jonni Kanerva},
  year = {2021},
  url = {https://proceedings.neurips.cc/paper/2021/hash/51f15efdd170e6043fa02a74882f0470-Abstract.html},
  researchr = {https://researchr.org/publication/JaszczurCMKGMK21},
  pages = {9895--9907},
  booktitle = {Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual},
  editor = {Marc'Aurelio Ranzato and Alina Beygelzimer and Yann N. Dauphin and Percy Liang and Jennifer Wortman Vaughan},
}