Sparse is Enough in Scaling Transformers

Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Lukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva. Sparse is Enough in Scaling Transformers. In Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. pages 9895-9907, 2021.

Authors

Sebastian Jaszczur

Aakanksha Chowdhery

Afroz Mohiuddin

Lukasz Kaiser

Wojciech Gajewski

Henryk Michalewski

Jonni Kanerva
