Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers

Rui Liu, Young-Jin Kim, Alexandre Muzio, Hany Hassan. Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, Sivan Sabato, editors, International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Volume 162 of Proceedings of Machine Learning Research, pages 13782-13792, PMLR, 2022.

Authors

Rui Liu

Young-Jin Kim

Alexandre Muzio

Hany Hassan
