Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani 0001, Neil Houlsby. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. [doi]

Abstract

Abstract is missing.