Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.

Authors

Aran Komatsuzaki

Joan Puigcerver

James Lee-Thorp

Carlos Riquelme Ruiz

Basil Mustafa

Joshua Ainslie

Yi Tay

Mostafa Dehghani

Neil Houlsby
