Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts

Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Thérien, Sambit Sahu, Stephen Rawls, Supriyo Chakraborty, Tom Goldstein. Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts. In Mehdi Rezagholizadeh, Peyman Passban, Soheila Samiee, Vahid Partovi Nia, Yu Cheng, Yue Deng, Qun Liu, Boxing Chen, editors, NeurIPS Efficient Natural Language and Speech Processing Workshop, 14 December 2024, Vancouver, British Columbia, Canada. Volume 262 of Proceedings of Machine Learning Research, pages 81-101, PMLR, 2024.

Authors

Ashwinee Panda

Vatsal Baherwani

Zain Sarwar

Benjamin Thérien

Sambit Sahu

Stephen Rawls

Supriyo Chakraborty

Tom Goldstein
