Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Thérien, Sambit Sahu, Stephen Rawls, Supriyo Chakraborty, Tom Goldstein. Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts. In Mehdi Rezagholizadeh, Peyman Passban, Soheila Samiee, Vahid Partovi Nia, Yu Cheng, Yue Deng, Qun Liu, Boxing Chen, editors, NeurIPS Efficient Natural Language and Speech Processing Workshop, 14 December 2024, Vancouver, British Columbia, Canada. Volume 262 of Proceedings of Machine Learning Research, pages 81-101, PMLR, 2024.