On the Benefits of Learning to Route in Mixture-of-Experts Models

Nishanth Dikkala, Nikhil Ghosh, Raghu Meka, Rina Panigrahy, Nikhil Vyas, Xin Wang. On the Benefits of Learning to Route in Mixture-of-Experts Models. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023. Pages 9376-9396, Association for Computational Linguistics, 2023.
